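How can I simplify and format this function?

Time: 2014-08-22 07:23:17

Tags: python-3.x split simplify

So I have this messy code. I want to take every word from frankenstein.txt, sort the words alphabetically, drop the one- and two-letter words, and write the result to a new file.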

def Dictionary():

    d = []
    count = 0

    bad_char = '~!@#$%^&*()_+{}|:"<>?\`1234567890-=[]\;\',./ '
    replace = ' '*len(bad_char)
    table = str.maketrans(bad_char, replace)

    infile = open('frankenstein.txt', 'r')
    for line in infile:
        line = line.translate(table)
        for word in line.split():
            if len(word) > 2:
                d.append(word)
                count += 1
    infile.close()
    file = open('dictionary.txt', 'w')
    file.write(str(set(d)))
    file.close()

Dictionary() 

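How can I simplify it and make it more readable, and how do I write the words vertically in the new file (right now it writes them as one horizontal list), like this: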

abbey
abhorred
about
etc....

2 answers:

Answer 0: (score: 0)

A few improvements:

from string import digits, punctuation

def create_dictionary():

    words = set()

    bad_char = digits + punctuation + '...' # may need more characters
    replace = ' ' * len(bad_char)
    table = str.maketrans(bad_char, replace)

    with open('frankenstein.txt') as infile:
        for line in infile:
            line = line.strip().translate(table)
            for word in line.split():
                if len(word) > 2:
                    words.add(word)

    with open('dictionary.txt', 'w') as outfile:
        outfile.writelines(word + '\n' for word in sorted(words)) # writelines adds no newlines itself

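Some notes:

  • Follow the style guide (PEP 8), e.g. lowercase_with_underscores for function names;
  • the string module provides constants that can be used to supply the "bad characters";
  • you never used count (which would just be len(d) anyway);
  • use the with context manager for file handling; and
  • using a set from the start prevents duplicates, but does not sort them (hence sorted).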
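
The notes above can be sketched in a form that is easy to try without the text file. This is only an illustration: the helper names extract_words and format_vertical, the min_length parameter, and the sample sentence are all made up here and do not come from the question or the answer.

```python
from string import digits, punctuation

# Translation table mapping every digit and punctuation mark to a space.
# (Assumption: digits + punctuation covers the unwanted characters;
# a real text may contain others, such as curly quotes.)
_BAD_CHARS = digits + punctuation
_TABLE = str.maketrans(_BAD_CHARS, ' ' * len(_BAD_CHARS))


def extract_words(text, min_length=3):
    """Return the unique words of at least min_length characters, sorted."""
    words = set()  # a set discards duplicates as they arrive
    for line in text.splitlines():
        for word in line.translate(_TABLE).split():
            if len(word) >= min_length:
                words.add(word)
    return sorted(words)  # sets are unordered, so sort at the end


def format_vertical(words):
    """Put each word on its own line, which is the 'vertical' layout asked for."""
    return ''.join(word + '\n' for word in words)


# Hypothetical sample text, not a quotation from frankenstein.txt.
sample = "It was on a dreary night of November, 1816; the night was dreary!"
result = extract_words(sample)
print(format_vertical(result), end='')
```

Note that sorted compares strings by code point, so capitalized words come before lowercase ones; passing key=str.lower to sorted, or lowercasing each word before adding it, would give a case-insensitive order.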

Answer 1: (score: 0)

Use the re module.

import re

words = set()

with open('frankenstein.txt') as infile:
    for line in infile:
        # set has update(), not extend(); '+' avoids splitting on empty matches
        words.update(x for x in re.split(r'[^A-Za-z]+', line) if len(x) > 2)

with open('dictionary.txt', 'w') as outfile:
    outfile.writelines(word + '\n' for word in sorted(words))

In the character class passed to re.split, replace 'A-Za-z' with whichever characters you want to keep in the words written to dictionary.txt.
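
To see what changing the character class does, here is a small sketch. The helper split_words, its allowed parameter, and the sample line are hypothetical names introduced for this illustration only; the pattern is built as a negated class followed by '+', so runs of unwanted characters act as a single separator.

```python
import re


def split_words(line, allowed='A-Za-z', min_length=3):
    """Split line on runs of characters outside `allowed`; keep longer pieces."""
    # `allowed` is inserted into a negated character class, e.g. [^A-Za-z]+
    pattern = '[^' + allowed + ']+'
    return [w for w in re.split(pattern, line) if len(w) >= min_length]


# Hypothetical sample line, not taken from frankenstein.txt.
line = "Don't you see the well-known abbey?"

letters_only = split_words(line)                        # apostrophe splits "Don't"
with_apostrophe = split_words(line, allowed="A-Za-z'")  # apostrophe is kept

print(letters_only)
print(with_apostrophe)
```

With letters only, "Don't" is cut into "Don" and "t" (and "t" is then dropped for being too short); adding the apostrophe to the class keeps the contraction whole.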