Question

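So I have this messy code. I want to take every word from frankenstein.txt, sort the words alphabetically, remove one- and two-letter words, and write the result to a new file.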

def Dictionary():

    d = []
    count = 0

    bad_char = '~!@#$%^&*()_+{}|:"<>?\`1234567890-=[]\;\',./ '
    replace = ' '*len(bad_char)
    table = str.maketrans(bad_char, replace)

    infile = open('frankenstein.txt', 'r')
    for line in infile:
        line = line.translate(table)
        for word in line.split():
            if len(word) > 2:
                d.append(word)
                count += 1
    infile.close()
    file = open('dictionary.txt', 'w')
    file.write(str(set(d)))
    file.close()

Dictionary()

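How can I simplify it and make it more readable, and how do I write the words vertically in the new file (it currently writes them as a horizontal list), like this: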

abbey
abhorred
about
etc....

Answer 1

Here are a few improvements:

from string import digits, punctuation

def create_dictionary():

    words = set()

    bad_char = digits + punctuation + '...' # may need more characters
    replace = ' ' * len(bad_char)
    table = str.maketrans(bad_char, replace)

    with open('frankenstein.txt') as infile:
        for line in infile:
            line = line.strip().translate(table)
            for word in line.split():
                if len(word) > 2:
                    words.add(word)

    with open('dictionary.txt', 'w') as outfile:
        # writelines adds no newlines itself, so append one to each word
        outfile.writelines(word + '\n' for word in sorted(words))

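A few notes:

follow the style guide (PEP 8);
string contains constants you can use to supply the "bad characters";
you never used count (which would just be len(d) anyway);
use the with context manager for file handling; and
using a set from the start prevents duplicates, but a set is unordered (hence sorted).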
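
To see the translation table, the set, and sorted working together without touching any files, here is a minimal sketch that runs on an in-memory string. The helper name extract_words, its min_len parameter, and the sample text are made up for illustration and are not part of the answer above.

```python
from string import digits, punctuation

# Map every digit and punctuation character to a space.
BAD_CHARS = digits + punctuation
TABLE = str.maketrans(BAD_CHARS, ' ' * len(BAD_CHARS))

def extract_words(text, min_len=3):
    """Return the unique words of at least min_len characters, sorted.

    Hypothetical helper for illustration only.
    """
    words = set()  # a set drops duplicates automatically
    for line in text.splitlines():
        for word in line.translate(TABLE).split():
            if len(word) >= min_len:
                words.add(word)
    return sorted(words)  # sets are unordered, so sort at the end

sample = "about the abbey; I abhorred it.\nabout 42 abbey-walls!"
result = extract_words(sample)
print('\n'.join(result))  # one word per line
```

Joining with '\n' (or adding '\n' to each word before writelines) is what puts each word on its own line in the output.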

Answer 2

Use the re module.

import re

words = set()

with open('frankenstein.txt') as infile:
    for line in infile:
        # a set has update(), not extend(); '+' splits on runs of non-letters
        words.update(x for x in re.split(r'[^A-Za-z]+', line) if len(x) > 2)

with open('dictionary.txt', 'w') as outfile:
    # writelines adds no newlines itself, so append one to each word
    outfile.writelines(word + '\n' for word in sorted(words))

In the character class [^A-Za-z] of the pattern passed to re.split, replace A-Za-z with the characters you want to keep in dictionary.txt.
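
As a rough illustration of how changing the character class changes what is kept, the sketch below splits one made-up sample line twice: once keeping only ASCII letters, and once also keeping the apostrophe. The sample line and variable names are assumptions for the example.

```python
import re

line = "It's 1818: the monster's tale, re-told."

# Split on runs of anything that is not an ASCII letter.
letters_only = [w for w in re.split(r"[^A-Za-z]+", line) if len(w) > 2]

# Adding the apostrophe to the class keeps contractions and possessives whole.
with_apostrophe = [w for w in re.split(r"[^A-Za-z']+", line) if len(w) > 2]

print(letters_only)
print(with_apostrophe)
```

Note the + quantifier: it matches one or more separator characters, so the pattern never matches an empty string between two letters.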
