Question

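I have two text files. The first contains English sentences, and the second contains a list of English words (a vocabulary). I want to remove from the sentences in the first file every word that does not appear in the vocabulary, then save the processed text back to the first file.

I wrote code that finds the sentences containing words that are not in the second file (the vocabulary).

Here is my code: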

s = open('eng.txt').readlines()

for i in s:
    print(i)
    for word in i.split(' '):
        print(word)
        if word in open("vocab30000.txt").read():
            print("Word exist in vocab")
        else:
            #print("I:", i)
            print("Word does not exist")
            #search_in_file_func(i)
            print("I:", i)
            file1 = open("MyFile.txt","a+")
            if i in file1:
                print("Sentence already exist")
            else:
                file1.write(i)

However, I have not been able to remove those words.

Answer 1

This should work:

with open('vocab30000.txt') as f:
    vocabulary = set(word.strip() for word in f.readlines())

with open('eng.txt', 'r+') as f:
    data = [line.strip().split(' ') for line in f.readlines()]
    removed = [[word for word in line if word in vocabulary] for line in data]
    result = '\n'.join(' '.join(word for word in line) for line in removed)
    f.seek(0)
    f.write(result)
    f.truncate()
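
To try the same idea on sample data before touching the real files, here is a minimal sketch that wraps it in two functions and runs it on throwaway files. The names `filter_lines` and `filter_file` and the sample contents are made up for illustration. Note that membership in a `set` matches whole words only, whereas `word in open(...).read()` is a substring test, so "cat" would also be found inside "category".

```python
import os
import tempfile


def filter_lines(lines, vocabulary):
    """Keep only the words of each line that appear in vocabulary."""
    return [' '.join(w for w in line.split() if w in vocabulary) for line in lines]


def filter_file(sentence_path, vocab_path):
    """Rewrite sentence_path in place, dropping out-of-vocabulary words."""
    with open(vocab_path) as f:
        # split() with no argument handles newlines and repeated spaces alike
        vocabulary = set(f.read().split())
    with open(sentence_path, 'r+') as f:
        cleaned = filter_lines(f.read().splitlines(), vocabulary)
        f.seek(0)
        f.write('\n'.join(cleaned) + '\n')
        f.truncate()  # cut off leftovers of the longer original text


# Demo with throwaway files (hypothetical sample contents).
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, 'vocab.txt')
    eng_path = os.path.join(tmp, 'eng.txt')
    with open(vocab_path, 'w') as f:
        f.write('the\ncat\nsat\n')
    with open(eng_path, 'w') as f:
        f.write('the cat sat quietly\nthe category cat\n')
    filter_file(eng_path, vocab_path)
    with open(eng_path) as f:
        demo_output = f.read()

print(demo_output)
```

The comparison is case-sensitive and treats punctuation as part of a word, so "Cat" or "cat." would be removed unless the vocabulary contains exactly those strings.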

Answer 2

# Read the two files

with open('vocab30000.txt') as f:
    vocabulary = set(f.read().split())

with open('eng.txt') as f:
    eng = f.readlines()

# Split each sentence into words (split() also drops the trailing newline)
eng_sentences = [line.split() for line in eng]

cleaned_sentences = []
# Loop over the sentences and keep only the words that are in the vocabulary
for sent in eng_sentences:
    cleaned_sentences.append(" ".join([w for w in sent if w in vocabulary]) + "\n")

# Write the cleaned sentences back to the sentence file
with open('eng.txt', 'w') as f:
    f.writelines(cleaned_sentences)

Answer 3

You can try this code. I tried to avoid loops to save running time in case the files are large.

def clean(i, queue):
    # i must provide iterrows() (e.g. a pandas DataFrame); queue must provide put()
    details = {}
    for index, column in i.iterrows():
        for key, val in column.items():
            # Normalize the key: collapse whitespace, use underscores, drop dots, lowercase
            name = " ".join(key.split()).replace(" ", "_").replace('.', '').lower()
            # Collapse repeated whitespace in string values; keep other values unchanged
            details[name] = " ".join(val.split()) if isinstance(val, str) else val
        queue.put(details)
        # queue.task_done()

    return queue
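
If the sentence file really is large, one option is to stream it line by line instead of reading it all into memory. The following is only a sketch under that assumption: `filter_large_file` is a hypothetical name, the vocabulary is assumed to fit in memory, and the result is written to a temporary file in the same directory and then swapped in with `os.replace`.

```python
import os
import tempfile


def filter_large_file(sentence_path, vocab_path):
    """Drop out-of-vocabulary words from sentence_path, one line at a time."""
    with open(vocab_path) as f:
        vocabulary = set(f.read().split())

    # Create the temporary file next to the original so os.replace stays on one filesystem
    directory = os.path.dirname(os.path.abspath(sentence_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as out, open(sentence_path) as src:
            for line in src:
                kept = (w for w in line.split() if w in vocabulary)
                out.write(' '.join(kept) + '\n')
        # Swap the filtered file in only after it has been written completely
        os.replace(tmp_path, sentence_path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

For the files in the question this would be called as `filter_large_file('eng.txt', 'vocab30000.txt')`. Because the original is replaced only at the end, a failure partway through leaves `eng.txt` unchanged.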

How do I remove specific words from sentences in a text file?

3 answers: