Question

infile = open('file.txt', 'r')
string = infile.read()

def extract_edu(string):
    with open('totaleducation.txt', 'r') as totaledu:
        edu_set=[]
        for edu in totaledu:
            if edu in string:
                print(edu)
                edu_set.append(edu)
    return edu_set

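I want to extract the words from string that match entries in the totaleducation file. It returns the right result when the entry is a word like BCA, but when I try to extract something like MCA (Master of Computer Applications), it ignores that line.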

string is just the text of a document file, for example:

ACADEMICS:

Year
Degree
Institute/College
University
CGPA/Percentage
2016
MSc (Computer-Science)
South Asian University
South Asian University
6.6/9
2012
BCA
Ignou, Patna
IGNOU
65
2009
Class XII(Science)
BSSRPP Inter College Deoria
BHSIE
61
2006
Class X
Buxar High School
BSEB
67.8
and totaleducation.txt looks like this:
MSc
BCA
MCA
Master's of Science

Answer 1

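After a long discussion, we clarified the problem:

I have a text file named totaleducation.txt and another text/csv file named sample.txt; I want to find all the words in totaleducation.txt that also occur in sample.txt.

So you should read totaleducation.txt line by line and check whether each of its lines is contained in any line of sample.txt.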

def match():
    words = []
    with open('totaleducation.txt', 'r') as f1:
        for edu in f1:
            with open('sample.txt', 'r') as f2:
                for string in f2:
                    if edu.strip('\n') in string.strip('\n'):
                        words.append(edu.strip('\n'))
    return words

Calling match() gives you all the entries of totaleducation.txt that also occur in any line of sample.txt.

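Note the .strip('\n'). Say you have 'MSc' in file1 and 'MSc (Computer-Science)' in file2. If you omit .strip('\n'), you cannot verify that 'MSc' is in 'MSc (Computer-Science)', because the two lines are actually 'MSc\n' and 'MSc (Computer-Science)\n', and the first is not contained in the second.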
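To see the effect in isolation, here is a small sketch that uses string literals instead of reading the files. The literal values are taken from the example data in the question; the variable names are only illustrative.

```python
# Lines as they come out of iterating over a file: each keeps its trailing newline.
edu = 'MSc\n'
line = 'MSc (Computer-Science)\n'

# 'MSc\n' is not a substring here, because 'MSc' is followed by ' (' and not by '\n'.
raw_match = edu in line

# After stripping the newline from both sides, the substring test succeeds.
stripped_match = edu.strip('\n') in line.strip('\n')

# An entry that sits on a line of its own still matches without stripping,
# which is likely why 'BCA' appeared to work in the question.
bca_match = 'BCA\n' in 'BCA\n'

print(raw_match, stripped_match, bca_match)  # False True True
```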

A second way to do the same thing, provided your files are not so large that they cause memory problems:

education = []
with open('totaleducation.txt', 'r') as f1:
    for line in f1:
        education.append(line.strip('\n'))

sample = []
with open('sample.txt', 'r') as f2:
    for line in f2:
        sample.append(line.strip('\n'))

match = [e for e in education for s in sample if e in s]
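Both versions use substring tests, so a short entry such as 'BCA' would also match inside a longer token, and an entry is reported once for every line that contains it. If whole-word matches are what you want, one possible sketch uses the re module with lookarounds. This is not part of the answer above: the function name find_whole_words and the inline sample data are made up for illustration.

```python
import re

def find_whole_words(entries, lines):
    """Return the entries that occur as whole words in any line, once each."""
    found = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # skip blank entries; an empty string would match every line
        # (?<!\w) and (?!\w) require that the entry is not glued to other word characters.
        pattern = re.compile(r'(?<!\w)' + re.escape(entry) + r'(?!\w)')
        if any(pattern.search(line) for line in lines):
            found.append(entry)
    return found

# Hypothetical data modelled on the files in the question.
entries = ['MSc', 'BCA', 'MCA', "Master's of Science"]
lines = ['MSc (Computer-Science)', 'BCA', 'Ignou, Patna']

result = find_whole_words(entries, lines)
print(result)  # ['MSc', 'BCA']
```

Pass it lists, for example the education and sample lists built above, rather than open file objects, since a file object would be exhausted after the first entry.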
