infile = open('file.txt', 'r')
string = infile.read()

def extract_edu(string):
    with open('totaleducation.txt', 'r') as totaledu:
        edu_set = []
        for edu in totaledu:
            if edu in string:
                print(edu)
                edu_set.append(edu)
    return edu_set
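I want to extract the words from the string that match entries in the totaleducation file. It returns the correct result for a word like BCA, but when I try to extract something like MCA (Master of Computer Applications), it ignores the line.
string is just the text of a document file, for example:
ACADEMICS: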
Year
Degree
Institute/College
University
CGPA/Percentage
2016
MSc (Computer-Science)
South Asian University
South Asian University
6.6/9
2012
BCA
Ignou, Patna
IGNOU
65
2009
Class XII(Science)
BSSRPP Inter College Deoria
BHSIE
61
2006
Class X
Buxar High School
BSEB
67.8
and totaleducation.txt looks like this:
MSc
BCA
MCA
Master's of Science
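Answer 0 (score: 0)
After a long discussion, we clarified the problem:
I have a text file named totaleducation.txt and another text/csv file named sample.txt; I want to find all the words in totaleducation.txt that also occur in sample.txt.
So you should read totaleducation.txt line by line and check whether each of its lines occurs within any line of sample.txt.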
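def match():
    words = []
    with open('totaleducation.txt', 'r') as f1:
        for edu in f1:
            # Re-open sample.txt for each entry so it is scanned from the start.
            with open('sample.txt', 'r') as f2:
                for string in f2:
                    # Strip the trailing newline from both lines before comparing.
                    if edu.strip('\n') in string.strip('\n'):
                        words.append(edu.strip('\n'))
    return words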
Calling match() gives you all the entries of totaleducation.txt that also occur in some line of sample.txt.
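Note the .strip('\n'). Say file1 contains 'MSc' and file2 contains 'MSc (Computer-Science)'. If you omit .strip('\n'), you cannot verify that 'MSc' is in 'MSc (Computer-Science)', because the two lines are actually 'MSc\n' and 'MSc (Computer-Science)\n', and the first is not contained in the second.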
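The effect is easy to check in an interpreter. This is only a small illustration; the two literals are taken from the sample data in the question, and the variable names are made up for the example:

```python
# Lines read from a file keep their trailing newline.
edu = 'MSc\n'                        # a line from totaleducation.txt
line = 'MSc (Computer-Science)\n'    # a line from sample.txt

# 'MSc\n' is not a substring: in `line`, 'MSc' is followed by a space.
raw_match = edu in line

# After stripping the newlines, 'MSc' is found inside the longer line.
stripped_match = edu.strip('\n') in line.strip('\n')

print(raw_match, stripped_match)
```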
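A second way to do the same thing, provided your files are not so large that they cause memory problems:
# Read both files into lists of lines without the trailing newline.
education = []
with open('totaleducation.txt', 'r') as f1:
    for line in f1:
        education.append(line.strip('\n'))
sample = []
with open('sample.txt', 'r') as f2:
    for line in f2:
        sample.append(line.strip('\n'))
# Keep every entry e that is a substring of some sample line s.
match = [e for e in education for s in sample if e in s]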
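Both versions above append an entry once for every sample line that contains it, so an entry can show up several times. If you want each entry only once, something like the following might work. It is a sketch, not the only way to do it: find_matches is a name invented here, and the two lists stand in for the lines you would read from totaleducation.txt and sample.txt.

```python
def find_matches(education, sample):
    """Return each entry of `education` that occurs in at least one line of `sample`."""
    matches = []
    for entry in education:
        entry = entry.strip()
        if not entry:
            # Skip blank lines: '' is a substring of every string.
            continue
        if entry in matches:
            # Already reported, do not add it again.
            continue
        if any(entry in line for line in sample):
            matches.append(entry)
    return matches


# Stand-in data copied from the question (lines as read from the files).
education = ['MSc\n', 'BCA\n', 'MCA\n', "Master's of Science\n"]
sample = ['2016\n', 'MSc (Computer-Science)\n', 'South Asian University\n',
          '2012\n', 'BCA\n', 'Ignou, Patna\n']

result = find_matches(education, sample)
print(result)  # ['MSc', 'BCA']
```

Note that this is still plain substring matching, so a short entry would also match inside a longer word.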