Compare two files and replace

Time: 2014-02-26 13:59:11

Tags: python

file1 has more than 1000 lines, for example:

:)
still good
not
candy....wasn't even the good stuff.
how could i ever forget? #biggestdayoftheyear
not even think
will be

file2 has more than 1000 lines, for example:

1,even,2
2,be,1
3,good,2
4,:),1
5,forget?,1
6,i,1
7,stuff.,1
8,#biggestdayoftheyear,1
9,think,1
10,will,1
11,how,1
12,not,2
13,the,1
14,still,1
15,ever,1
16,could,1
17,candy....wasn't,1

Code:

file1 = 'C:/Users/Desktop/file1.txt'
file2 = 'C:/Users/Desktop/file2.txt'

with open(file1) as f1:
    for line1 in f1:
        sline1 = str(line1.strip().split(' '))
        print sline1

with open(file2) as f2:
    for line2 in f2:
        sline2 = line2.split(',')
        #print sline2[0], sline2[1]
        if sline2[1] in sline1:
            print sline1.replace(sline1, sline2[0])

The output shows only the following:

2
6
10

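What am I missing? Any suggestions?

I want to replace every word in file1 with the number from column 1 of file2 whenever the word matches the entry in column 2 of file2.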

Expected result:

4
14 3
12
17 1 13 3 7
11 16 6 15 5 8
12 1 9
10 2

2 answers:

Answer 0: (score: 1)

You need to build an inverted index from file2:

inverted_index = {}
with open(file2) as f2:
    for line in f2:
        # Each line is "number,word,count"; map word -> number.
        key, value, _ = line.split(',')
        inverted_index[value] = key

Then use that inverted index for lookups while looping over file1:

with open(file1) as f1:
    for line in f1:
        # Words missing from the index are left unchanged.
        print(' '.join([inverted_index.get(word, word) for word in line.strip().split(' ')]))
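
For anyone who wants to try this without touching the real files, here is a minimal self-contained sketch of the same two steps. It is written for Python 3. The helper names `build_inverted_index` and `replace_words` are made up for illustration, and `io.StringIO` stands in for the opened files, using a few lines of the sample data from the question. Splitting on the first and last comma is an extra precaution (an assumption, not something the question requires) in case a word itself contains a comma.

```python
import io

# A few lines of the question's sample data; io.StringIO stands in for the real files.
FILE1 = io.StringIO(
    ":)\n"
    "still good\n"
    "not even think\n"
    "will be\n"
)
FILE2 = io.StringIO(
    "1,even,2\n2,be,1\n3,good,2\n4,:),1\n"
    "9,think,1\n10,will,1\n12,not,2\n14,still,1\n"
)


def build_inverted_index(lines):
    """Map each word (column 2) to its number (column 1)."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first and the last comma so a word containing ',' survives.
        number, rest = line.split(',', 1)
        word, _count = rest.rsplit(',', 1)
        index[word] = number
    return index


def replace_words(lines, index):
    """Replace every known word with its number; unknown words stay as they are."""
    return [' '.join(index.get(word, word) for word in line.split())
            for line in lines]


index = build_inverted_index(FILE2)
result = replace_words(FILE1, index)
print('\n'.join(result))
```

With this reduced sample the script prints `4`, `14 3`, `12 1 9` and `10 2`, one per line. Building the dictionary once keeps each word lookup cheap, which should matter with 1000+ lines in each file.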

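Answer 1: (score: 0)

I notice that you loop over file 1 and reassign sline1 on every iteration. Only after that loop has finished do you loop over file 2 to do the comparison. As a result, you only ever process the last value of sline1, because the first loop has already exited. Once you build the inverted-index dictionary as Menno shows, you can set up the replacement process.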
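
A small sketch of that effect, using two lines from the question's sample data (the list `lines` is just a stand-in for file1, and this is Python 3). It also suggests why the output was 2, 6 and 10: `str()` turns the list into one string, so `in` becomes a substring test rather than a word match.

```python
# Stand-in for file1: two of the sample lines from the question.
lines = ["not even think", "will be"]

for line1 in lines:
    # Reassigned on every pass, so earlier lines are overwritten.
    sline1 = str(line1.strip().split(' '))

# After the loop only the last line is left, as the text of a list.
print(sline1)

# `in` on a string checks for a substring, so "i" matches inside "will".
for word in ["even", "be", "i", "will"]:
    print(word, word in sline1)
```

Here "be", "i" and "will" all test true against the last line, and those are words 2, 6 and 10 in file2, while "even" does not match because its line was already overwritten.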