Report the differences and intersection between two text files

Time: 2013-04-29 22:42:51

Tags: python list shell compare

Disclaimer: I am new to programming and scripting, so please excuse my lack of technical terminology.

I have two text files, each containing a list of names:

First File | Second File
bob        | bob
mark       | mark
larry      | bruce
tom        | tom

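I want to run a script (preferably Python) that writes the lines common to both files to one text file and the lines that differ to another, for example: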

matches.txt

bob 
mark 
tom 

differences.txt

bruce

How can I achieve this in Python? Or with the Unix command line, if that is easy enough?

6 answers:

Answer 0: (score: 16)

sort | uniq is fine, but comm may be better. See "man comm" for more information. Note that comm expects both input files to be sorted.

From the man page:

EXAMPLES
       comm -12 file1 file2
              Print only lines present in both file1 and file2.

       comm -3 file1 file2
              Print lines in file1 not in file2, and vice versa.

You could also use Python's set type, but comm is easier.
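
For comparison, the set-based alternative mentioned above might look like the following minimal sketch. The function name `common_and_differing` is hypothetical, and the sketch assumes one name per line. Unlike comm it does not need sorted input, but it discards duplicate lines and the original order.

```python
def common_and_differing(lines1, lines2):
    """Return (common, differing) as sorted lists of stripped lines."""
    # Strip whitespace and skip blank lines (an assumption about the input)
    set1 = {line.strip() for line in lines1 if line.strip()}
    set2 = {line.strip() for line in lines2 if line.strip()}
    common = sorted(set1 & set2)     # roughly what `comm -12` prints
    differing = sorted(set1 ^ set2)  # roughly what `comm -3` prints, minus the column indentation
    return common, differing


# Sample data taken from the question
first = ["bob\n", "mark\n", "larry\n", "tom\n"]
second = ["bob\n", "mark\n", "bruce\n", "tom\n"]
common, differing = common_and_differing(first, second)
print(common)
print(differing)
```

Note that the symmetric difference contains both larry and bruce. If only the names missing from the first file are wanted, as in the question's differences.txt, `set2 - set1` would be the expression to use instead.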

Answer 1: (score: 9)

Unix shell solution:

# lines present in both files (assumes no line is repeated within a single file)
sort text1.txt text2.txt | uniq -d

# lines present in only one of the files
sort text1.txt text2.txt | uniq -u
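
The same counting idea can be sketched in Python with `collections.Counter`. This is only an illustration of what the pipeline does, not a replacement for it; the function name `duplicate_and_unique_lines` is hypothetical. It shares the pipeline's caveat: a line repeated inside a single file is reported as a duplicate even if the other file does not contain it.

```python
from collections import Counter


def duplicate_and_unique_lines(lines1, lines2):
    """Mimic `sort a b | uniq -d` and `sort a b | uniq -u` on two lists of lines."""
    # Count every line across both inputs, ignoring the trailing newline
    counts = Counter(line.rstrip("\n") for line in list(lines1) + list(lines2))
    duplicated = sorted(line for line, n in counts.items() if n > 1)
    unique = sorted(line for line, n in counts.items() if n == 1)
    return duplicated, unique


# Sample data taken from the question
text1 = ["bob\n", "mark\n", "larry\n", "tom\n"]
text2 = ["bob\n", "mark\n", "bruce\n", "tom\n"]
duplicated, unique = duplicate_and_unique_lines(text1, text2)
print(duplicated)
print(unique)
```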

Answer 2: (score: 4)

# Read each file and split it into a set of whitespace-separated words
with open("some1.txt") as f1, open("some2.txt") as f2:
    words1 = set(f1.read().split())
    words2 = set(f2.read().split())

duplicates = words1.intersection(words2)
uniques = words1.difference(words2).union(words2.difference(words1))

print("Duplicates(%d):%s" % (len(duplicates), duplicates))
print("\nUniques(%d):%s" % (len(uniques), uniques))

Something like this, at least.
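
As a side note, the union of the two differences is the symmetric difference, which Python sets support directly. A small sketch using in-memory sets built from the question's sample names in place of the files:

```python
# Sample names from the question, standing in for the contents of the two files
words1 = {"bob", "mark", "larry", "tom"}
words2 = {"bob", "mark", "bruce", "tom"}

duplicates = words1 & words2  # same as words1.intersection(words2)
uniques = words1 ^ words2     # same as words1.symmetric_difference(words2)

# The operator form agrees with the longer spelling used in the answer
assert uniques == words1.difference(words2).union(words2.difference(words1))

print(sorted(duplicates))
print(sorted(uniques))
```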

Answer 3: (score: 1)

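Python dictionary lookups are O(1) on average, or very close to it; in other words, they are very fast (but they will use a lot of memory if the file you are indexing is large). So first read in the first file, which will then be used to build a dictionary: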

left = [x.strip() for x in open('left.txt').readlines()]

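The list comprehension and strip() are needed because readlines leaves the trailing newline on each line. This creates a list of all items in the file, assuming one per line (use .split if they are all on one line).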

Now build a dictionary:

ldi = dict.fromkeys(left)

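This builds a dictionary with the list items as keys. It also takes care of duplicates. Now iterate over the second file and check whether each key is in the dict: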

matches = open('matches.txt', 'w')
uniq = open('uniq.txt', 'w')
for l in open('right.txt').readlines():
    if l.strip() in ldi:
        # write to matches
        matches.write(l)
    else:
        # write to uniq
        uniq.write(l)
matches.close()
uniq.close()
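
The steps above can also be collected into one function that works on any iterables of lines, which makes the logic easy to try without touching the file system. This is a sketch only; the name `split_by_membership` is hypothetical and not part of the original answer.

```python
def split_by_membership(left_lines, right_lines):
    """Split right_lines into (matches, uniq) by membership in left_lines."""
    # Dictionary keyed by the stripped lines of the first input; duplicates collapse
    ldi = dict.fromkeys(line.strip() for line in left_lines)
    matches, uniq = [], []
    for line in right_lines:
        name = line.strip()
        if name in ldi:  # average O(1) key lookup
            matches.append(name)
        else:
            uniq.append(name)
    return matches, uniq


# Sample data taken from the question
left = ["bob\n", "mark\n", "larry\n", "tom\n"]
right = ["bob\n", "mark\n", "bruce\n", "tom\n"]
matches, uniq = split_by_membership(left, right)
print(matches)
print(uniq)
```

Like the loop above, this only reports lines of the second file that are missing from the first; names that appear only in the first file (larry here) are not listed.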

Answer 4: (score: 0)

>>> with open('first.txt') as f1, open('second.txt') as f2:
        w1 = set(f1)
        w2 = set(f2)


>>> with open('matches.txt','w') as fout1, open('differences.txt','w') as fout2:
        fout1.writelines(w1 & w2)
        fout2.writelines(w2 - w1)


>>> with open('matches.txt') as f:
        print(f.read())


bob
mark
tom
>>> with open('differences.txt') as f:
        print(f.read())


bruce
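
One caveat with building sets straight from file objects: each element keeps its trailing newline, so a last line without a final newline will not match the same name in the other file. The sketch below illustrates this with `io.StringIO` objects standing in for the files; the helper `line_set` and the sample contents are assumptions for illustration.

```python
import io


def line_set(stream):
    """Build a set of lines with trailing newlines removed, skipping blank lines."""
    return {line.rstrip("\n") for line in stream if line.strip()}


# Hypothetical file contents; the first one lacks a final newline after "tom"
first = io.StringIO("bob\nmark\nlarry\ntom")
second = io.StringIO("bob\nmark\nbruce\ntom\n")

raw_common = set(first) & set(second)  # "tom" != "tom\n", so tom is missed here

# Rewind both streams and compare again with normalized lines
first.seek(0)
second.seek(0)
w1, w2 = line_set(first), line_set(second)

print(sorted(raw_common))
print(sorted(w1 & w2))
print(sorted(w2 - w1))
```

With normalized lines, the newline has to be added back when writing, for example `fout.writelines(name + "\n" for name in sorted(w1 & w2))`.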

Answer 5: (score: 0)

Here is one that separates the two files' contents with a horizontal line:


file_1_list = []
file_2_list = []

with open(input('Enter the first file name: ')) as file:
    file_1 = file.read()   # whole file as one string, used for the comparison
    file.seek(0)           # rewind so the lines can be read again
    for line in file:
        file_1_list.append(line.strip())

with open(input('Enter the second file name: ')) as file:
    file_2 = file.read()
    file.seek(0)
    for line in file:
        file_2_list.append(line.strip())

if file_1 == file_2:
    print("Yes")
else:
    print("No")
    print(file_1)
    print("--------------")
    print(file_2)