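Comparing two files line by line

Time: 2016-03-03 14:09:01

Tags: python

I have this program that takes two files and compares them line by line. It works fine as long as both files have the same number of lines. My problem is: what if, for example, file2 has more lines than file1, or the other way around? When that happens I get IndexError: list index out of range. What can I do to account for this?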

# Compares two files line by line
def compare(baseline, newestFile):

    baselineHolder = open(baseline)
    newestFileHolder = open(newestFile)

    lines1 = baselineHolder.readlines()
    # returnName is a helper that is not shown in the question
    a = returnName(baseline)
    b = returnName(newestFile)

    for i, lines2 in enumerate(newestFileHolder):
        # lines1[i] raises IndexError once newestFile has more lines than baseline
        if lines2 != lines1[i]:
            add1 = i + 1
            print("line ", add1, " in newestFile is different \n")
            print("TAKE A LOOK HERE----------------------TAKE A LOOK HERE")
            print(lines2)
        else:
            addRow = 1 + i
            print("line  " + str(addRow) + " is identical")

3 answers:

Answer 0 (score: 4)

Why not use the built-in difflib instead of reinventing the wheel? Here is an example of difflib.unified_diff, taken from the documentation:

>>> import sys
>>> from difflib import unified_diff
>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
>>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
--- before.py
+++ after.py
@@ -1,4 +1,4 @@
-bacon
-eggs
-ham
+python
+eggy
+hamster
 guido
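
To tie this back to the question: unified_diff does not care whether the two inputs have the same number of lines. Here is a minimal sketch of that, assuming the lines have already been read into lists (for example with readlines()). The function name diff_lines, the labels "baseline" and "newest", and the sample data are made up for illustration.

```python
import difflib


def diff_lines(baseline_lines, newest_lines):
    """Return the unified diff of two lists of lines as a list of strings."""
    return list(difflib.unified_diff(
        baseline_lines, newest_lines,
        fromfile="baseline", tofile="newest",
    ))


# Hypothetical sample data: the second "file" has one extra line.
baseline_lines = ["a\n", "b\n"]
newest_lines = ["a\n", "b\n", "c\n"]

result = diff_lines(baseline_lines, newest_lines)
for line in result:
    print(line, end="")
```

The extra line shows up as an added line ("+c") in the diff, and no index handling is needed on your side.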

Answer 1 (score: 1)

Maybe you could use itertools.izip_longest. Once one of the sequences is exhausted, it yields a fill value (None by default) in its place:

import itertools

for l, r in itertools.izip_longest(open('foo.txt'), open('bar.txt')):
    if l is None: # foo.txt has been exhausted
        ...
    elif r is None: # bar.txt has been exhausted
        ...
    else: # both still have lines - compare now the content of l and r
        ...

Edit: as @danidee correctly pointed out, in Python 3 it is zip_longest.
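
For Python 3, the skeleton above could be filled in roughly like this. It is only a sketch: the function name compare_lines, the message strings, and the sample data are assumptions, and it works on lists of lines rather than open files so it is easy to try out.

```python
from itertools import zip_longest


def compare_lines(baseline_lines, newest_lines):
    """Return a list of (line_number, message) tuples, one per line pair."""
    report = []
    # zip_longest pads the shorter input with None instead of stopping early.
    pairs = zip_longest(baseline_lines, newest_lines)
    for number, (old, new) in enumerate(pairs, start=1):
        if old is None:
            report.append((number, "only in newest file"))
        elif new is None:
            report.append((number, "only in baseline"))
        elif old != new:
            report.append((number, "different"))
        else:
            report.append((number, "identical"))
    return report


# Hypothetical sample data; real input could come from readlines().
report = compare_lines(["a\n", "b\n"], ["a\n", "x\n", "c\n"])
for number, message in report:
    print("line", number, "is", message)
```

Because lines read from a file are always strings, None is a safe marker for "this file ran out" in either direction.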

Answer 2 (score: 1)

You should catch the IndexError and then stop comparing:

    for i,lines2 in enumerate(newestFileHolder):
        try:
            if lines2 != lines1[i]:
                add1 = i + 1
                print ("line ", add1, " in newestFile is different \n")
                print("TAKE A LOOK HERE----------------------TAKE A LOOK HERE")    
                print (lines2)
            else:
                addRow = 1 + i
                print ("line  " + str(addRow) + " is identical")
        except IndexError:
            print("Exit comparison")
            break
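
One thing to keep in mind with this approach: the loop is driven by the newest file, so the IndexError only fires when the newest file is the longer one. If the baseline is longer, its extra lines are never looked at. A rough sketch that covers both directions might look like this; the function name compare_until_exhausted and the message wording are made up, and it takes lists of lines instead of file names.

```python
def compare_until_exhausted(baseline_lines, newest_lines):
    """Compare line by line, stopping at the first line missing from the baseline."""
    messages = []
    compared = 0
    for i, new in enumerate(newest_lines):
        try:
            old = baseline_lines[i]
        except IndexError:
            # The baseline ran out first: the newest file is longer.
            messages.append("newest file has extra lines starting at line %d" % (i + 1))
            break
        compared += 1
        if new != old:
            messages.append("line %d is different" % (i + 1))
    # The loop follows the newest file, so leftover baseline lines need their own check.
    if compared < len(baseline_lines):
        messages.append("baseline has extra lines starting at line %d" % (compared + 1))
    return messages


# Hypothetical sample data.
for message in compare_until_exhausted(["a\n"], ["a\n", "b\n"]):
    print(message)
```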