How do I compare two CSV files and output the differences?

Time: 2018-04-29 06:35:46

Tags: python csv difference

I am comparing two CSV files, but the resulting update.csv comes out the same as new.csv.

import csv

with open('old.csv', 'r') as t1:
    old_csv = t1.readlines()

with open('new.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('update.csv', 'w') as out_file:
    line_in_new = 0
    line_in_old = 0
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if old_csv[line_in_old] != new_csv[line_in_new]:
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1

I would like the output to match the sample below.

Example:

Input:

old.csv

a,b,c
1,2,3
4,5,6
8,9,9

new.csv

a,b,c
1,2,3
5,6,7
8,9,7

Output:

update.csv

4,5,6,deleted
5,6,7,new added 
8,9,9,change

Please help me write only the differences to update.csv.

1 Answer:

Answer 0: (score: 2)

A solution using pandas:

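import pandas as pd

df1 = pd.read_csv('old.csv')
df2 = pd.read_csv('new.csv')

# Tag each row with the file it came from.
df1['flag'] = 'old'
df2['flag'] = 'new'

df = pd.concat([df1, df2])

# Compare on every column except 'flag'. keep=False drops every copy of a
# row that occurs more than once, so rows present in both files disappear.
data_columns = list(df.columns.difference(['flag']))
dups_dropped = df.drop_duplicates(subset=data_columns, keep=False)
dups_dropped.to_csv('update.csv', index=False)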

Input

old.csv

a,b,c
1,2,3
4,5,6

new.csv

a,b,c
1,2,3
5,6,7

Output

update.csv

a,b,c,flag
4,5,6,old
5,6,7,new
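
The flag column above only records which file a row came from, so a modified row shows up as one "old" row plus one "new" row. To get the deleted / new added / change labels from the question, the two files need some way to pair up rows. Here is a minimal standard-library sketch of one way to do that. It assumes the first column uniquely identifies a row, which the question does not state, and the helper names diff_rows and read_rows are made up for illustration. The sample data is inlined as strings so the snippet runs without any files.

```python
import csv
import io


def diff_rows(old_rows, new_rows, key_index=0):
    """Label differences between two lists of CSV rows.

    Assumption: the column at key_index uniquely identifies a row.
    """
    old_by_key = {row[key_index]: row for row in old_rows}
    new_by_key = {row[key_index]: row for row in new_rows}

    result = []
    # Rows from the old file: either missing from the new file or changed.
    for key, row in old_by_key.items():
        if key not in new_by_key:
            result.append(row + ['deleted'])
        elif new_by_key[key] != row:
            result.append(row + ['change'])
    # Rows whose key appears only in the new file.
    for key, row in new_by_key.items():
        if key not in old_by_key:
            result.append(row + ['new added'])
    return result


def read_rows(text):
    """Parse CSV text and drop the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1:]


old_text = "a,b,c\n1,2,3\n4,5,6\n8,9,9\n"
new_text = "a,b,c\n1,2,3\n5,6,7\n8,9,7\n"

changes = diff_rows(read_rows(old_text), read_rows(new_text))
for row in changes:
    print(','.join(row))
```

With the question's sample data this prints the same three labelled rows as the desired update.csv, but in a different order: deleted and changed rows come first, followed by added rows. If the files have no key column, matching by line position or on whole rows (as the pandas answer does) would be needed instead, and "change" could no longer be told apart from a delete plus an add.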