How to compare two files and display text and integer values

Time: 2018-05-18 07:50:54

Tags: csv python-3.6

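I am trying to write a program that compares two CSV files and writes the result to a new CSV file. The cells in the CSV files contain both text values and integer values. If a cell has changed and its value is text, the program should append True to that value in the new CSV file. If a cell has changed and its value is an integer, it should append the text "Result is positive: value changed" and "Result is negative: value changed".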

Here is the code:

import csv

# Read both files as lists of raw lines
with open('book1.csv', 'r') as t1:
    old_csv = t1.readlines()
with open('book2.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('update.csv', 'w') as out_file:
    line_in_new = 0
    line_in_old = 0
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if old_csv[line_in_old] != new_csv[line_in_new]:
            # Lines differ: copy the new line to the output unchanged
            out_file.write(new_csv[line_in_new])
        else:
            # Lines match: advance in the old file as well
            line_in_old += 1
        line_in_new += 1

Please advise.
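A minimal sketch of the cell-by-cell comparison described in the question, using only the standard library. Several things here are assumptions, not part of the original post: the helper names (is_int, describe_change, compare_rows) are hypothetical, both files are assumed to have the same number of rows and columns in the same order, "positive" is read as "the integer increased" and "negative" as "the integer decreased", and the sample data is invented so the example runs without book1.csv and book2.csv.

```python
import csv
import io


def is_int(text):
    """Return True if the cell text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def describe_change(old_cell, new_cell):
    """Return the text to write for one cell of the new file."""
    if old_cell == new_cell:
        return new_cell
    if is_int(old_cell) and is_int(new_cell):
        # Assumption: "positive" means the number went up,
        # "negative" means it went down.
        if int(new_cell) > int(old_cell):
            return f"{new_cell} Result is positive: value changed"
        return f"{new_cell} Result is negative: value changed"
    # Changed text cell: append True to the new value.
    return f"{new_cell} True"


def compare_rows(old_rows, new_rows):
    """Compare rows position by position (assumes identical shape)."""
    return [
        [describe_change(o, n) for o, n in zip(old_row, new_row)]
        for old_row, new_row in zip(old_rows, new_rows)
    ]


# In-memory stand-ins for book1.csv and book2.csv (invented sample data).
old_text = "XID,Banks,TCO\n1,Alpha,10\n2,Beta,20\n"
new_text = "XID,Banks,TCO\n1,Gamma,15\n2,Beta,5\n"

old_rows = list(csv.reader(io.StringIO(old_text)))
new_rows = list(csv.reader(io.StringIO(new_text)))
result = compare_rows(old_rows, new_rows)
```

To write the result to update.csv, the rows could be passed to csv.writer(out_file).writerows(result), with the file opened using newline='' as the csv module documentation recommends.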

EDITED

Hi, I also tried a different approach, but got KeyError: "['XID'] not in index"

Please see my other code for the same task:

import pandas as pd

file1 = 'Book1.csv'
file2 = 'Book2.csv'
file3 = 'update.csv'

cols_to_show = ['XID', 'TCO', 'Payment Plan','Livable Area','Brochure', 'Banks']

old = pd.read_csv(file1)
new = pd.read_csv(file2)


def report_diff(x):
    return x[0] if x[1] == x[0] else '{0} --> {1}'.format(*x)


old['version'] = 'old'
new['version'] = 'new'

full_set = pd.concat([old, new], ignore_index=True)

changes = full_set.drop_duplicates(subset=cols_to_show, keep='last')

dupe_names = changes.set_index('XID').index.get_duplicates()

dupes = changes[changes['XID'].isin(dupe_names)]

change_new = dupes[(dupes['version'] == 'new')]
change_old = dupes[(dupes['version'] == 'old')]

change_new = change_new.drop(['version'], axis=1)
change_old = change_old.drop(['version'], axis=1)

change_new.set_index('XID', inplace=True)
change_old.set_index('XID', inplace=True)

diff_panel = pd.Panel(dict(df1=change_old, df2=change_new))
diff_output = diff_panel.apply(report_diff, axis=0)

changes['duplicate'] = changes['XID'].isin(dupe_names)
removed_names = changes[(changes['duplicate'] == False) & (changes['version'] == 'old')]
removed_names.set_index('XID', inplace=True)
new_name_set = full_set.drop_duplicates(subset=cols_to_show)

new_name_set['duplicate'] = new_name_set['XID'].isin(dupe_names)

added_names = new_name_set[(new_name_set['duplicate'] == False) & (new_name_set['version'] == 'new')]
added_names.set_index('XID', inplace=True)
print(added_names)
df = pd.concat([diff_output, removed_names, added_names], keys=('changed', 'removed', 'added'))
print(df)
df[cols_to_show].to_csv(file3)
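One possible cause of the KeyError, offered as a guess rather than a confirmed diagnosis: the frames passed to pd.concat have had XID moved into the index by set_index, so XID may no longer be a column by the time df[cols_to_show] runs. The sketch below reproduces that situation on a small made-up frame (the sample values and the variable names frame, indexed, and restored are invented for illustration) and shows reset_index as one way around it.

```python
import pandas as pd

# Invented sample data standing in for the real CSV contents.
frame = pd.DataFrame({
    "XID": [1, 2],
    "TCO": ["2018", "2019"],
    "Banks": ["A", "B"],
})
cols_to_show = ["XID", "TCO", "Banks"]

# After set_index, 'XID' is the index and no longer a column.
indexed = frame.set_index("XID")

try:
    indexed[cols_to_show]
    lookup_failed = False
except KeyError:
    # Selecting a list of labels fails when one of them is not a column.
    lookup_failed = True

# One option: turn the index back into a column before selecting.
restored = indexed.reset_index()[cols_to_show]
```

Another option would be to leave XID in the index and select only the remaining columns, since to_csv writes the index by default.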

0 answers:

No answers