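I have two CSV files that are similar but not identical. I want to compare them and report which values changed and which variables were added or removed. The changes should be written to a new CSV or text file.
Below are the two CSV files and the code I have tried so far. I am also open to using difflib and writing its output to a text file.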
file1.csv:
name1,2.0001
name2,3.4010
name4,4.0000
name5,1.0000
name6,1.0000
name8,1.9001
name10,2.7654
file2.csv:
name1,3.0000
name2,3.4010
name3,1.0000
name5,1.0901
name6,1.0000
name7,3.4445
name11,8.0009
name12,0.1180
Here is the code I tried:
with open('file1.csv', 'r') as file1, open('file2.csv', 'r') as file2:
    file1 = file1.readlines()
    file2 = file2.readlines()
with open('new_file.csv', 'w') as outFile:
    for line in file2:
        if line not in file1:
            outFile.write(line)
The expected output is a CSV or text file containing the following:
name1 value changed from 2.0001 to 3.0000
name3 value added
name4 value removed
name5 value changed from 1.0000 to 1.0901
name7 value added
name8 value removed
name10 value removed
name11 value added
name12 value added
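Answer 0 (score: 1)
My solution converts each CSV into a dictionary, with the first column as the key and the second column as the value. I can then iterate over the keys and determine whether each value was changed, removed, or added.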
import csv
import re


def csv2dict(filename):
    """Read a two-column CSV into a dict: first column -> second column."""
    with open(filename) as file_handle:
        reader = csv.reader(file_handle)
        dict_object = dict(reader)
    return dict_object


def separate_text_and_number(value):
    """Split e.g. 'name14' into ('name', 14) so keys sort numerically."""
    text, number = re.match(r'(\D+)(\d+)', value).groups()
    number = int(number)
    return (text, number)


def main():
    """ Entry """
    csv1 = csv2dict('file1.csv')
    csv2 = csv2dict('file2.csv')
    # Union of both key sets, so keys present in only one file are covered.
    all_keys = csv1.keys() | csv2.keys()
    for key in sorted(all_keys, key=separate_text_and_number):
        if key not in csv2:
            print(f'{key} value removed')
        elif key not in csv1:
            print(f'{key} value added')
        elif csv1[key] != csv2[key]:
            print(f'{key} value changed from {csv1[key]} to {csv2[key]}')


if __name__ == '__main__':
    main()
Output:
name1 value changed from 2.0001 to 3.0000
name3 value added
name4 value removed
name5 value changed from 1.0000 to 1.0901
name7 value added
name8 value removed
name10 value removed
name11 value added
name12 value added
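csv2dict opens a file and converts its contents into a dictionary.
separate_text_and_number splits name14 into ('name', 14) to help sort the keys, so that name10 comes after name8 rather than before name2.
The dict.keys() method returns a set-like object containing all the keys. I use the | operator to take the union of the two sets of keys.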
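The question asked for the changes to end up in a file rather than on stdout. The following is a minimal sketch of that variation, not part of the original answer: `natural_key`, `diff_dicts`, `write_report` and the file name `changes.txt` are hypothetical names chosen for illustration, and the dictionaries are assumed to have the shape that csv2dict returns.

```python
import re


def natural_key(name):
    """Split 'name14' into ('name', 14) so name10 sorts after name8."""
    text, number = re.match(r'(\D+)(\d+)', name).groups()
    return text, int(number)


def diff_dicts(old, new):
    """Return one report line per key that was added, removed or changed."""
    lines = []
    # dict.keys() returns set-like views, so | gives the union of both key sets.
    for key in sorted(old.keys() | new.keys(), key=natural_key):
        if key not in new:
            lines.append(f'{key} value removed')
        elif key not in old:
            lines.append(f'{key} value added')
        elif old[key] != new[key]:
            lines.append(f'{key} value changed from {old[key]} to {new[key]}')
    return lines


def write_report(lines, path):
    """Write the report lines to a text file, one per line."""
    with open(path, 'w') as out_file:
        out_file.write('\n'.join(lines) + '\n')


if __name__ == '__main__':
    old = {'name1': '2.0001', 'name4': '4.0000', 'name10': '2.7654'}
    new = {'name1': '3.0000', 'name3': '1.0000'}
    write_report(diff_dicts(old, new), 'changes.txt')
```

Returning the lines as a list, instead of printing inside the loop, keeps the comparison separate from the output step, so the same result can be printed, written to a file, or checked in a test.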
Answer 1 (score: 0)
Use a file comparison tool, such as diff(1) on Unix/Linux.
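Since the question mentions difflib, a rough Python counterpart to diff(1) might look like the sketch below. The function name `unified_csv_diff` is hypothetical; `difflib.unified_diff` is the standard-library call doing the work. Note that a line-based diff reports a changed value as one removed line plus one added line, not as "changed from X to Y", and it assumes every line, including the last, ends with a newline.

```python
import difflib


def unified_csv_diff(path_a, path_b):
    """Return a unified diff of two text files as a single string."""
    with open(path_a) as file_a, open(path_b) as file_b:
        lines_a = file_a.readlines()
        lines_b = file_b.readlines()
    # n=0 drops context lines so only the differing rows are shown.
    diff = difflib.unified_diff(
        lines_a, lines_b, fromfile=path_a, tofile=path_b, n=0
    )
    return ''.join(diff)
```

Writing the returned string to a text file gives output in the same general format as `diff -U0 file1.csv file2.csv`.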
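Answer 2 (score: 0)
You are comparing two tables. A relational database is the right tool for this job.
Your snippet uses Python. Python ships with the sqlite3 database engine built in, but I see no reason to use Python for the simple processing task you describe.
Instead, I would do this with sqlite3 itself (wrapped in a shell script):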
#!/bin/bash
# compare-CSVs.bash
sqlite3 <<EOF
.mode csv
.header on
.separator ',' "\n"
-- import data:
.import file1.csv file1
.import file2.csv file2
-- SQLite before 3.39.0 does not support full joins, so we augment the left join with the rows missing from file1.csv:
create table data as
  select
      file1.*
    , file2.*
  from (
    select
        name as file1_name
      , value as file1_value
    from file1
  ) file1
  left join (
    select
        name as file2_name
      , value as file2_value
    from file2
  ) file2
    on file2.file2_name == file1.file1_name
  union all
  select
      file1.*
    , file2.*
  from file2
  left join file1
    on file1.name == file2.name
  where file1.name is null
;
-- output to stdout:
select
  file1_name || ' value removed' as "changes:"
from data
where file2_name is null
union all
select
  file2_name || ' value added'
from data
where file1_name is null
union all
select
  file1_name || ' value changed from ' || file1_value || ' to ' || file2_value
from data
where file1_value != file2_value
;
.exit
EOF
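For readers who would rather stay in Python, the same join logic as the script above can be sketched with the built-in sqlite3 module. This is an illustration under assumptions, not the answer's method: `load_csv` and `compare` are hypothetical helper names, the CSV files are assumed to start with a name,value header row as the script expects, and names are assumed to be unique within each file.

```python
import csv
import sqlite3


def load_csv(conn, table, path):
    """Load a two-column CSV with a header row into a fresh table."""
    # The table name comes from our own code, never from user input.
    conn.execute(f'CREATE TABLE {table} (name TEXT PRIMARY KEY, value TEXT)')
    with open(path, newline='') as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the "name,value" header row
        conn.executemany(f'INSERT INTO {table} VALUES (?, ?)', reader)


def compare(path1, path2):
    """Return report lines for removed, added and changed names."""
    conn = sqlite3.connect(':memory:')
    load_csv(conn, 'file1', path1)
    load_csv(conn, 'file2', path2)
    query = """
        SELECT file1.name || ' value removed'
        FROM file1 LEFT JOIN file2 ON file2.name = file1.name
        WHERE file2.name IS NULL
        UNION ALL
        SELECT file2.name || ' value added'
        FROM file2 LEFT JOIN file1 ON file1.name = file2.name
        WHERE file1.name IS NULL
        UNION ALL
        SELECT file1.name || ' value changed from ' || file1.value
               || ' to ' || file2.value
        FROM file1 JOIN file2 ON file2.name = file1.name
        WHERE file1.value != file2.value
    """
    rows = [row[0] for row in conn.execute(query)]
    conn.close()
    return rows
```

Calling `compare('file1.csv', 'file2.csv')` returns the report as a list of strings. Without an ORDER BY clause, the order of rows within each group is not guaranteed.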
MWE:
cat > file1.csv <<EOF
name,value
name1,2.0001
name2,3.4010
name4,4.0000
name5,1.0000
name6,1.0000
name8,1.9001
name10,2.7654
EOF
cat > file2.csv <<EOF
name,value
name1,3.0000
name2,3.4010
name3,1.0000
name5,1.0901
name6,1.0000
name7,3.4445
name11,8.0009
name12,0.1180
EOF
./compare-CSVs.bash
Output:
changes:
"name4 value removed"
"name8 value removed"
"name10 value removed"
"name3 value added"
"name7 value added"
"name11 value added"
"name12 value added"
"name1 value changed from 2.0001 to 3.0000"
"name5 value changed from 1.0000 to 1.0901"