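I'm having trouble comparing rows in CSV files.
I can use csv.reader with len() and it works fine, but then I have to sort the files on the key.
My keys are unique, so I'd like to use DictReader instead, but len() seems to count every value in the dict, including empty cells: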
```python
import csv

# baseline, tested, writerSameOracle and writerSameHPCC are defined elsewhere in the script
with open(baseline, 'r') as baselineF:
    readBaseline = csv.DictReader(baselineF, delimiter=',', quotechar='"')
    for rowb in readBaseline:
        print('rowb: ', len(rowb))
        with open(tested, 'r') as testedF:
            readTested = csv.DictReader(testedF, delimiter=',', quotechar='"')
            for rowt in readTested:
                print('rowt: ', len(rowt))
                # Rows are the same len
                if len(rowb) == len(rowt):
                    writerSameOracle.writerow(rowb)
                    writerSameHPCC.writerow(rowt)
                    print('Rows are the same')
                    break
```
With this code, len() seems to return the number of headers in each file, even when the rows have different numbers of filled cells.
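This behaviour follows from how csv.DictReader builds each row: it maps every field name in the header to a value. An empty cell still produces a key (with the value ''), and a row shorter than the header has its missing keys filled with restval, which is None by default. A minimal sketch using an in-memory file (the sample data is made up for illustration) shows that len() on such a row always equals the number of header columns:

```python
import csv
import io

# Hypothetical sample data: three header columns, rows with empty or missing cells
data = "key,a,b\n1,x,\n2,,\n3\n"

rows = list(csv.DictReader(io.StringIO(data)))

for row in rows:
    # len(row) counts keys (one per header column), not filled cells
    print(len(row), row)
```

Every row prints a length of 3: the empty cells show up as '' and the cells missing from the short last row show up as None.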
Answer 0 (score: 1)
What you're doing seems a bit confusing, but filtering out anything falsy is trivial:
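```python
>>> rowb = [1, 2, 0, 3]
# using a list comprehension
>>> len([x for x in rowb if x])
3
# alternatively, using filter in Python 2
>>> len(filter(None, rowb))
3
# in Python 3, filter returns an iterator, so wrap it in list()
>>> len(list(filter(None, rowb)))
3
```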
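When rowb is a DictReader row rather than a list, iterating over it directly yields its keys (the header names), which are all non-empty, so the filter has to be applied to rowb.values(). A small sketch, assuming an empty string or None marks an unfilled cell (count_filled is a hypothetical helper name):

```python
import csv
import io

def count_filled(row):
    """Count the cells in a DictReader row that are neither '' nor None."""
    # Iterating the dict itself would yield its keys, so filter the values instead
    return len([v for v in row.values() if v])

# Hypothetical sample data for illustration
data = "key,a,b\n1,x,\n2,y,z\n"
rows = list(csv.DictReader(io.StringIO(data)))
print([count_filled(r) for r in rows])  # → [2, 3]
```

CSV values are always strings, so a cell containing "0" counts as filled here, unlike the integer 0 in the list example.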
Answer 1 (score: 0)
So I decided to load the dict's values into a list and then take its len(). Based on that, I used the appropriate if statements to get the job done:
```python
import csv

# baseline, tested, writerSameOracle and writerSameHPCC are defined elsewhere in the script
with open(baseline, 'r', newline='') as baselineF:
    readBaseline = csv.DictReader(baselineF, delimiter=',', quotechar='"')
    for rowb in readBaseline:  # rowb: baseline row
        with open(tested, 'r', newline='') as testedF:
            readTested = csv.DictReader(testedF, delimiter=',', quotechar='"')
            for rowt in readTested:  # rowt: tested row
                if rowt['key'] == rowb['key']:
                    # Load each row's values into a fresh list and drop missing (None) fields
                    list1 = list(rowb.values())
                    cleaned1 = [x for x in list1 if x is not None]
                    list2 = list(rowt.values())
                    cleaned2 = [x for x in list2 if x is not None]
                    # Rows are the same len
                    if len(cleaned1) == len(cleaned2):
                        writerSameOracle.writerow(rowb)
                        writerSameHPCC.writerow(rowt)
                        print('Rows are the same')
                        break
```
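Because the keys are unique, the tested file does not need to be reopened and rescanned for every baseline row. The following is a sketch of an alternative that reads the tested file once into a dict keyed on the 'key' column and then compares filled-cell counts. The column name 'key', the helper names, and the in-memory sample files are assumptions for illustration. Unlike the `x is not None` filter, this version treats both '' (empty cell) and None (missing field) as unfilled:

```python
import csv
import io

def filled(row):
    """Number of cells in a DictReader row that are neither '' nor None."""
    return sum(1 for v in row.values() if v)

def same_length_rows(baseline_file, tested_file, key='key'):
    """Yield (baseline_row, tested_row) pairs with matching keys and equal filled-cell counts."""
    # Index the tested rows by key once (keys are assumed to be unique)
    tested_by_key = {row[key]: row for row in csv.DictReader(tested_file)}
    for rowb in csv.DictReader(baseline_file):
        rowt = tested_by_key.get(rowb[key])
        if rowt is not None and filled(rowb) == filled(rowt):
            yield rowb, rowt

# Hypothetical sample files standing in for the baseline and tested CSVs
baseline = io.StringIO("key,a,b\n1,x,\n2,y,z\n3,p,q\n")
tested = io.StringIO("key,a,b\n2,y,\n1,,x\n3,p,q\n")

matches = list(same_length_rows(baseline, tested))
print([rowb['key'] for rowb, _ in matches])  # → ['1', '3']
```

With real files, open both with newline='' and pass the file objects in. Each yielded pair can then be written with writerSameOracle.writerow and writerSameHPCC.writerow as in the loop-based version.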