How do I compare nested lists generated from a CSV file?

Time: 2019-01-31 16:20:39

Tags: python csv

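I started with a 52-row CSV file in which each row has the format [name, attribute 1, attribute 2]. I have imported the CSV file and built every possible combination of two rows, so I have a list like this: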

([Bill, Long, Blonde], [Sally, Short, Blonde]),
([Bobby, Long, Brown], [James, Short, Orange])

etc...

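I want to compare attribute 1 and attribute 2 within each pair, and eventually weight them, so that I can find the pairs that have the most in common. I am struggling to find a way to compare attributes 1 and 2 easily without first taking the pairs apart.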

The code I have written so far is:

import csv
from itertools import combinations


with open('dc.csv', newline='') as f:

  csvreader = csv.reader(f)

  # Build every pair of rows (combinations of size 2)
  comb = combinations(csvreader, 2)

  for i in comb:
    print(i)

Edit: The output I want is the list of pairs ordered from best match to worst match. Like this:

([James, Short, Orange], [Bridgett, Short, Orange], 2)
([Bill, Long, Blonde], [Sally, Short, Blonde], 1),
([Bobby, Long, Brown], [James, Short, Orange], 0),

That is because James and Bridgett match on both hair color (1) and hair length (1), so they score 2, and so on. That way I can sort the pairs from most matching to least matching.

2 answers:

Answer 0: (score: 0)

As I understand it, you want to compute a "likeness" score for each pair of elements.

Here is what I did:

a = [['Bill', 'Long','Blonde'], ['Sally', 'Short', 'Blonde'], ['Bobby', 'Long', 'Brown'], ['James', 'Short', 'Orange']]

def likenessCalculator(groupA, indexA, groupB, indexB):
  # Function that calculates how close the attributes are
  likeness = 0
  if groupA[1] == groupB[1]:
    likeness += 1
  if groupA[2] == groupB[2]:
    likeness += 1
  return (groupA, groupB, likeness)

results = []

for idx, element in enumerate(a):
  for idx2, element2 in enumerate(a):
    # Here I iterate through the array, and for every element, I compare it with each other element
    # This code doesn't remove duplicates yet, but it shouldn't be hard to implement.
    results.append(likenessCalculator(element, idx, element2, idx2))

print(results)
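
Note that the nested loop above also compares each row with itself and yields every pair twice. A minimal sketch of one way around that, assuming the same sample rows plus the Bridgett row from the question: itertools.combinations visits each unordered pair once, and the results are then sorted from highest to lowest score. The names rows, score_pair and scored are illustrative only and do not come from the original answer.

```python
from itertools import combinations

# Sample rows in the form [name, attribute 1, attribute 2]
rows = [
    ['Bill', 'Long', 'Blonde'],
    ['Sally', 'Short', 'Blonde'],
    ['Bobby', 'Long', 'Brown'],
    ['James', 'Short', 'Orange'],
    ['Bridgett', 'Short', 'Orange'],
]

def score_pair(row_a, row_b):
    # Count the attribute positions (everything after the name) that match.
    return sum(a == b for a, b in zip(row_a[1:], row_b[1:]))

# Each unordered pair appears exactly once, and no row is paired with itself.
scored = [(a, b, score_pair(a, b)) for a, b in combinations(rows, 2)]

# Best matches first.
scored.sort(key=lambda item: item[2], reverse=True)

for item in scored:
    print(item)
```

With real data, rows could instead be list(csv.reader(f)) from the question's code.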

Answer 1: (score: 0)

As I understand it, once you have the list from the CSV file, you can try the following:

the_list = [(['Bill', 'Long', 'Blonde'], ['Sally', 'Short', 'Blonde']), (['Bobby', 'Long', 'Brown'], ['James', 'Short', 'Orange'])]
result = []
for names in the_list:
    # Number of values the two rows share (position is ignored)
    n = len(set(names[0]).intersection(names[1]))
    new = list(names)
    new.append(n)
    result.append(new)
print(result)

You will find the final list in the result variable.
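
One caveat: a set intersection ignores which column a value came from, so it would also count a shared name, or a value that appears in different columns. The question also mentions weighting the attributes eventually. A rough sketch of a position-by-position comparison with weights follows; the WEIGHTS values and the names weighted_score and ranked are assumptions made up for illustration, not something given in the question.

```python
from itertools import combinations

# Hypothetical weights: attribute 1 counts 1.0, attribute 2 counts 2.0.
WEIGHTS = (1.0, 2.0)

def weighted_score(row_a, row_b, weights=WEIGHTS):
    # Compare attributes column by column, skipping the name in column 0,
    # and add up the weights of the columns that match.
    return sum(w for w, a, b in zip(weights, row_a[1:], row_b[1:]) if a == b)

rows = [
    ['Bill', 'Long', 'Blonde'],
    ['Sally', 'Short', 'Blonde'],
    ['Bobby', 'Long', 'Brown'],
    ['James', 'Short', 'Orange'],
]

# Score every unordered pair and order them from best to worst match.
ranked = sorted(
    ((a, b, weighted_score(a, b)) for a, b in combinations(rows, 2)),
    key=lambda item: item[2],
    reverse=True,
)

for item in ranked:
    print(item)
```

Setting every weight to 1 reduces this to the plain match count asked for in the question.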
