In Python: differences between two lists

Date: 2013-04-21 02:48:44

Tags: python list compare

I have two lists like this:

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

I want to find the differences between the two lists. I did:

list(set(expected)-set(found))

list(set(found)-set(expected))

which return ['E3'] and ['E5'] respectively.
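A quick check using the `found` list above shows why the set approach alone cannot report copies. `set()` stores each distinct item once, so the duplicate counts are discarded before any subtraction happens:

```python
found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']

# set() keeps one copy of each distinct item, so multiplicity is lost
unique_found = set(found)

print(len(found))             # 15 items in the list
print(len(unique_found))      # 11 distinct items in the set
print(found.count('E2BS'))    # 3 copies in the list, one entry in the set
```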

However, the output I need is:

'E3' is missing from found.
'E5' is missing from expected.
There are 2 copies of 'E5' in found.
There are 3 copies of 'E2BS' in found.
There are 2 copies of 'E2' in found.

Any help or suggestions are welcome!

3 answers:

Answer 0 (score: 8)

The collections.Counter class is well suited to enumerating the differences between multisets:

>>> from collections import Counter
>>> found = Counter(['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5'])
>>> expected = Counter(['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3'])
>>> list((found - expected).elements())
['E2', 'E5', 'E5', 'E2BS', 'E2BS']
>>> list((expected - found).elements())
['E3']

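Building on Counter, here is a minimal sketch that produces the five messages asked for in the question. The function name `report` is hypothetical, and the message wording is copied from the question. It assumes Python 3.7+, where a Counter iterates in first-appearance order, so the duplicate lines come out in a different order from the question's listing:

```python
from collections import Counter

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']


def report(found, expected):
    """Hypothetical helper: build the messages requested in the question."""
    found_counts = Counter(found)
    expected_counts = Counter(expected)
    lines = []
    # Items that never occur in the other list at all
    for item in expected_counts:
        if item not in found_counts:
            lines.append(f"'{item}' is missing from found.")
    for item in found_counts:
        if item not in expected_counts:
            lines.append(f"'{item}' is missing from expected.")
    # Items that occur more than once in found, with their total count
    for item, count in found_counts.items():
        if count > 1:
            lines.append(f"There are {count} copies of '{item}' in found.")
    return lines


for line in report(found, expected):
    print(line)
```

Using `item not in ...` rather than Counter subtraction keeps the "missing" check separate from the "copies" check. This matters for 'E5', which is both absent from `expected` and duplicated in `found`.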
You may also be interested in difflib.Differ:

>>> from difflib import Differ
>>> found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1', 'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
>>> expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']
>>> for d in Differ().compare(expected, found):
...     print(d)

+ CG
+ E6
  E1
  E2
  E4
+ L2
+ E7
+ E5
+ L1
+ E2BS
+ E2BS
+ E2BS
+ E2
  E1^E4
+ E5
- E6
- E7
- L1
- L2
- CG
- E2BS
- E3
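Differ aligns the lists by position, so an item that only moved (for example 'E6') is reported both as added and as removed. The following sketch reuses the question's lists and keeps only the '+ ' and '- ' lines. It then cancels moved items against each other with Counter. What remains is the same multiset difference that `Counter(found) - Counter(expected)` gives directly:

```python
from collections import Counter
from difflib import Differ

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

added = Counter()
removed = Counter()
for line in Differ().compare(expected, found):
    # Each Differ line starts with a two-character code: '+ ', '- ', '  ' or '? '
    code, item = line[:2], line[2:]
    if code == '+ ':
        added[item] += 1
    elif code == '- ':
        removed[item] += 1

# Moved items appear on both sides; subtracting cancels them out
print(sorted((added - removed).elements()))   # ['E2', 'E2BS', 'E2BS', 'E5', 'E5']
print(sorted((removed - added).elements()))   # ['E3']
```

Differ is therefore more useful when the order of the items matters. For a pure "what is extra or missing" question, Counter alone is simpler.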

Answer 1 (score: 4)

Use Python's set class and Counter class instead of rolling your own solution:

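  1. symmetric_difference: finds the elements that are in one set or the other, but not in both.
  2. intersection: finds the elements that are common to both sets.
  3. difference: this is essentially what you did when you subtracted one set from another.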

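    Code examples

    • set(found).difference(expected) # {'E5'}
      
    • set(expected).difference(found) # {'E3'}
      
    • set(found).symmetric_difference(expected) # {'E5', 'E3'}
      
    • Finding the copies of an item: this question has already been referenced. That technique gives you all the duplicates, and the resulting Counter object tells you how many copies of each there are. For example:

      import collections
      collections.Counter(found)['E5'] # 2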
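The list above names `intersection` without an example. The following sketch covers it, then collects every duplicate in `found` in one step with a dict comprehension over the Counter. The names `common` and `duplicates` are illustrative, and the printed dict order assumes Python 3.7+:

```python
from collections import Counter

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']
expected = ['E1', 'E2', 'E4', 'E1^E4', 'E6', 'E7', 'L1', 'L2', 'CG', 'E2BS', 'E3']

# Items present in both lists (set methods accept any iterable as the argument)
common = set(found).intersection(expected)

# Every item that occurs more than once in found, with its count
duplicates = {item: count for item, count in Counter(found).items() if count > 1}

print(sorted(common))  # ['CG', 'E1', 'E1^E4', 'E2', 'E2BS', 'E4', 'E6', 'E7', 'L1', 'L2']
print(duplicates)      # {'E2': 2, 'E5': 2, 'E2BS': 3}
```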
      

Answer 2 (score: 2)

You have already solved the first two parts:

print('{0} missing from found'.format(list(set(expected) - set(found))))
print('{0} missing from expected'.format(list(set(found) - set(expected))))

The remaining parts require counting the duplicates in a list, and there are many solutions available online (including this one: Find and list duplicates in a list?).
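As one Counter-free alternative, the following sketch sorts the list and counts runs of equal items with `itertools.groupby`. This is a swapped-in technique, not necessarily one from the linked question. The helper name `duplicate_messages` and its `name` parameter are hypothetical:

```python
from itertools import groupby

found = ['CG', 'E6', 'E1', 'E2', 'E4', 'L2', 'E7', 'E5', 'L1',
         'E2BS', 'E2BS', 'E2BS', 'E2', 'E1^E4', 'E5']


def duplicate_messages(items, name):
    """Hypothetical helper: describe every item that occurs more than once."""
    messages = []
    # groupby only groups *adjacent* equal items, so sort first
    for item, group in groupby(sorted(items)):
        count = sum(1 for _ in group)
        if count > 1:
            messages.append(f"There are {count} copies of '{item}' in {name}.")
    return messages


for message in duplicate_messages(found, 'found'):
    print(message)
```

Sorting costs O(n log n), compared with O(n) for Counter. In exchange, the messages come out in a predictable alphabetical order.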