Question

This is more of a math question than anything else. Suppose I have two lists of different sizes in Python:

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

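I want to know what percentage of overlap there is between these two lists. Order within the lists is not important. Finding the overlap is easy, and I have seen other posts on how to do that, but I can't quite extend it in my head to work out what percentage they overlap. If I compare the lists in a different order, will the result come out differently? What is the best way to do this?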

Answer 1

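From a fundamental standpoint, I'd say there are two sensible questions you might be asking:

What percentage is the overlap compared to the first list? That is, how big is the common part relative to the first list?
The same question can be asked for the second list.
What percentage is the overlap compared to the "universe" (i.e. the union of both lists)?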

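Other interpretations can certainly be found, and there are plenty of them. All in all, you should know which problem you are trying to solve.

From a programming standpoint, the solution is simple: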

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]

setA = set(listA)
setB = set(listB)

overlap = setA & setB
universe = setA | setB

result1 = float(len(overlap)) / len(setA) * 100
result2 = float(len(overlap)) / len(setB) * 100
result3 = float(len(overlap)) / len(universe) * 100
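
To make the effect of comparison order concrete, here is a minimal sketch that wraps the three measures above in a helper. The name `overlap_percentages` and the choice to report 0.0 for an empty denominator are assumptions for illustration, not part of the original answer.

```python
def overlap_percentages(list_a, list_b):
    """Return (percent of A, percent of B, percent of the union) shared by both lists."""
    set_a, set_b = set(list_a), set(list_b)
    overlap = set_a & set_b
    universe = set_a | set_b

    def pct(part, whole):
        # Assumption: an empty denominator yields 0.0 instead of raising ZeroDivisionError
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return pct(overlap, set_a), pct(overlap, set_b), pct(overlap, universe)


listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]
print(overlap_percentages(listA, listB))  # (100.0, 75.0, 75.0)
print(overlap_percentages(listB, listA))  # (75.0, 100.0, 75.0)
```

Swapping the arguments only swaps the first two figures. The union-based figure is symmetric, so it does not depend on which list comes first.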

Answer 2

The maximum difference occurs when the two lists have completely different elements. In that case we have at most n + m distinct elements, where n is the size of the first list and m is the size of the second list. One measure can be 2 * c / (n + m), where c is the number of common elements.

This can be calculated as a percentage like this:
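
A minimal sketch of that calculation, assuming neither list contains repeated items (so the list lengths equal the set sizes) and that two empty lists should report 0.0. The function name `common_ratio_percent` is hypothetical. This ratio is the same shape as the Sørensen-Dice coefficient.

```python
def common_ratio_percent(list_a, list_b):
    # c: number of elements common to both lists
    c = len(set(list_a) & set(list_b))
    # n, m: sizes of the first and second list (assumed to contain no duplicates)
    n, m = len(list_a), len(list_b)
    # Assumption: two empty lists are reported as 0.0 rather than dividing by zero
    if n + m == 0:
        return 0.0
    return 200.0 * c / (n + m)


listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]
print(round(common_ratio_percent(listA, listB), 2))  # 85.71
```

The result is the same whichever list is passed first, and it reaches 100 only when both lists hold exactly the same elements.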

Answer 3

>>> len(set(listA)&set(listB)) / float(len(set(listA) | set(listB))) * 100
75.0

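I would calculate the common items as a share of the total distinct items.

len(set(listA)&set(listB)) returns the number of common items (3 in your example).

len(set(listA) | set(listB)) returns the total number of distinct items (4).

Multiply by 100 to get the percentage.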

Answer 4

def computeOverlap(L1, L2):
    # Count how many times each element occurs in each list
    d1, d2 = {}, {}
    for e in L1:
        if e not in d1:
            d1[e] = 0
        d1[e] += 1

    for e in L2:
        if e not in d2:
            d2[e] = 0
        d2[e] += 1

    # Number of items shared by both lists, counting duplicates
    o1, o2 = 0, 0
    for k in d1:
        o1 += min(d1[k], d2.get(k,0))
    for k in d2:
        o2 += min(d1.get(k,0), d2[k])

    # Express the shared count as a percentage of each list's length
    print(100.0*o1/len(L1) if o1 else 0, "% of the first list overlaps with the second list")
    print(100.0*o2/len(L2) if o2 else 0, "% of the second list overlaps with the first list")

Of course, you can do this with defaultdict and Counter to make things a bit easier:

from collections import defaultdict, Counter

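def computeOverlap(L1, L2):
    # Counter tallies the items; defaultdict(int) makes missing keys count as 0
    d1 = defaultdict(int, Counter(L1))
    d2 = defaultdict(int, Counter(L2))

    # Number of items shared by both lists, counting duplicates
    o1, o2 = 0, 0
    for k in d1:
        o1 += min(d1[k], d2[k])
    for k in d2:
        o2 += min(d1[k], d2[k])

    # Express the shared count as a percentage of each list's length
    print(100.0*o1/len(L1) if o1 else 0, "% of the first list overlaps with the second list")
    print(100.0*o2/len(L2) if o2 else 0, "% of the second list overlaps with the first list")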
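
As a further sketch of the same counting idea, `Counter` objects support `&`, which keeps each key with the minimum of its two counts, so the loops can be collapsed. The function name `multiset_overlap`, returning values instead of printing, and reporting 0.0 for an empty list are assumptions made for illustration.

```python
from collections import Counter


def multiset_overlap(list_a, list_b):
    # Counter & Counter keeps each key with the smaller of its two counts
    shared = sum((Counter(list_a) & Counter(list_b)).values())
    # Assumption: an empty list yields 0.0 rather than a division by zero
    pct_a = 100.0 * shared / len(list_a) if list_a else 0.0
    pct_b = 100.0 * shared / len(list_b) if list_b else 0.0
    return pct_a, pct_b


# "a" and "b" are each shared once, so 2 of 4 and 2 of 5 items overlap
print(multiset_overlap(["a", "a", "b", "c"], ["a", "b", "b", "d", "e"]))  # (50.0, 40.0)
```

Unlike the set-based answers, this counts repeated items, so a list with duplicates is compared by occurrences rather than by distinct values.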
