Counting frequencies in a list

Time: 2017-03-05 15:26:47

Tags: python itertools

I have a list of lists:

countall = [[5, 0], [4, 1], [4, 1], [3, 2], [4, 1], [3, 2], [3, 2], [2, 3], [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4], [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4], [3, 2], [2, 3], [2, 3], [1, 4], [2, 3], [1, 4], [1, 4], [0, 5]]

I want to find how often each sublist occurs in the list above.

I tried using itertools:

freq = [len(list(group)) for x in countall for key, group in groupby(x)]

However, I get the wrong result:

[1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1]

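What is wrong with my list comprehension?

3 answers:

Answer 0: (score: 4)

groupby only groups elements that are equal and directly follow one another, so to use it you need to sort the list first. Another option is the Counter class: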

from collections import Counter
countall = [[5, 0], [4, 1], [4, 1], [3, 2], [4, 1], [3, 2], [3, 2], [2, 3], [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4], [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4], [3, 2], [2, 3], [2, 3], [1, 4], [2, 3], [1, 4], [1, 4], [0, 5]]
Counter([tuple(x) for x in countall])

Output:

Counter({(3, 2): 10, (2, 3): 10, (1, 4): 5, (4, 1): 5, (5, 0): 1, (0, 5): 1})
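To illustrate the point about consecutive elements, here is a minimal sketch on a small made-up list (`data` is an illustrative name, not part of the question). It shows how groupby splits equal sublists into separate runs when they are not adjacent, and how sorting first merges them:

```python
from itertools import groupby

# Small hypothetical sample: [4, 1] appears three times, but not all adjacent.
data = [[4, 1], [3, 2], [4, 1], [4, 1]]

# Without sorting, each run of consecutive equal sublists is counted separately.
unsorted_runs = [(key, len(list(group))) for key, group in groupby(data)]
print(unsorted_runs)  # [([4, 1], 1), ([3, 2], 1), ([4, 1], 2)]

# After sorting, equal sublists are adjacent, so each key appears exactly once.
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(sorted_runs)  # [([3, 2], 1), ([4, 1], 3)]
```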

Answer 1: (score: 3)

As ForceBru pointed out, sort your list first and then use groupby:

from itertools import groupby
countall = [[5, 0], [4, 1], [4, 1], [3, 2], [4, 1], [3, 2], [3, 2], [2, 3], [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4], [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4], [3, 2], [2, 3], [2, 3], [1, 4], [2, 3], [1, 4], [1, 4], [0, 5]]

freq = [(key, len(list(x))) for key, x in groupby(sorted(countall))]
print(freq)

Output:

[([0, 5], 1), ([1, 4], 5), ([2, 3], 10), ([3, 2], 10), ([4, 1], 5), ([5, 0], 1)]

There is an error in your code:

freq = [len(list(group)) for x in countall for key, group in groupby(x)]
                       ^parenthesis missing

In addition, you are applying groupby to each sublist inside countall, which is not what you need:

for x in countall for key, group in groupby(x)

You can call groupby directly on sorted(countall).
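As a rough illustration of why the nested loop misbehaves, the sketch below runs the same comprehension on a short made-up list (`sample` is a hypothetical name). Because groupby receives each sublist `x`, it groups the numbers inside that sublist rather than the sublists themselves:

```python
from itertools import groupby

# Hypothetical sample; the last sublist repeats a number to make the effect visible.
sample = [[5, 0], [4, 1], [2, 2]]

# groupby(x) looks inside each sublist: 5 and 0 are separate groups of length 1,
# 4 and 1 likewise, while the two 2s form a single group of length 2.
per_sublist = [len(list(group)) for x in sample for key, group in groupby(x)]
print(per_sublist)  # [1, 1, 1, 1, 2]

# Grouping the sorted outer list counts whole sublists instead.
per_list = [(key, len(list(group))) for key, group in groupby(sorted(sample))]
print(per_list)  # [([2, 2], 1), ([4, 1], 1), ([5, 0], 1)]
```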

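Also, as @Bemmu answered, you can use collections.Counter. However, Counter needs hashable elements and a list is not hashable, so you first have to convert each sublist to a tuple or a string and then use Counter.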
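The hashability point can be checked with a short sketch on made-up data (`pairs`, `raised`, and `counts` are illustrative names). Passing lists to Counter fails with a TypeError, while tuples work:

```python
from collections import Counter

# Hypothetical sample data.
pairs = [[4, 1], [3, 2], [4, 1]]

# Lists are unhashable, so Counter cannot use them as keys.
raised = False
try:
    Counter(pairs)
except TypeError:
    raised = True
print(raised)  # True

# Converting each sublist to a tuple makes it hashable.
counts = Counter(map(tuple, pairs))
print(counts[(4, 1)])         # 2
print(counts.most_common(1))  # [((4, 1), 2)]
```

Unlike the groupby approach, this needs no sorting, since Counter tallies elements regardless of their position.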

Answer 2: (score: 1)

As mentioned in the comments, if you use groupby you need to sort first.

Code:

import itertools as it
freq = {tuple(key): len(list(group)) for key, group in it.groupby(sorted(countall))}

Test code (countall must be defined before the comprehension above is evaluated):

countall = [[5, 0], [4, 1], [4, 1], [3, 2], [4, 1], [3, 2], [3, 2], [2, 3],
           [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4],
           [4, 1], [3, 2], [3, 2], [2, 3], [3, 2], [2, 3], [2, 3], [1, 4],
           [3, 2], [2, 3], [2, 3], [1, 4], [2, 3], [1, 4], [1, 4], [0, 5]]

print(freq)

Result:

{(3, 2): 10, (1, 4): 5, (2, 3): 10, (5, 0): 1, (0, 5): 1, (4, 1): 5}