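Best way to find the most common dictionary in a list of dictionaries

Time: 2019-11-02 04:24:06

Tags: python json dictionary

I have a list of dictionaries, where each dictionary has the keys "shape" and "colour". For example: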

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]

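I need to find the most common shape/colour combination. I decided to do this by finding the most common dictionary in the list. I've boiled my approach down to: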

import json
from collections import defaultdict

frequency = defaultdict(int)

for i in info:
    # Serialize the dict to a string so it can be used as a key in frequency
    hashed = json.dumps(i)
    frequency[hashed] += 1

# Get the most common key and convert it back into a dict
most_common = max(frequency, key=frequency.get)
print(json.loads(most_common))

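I'm new to Python, and I keep discovering one- or two-line functions that end up doing exactly what I wanted. Is there a quicker way to do this here? Maybe this will help another beginner too, because after what felt like ages of googling I couldn't find anything.

3 answers:

Answer 0 (score: 5)

If the items in your list have consistent keys, a better option is to use a namedtuple instead of a dict, for example: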

from collections import namedtuple

# Define the named tuple
MyItem = namedtuple("MyItem", "shape colour")

# Create your list of data
info = [
    MyItem('pentagon', 'red'),
    MyItem('rectangle', 'white'),
    # etc etc
]

This provides a number of benefits:

# To instantiate
item = MyItem("pentagon", "red")

# or using keyword arguments
item = MyItem(shape="pentagon", colour="red")

# or from your existing dict
item = MyItem(**{'shape': 'pentagon', 'colour': 'red'})

# Accessors
print(item.shape)
print(item.colour)

# Decomposition
shape, colour = item

But back to the question of counting matches: because a namedtuple is hashable, you can use collections.Counter, and the counting code becomes:

from collections import Counter

frequency = Counter(info)

# Get the items in the order of most common
frequency.most_common()
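
To connect this back to the list of dicts in the question, here is a minimal end-to-end sketch. The sample data and the names raw_info, top_item and top_count are made up for illustration; only MyItem comes from the code above.

```python
from collections import Counter, namedtuple

# Same named tuple as defined earlier in this answer
MyItem = namedtuple("MyItem", "shape colour")

# Hypothetical sample data, in the dict format used in the question
raw_info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    {'shape': 'pentagon', 'colour': 'red'},
]

# Convert each dict into a hashable MyItem
info = [MyItem(**d) for d in raw_info]

frequency = Counter(info)

# most_common(1) returns a list holding one (item, count) pair
top_item, top_count = frequency.most_common(1)[0]
print(top_item, top_count)

# Convert back to a plain dict if that is what the rest of the code expects
top_dict = dict(top_item._asdict())
print(top_dict)
```

Note that MyItem(**d) only works when every dict has exactly the keys shape and colour; a missing or extra key raises a TypeError.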

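Enjoy!

Answer 1 (score: 2)

  1. Don't convert the dict to a particular string representation; instead, pull the data you need out of each dictionary. A tuple of the two string values is hashable, so it can be used as a dict key.

  2. The Python standard library provides collections.Counter for exactly this counting purpose.

So: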

from collections import Counter
# info is the list of dicts from the question
histogram = Counter((item['shape'], item['colour']) for item in info)
# most_common(n) returns a list of the n most common (item, count) pairs
(shape, colour), count = histogram.most_common(1)[0]
# re-assemble the dict, if desired, and print it
print({'shape': shape, 'colour': colour})
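
If the dicts do not all share a fixed set of keys, the same idea can be generalised by turning each dict into a frozenset of its items. The following is only a sketch: most_common_dict is a hypothetical helper name, the sample data is made up, and it assumes all values are hashable (strings, numbers, tuples) and that the list is not empty.

```python
from collections import Counter

# Hypothetical sample data
info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'colour': 'red', 'shape': 'pentagon'},  # same items, different key order
    {'shape': 'rectangle', 'colour': 'white'},
]

def most_common_dict(dicts):
    """Return (dict, count) for the most frequent dict in a non-empty list."""
    # frozenset(d.items()) is hashable and ignores key order
    counts = Counter(frozenset(d.items()) for d in dicts)
    key, count = counts.most_common(1)[0]
    # Rebuild a regular dict from the (key, value) pairs
    return dict(key), count

result, count = most_common_dict(info)
print(result, count)
```

Unlike a string produced by json.dumps, the frozenset key does not depend on the order in which the keys were inserted, so the first two dicts above are counted together.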

Answer 2 (score: 1)

Using pandas will make your problem simpler.

import pandas as pd

info = [
    {'shape': 'pentagon', 'colour': 'red'},
    {'shape': 'rectangle', 'colour': 'white'},
    # etc etc
]

df = pd.DataFrame(info)

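# count how often each shape occurs, most frequent first
print(df['shape'].value_counts())

# get the most commonly occurring shape:
# idxmax() returns the index label (the shape) with the highest count
print(df['shape'].value_counts().idxmax())

# note: in current pandas, argmax() returns the integer position
# rather than the label, so idxmax() is the one to use here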

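To get the most common colour, just change shape to colour, e.g. change print(df['shape'].value_counts()) to print(df['colour'].value_counts()).

Beyond that, pandas gives you plenty of other handy built-in functions to play with. To learn more, just search Google for pandas.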

Note: install pandas before using it.

pip install pandas

or

pip3 install pandas