Question

I have a list of dicts, where each dict contains 3 keys: name, url, and location. Only the value of 'name' can be the same across dicts; both 'url' and 'location' always have different values throughout the list.

Example:

[
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}, ...
]

Then I want to group them by the value of 'name', as follows.

Expected:

[
{"name":"A1", "url":"B1, B2", "location":"C1, C2"},
{"name":"A2", "url":"B3, B4", "location":"C3, C4"},
]

(The actual list contains > 2,000 dicts.)

I would be glad to get this solved. Any suggestions or answers would be much appreciated.

Thanks in advance.

Answer 1

Using an auxiliary grouping dict (for Python 3.5+, which the {**a, **b} unpacking requires):

data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"}
]

groups = {}
for d in data:
    if d['name'] not in groups:
        groups[d['name']] = {'url': d['url'], 'location': d['location']}
    else:
        groups[d['name']]['url'] += ', ' + d['url']
        groups[d['name']]['location'] += ', ' + d['location']
result = [{**{'name': k}, **v} for k, v in groups.items()]

print(result)

Output:

[{'name': 'A1', 'url': 'B1, B2', 'location': 'C1, C2'}, {'name': 'A2', 'url': 'B3, B4', 'location': 'C3, C4'}]
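
The same auxiliary-dict idea can be wrapped in a reusable function. What follows is only a minimal sketch, not part of the answer above: the function name merge_by_key and its key and sep parameters are made up for illustration, and it assumes every dict has the same keys and that all non-key values are strings. It collects values into lists and joins them once at the end, instead of concatenating strings inside the loop.

```python
def merge_by_key(records, key="name", sep=", "):
    """Group dicts sharing the same `key` value and join the other values.

    Hypothetical helper: assumes all dicts have the same keys and that
    every non-key value is a string.
    """
    groups = {}  # key value -> {other field -> list of collected values}
    for record in records:
        bucket = groups.setdefault(record[key], {})
        for field, value in record.items():
            if field != key:
                bucket.setdefault(field, []).append(value)
    # Join each collected list once at the end.
    return [
        {key: k, **{field: sep.join(values) for field, values in bucket.items()}}
        for k, bucket in groups.items()
    ]


data = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]
merged = merge_by_key(data)
print(merged)
```

Because nothing except the grouping key is hardcoded, adding a fourth string field to the dicts would not require changing the function.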

Answer 2

Since your dataset is relatively small, I guess time complexity is not a big concern here, so you could consider the following code.

from collections import defaultdict
given_data = [
    {"name":"A1", "url":"B1", "location":"C1"}, 
    {"name":"A1", "url":"B2", "location":"C2"}, 
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
] 
D = defaultdict(list)
for item in given_data:
    D[item['name']].append(item)
result = []
for x in D:
    urls = ""
    locations = ""
    for pp in D[x]:
        urls += pp['url']+" "
        locations += pp['location']+" "
    result.append({'name': x, 'url': urls.strip(), 'location': locations.strip()})
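
Note that the code above joins the values with a single space ("B1 B2"), while the question asks for a comma and a space ("B1, B2"). A possible variant, sketched here with str.join so that no trailing separator has to be stripped; the variable name grouped is an illustrative choice, not taken from the answer:

```python
from collections import defaultdict

given_data = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]

# Collect all dicts that share a name.
grouped = defaultdict(list)
for item in given_data:
    grouped[item["name"]].append(item)

# Build one merged dict per name, joining values with ", ".
result = [
    {
        "name": name,
        "url": ", ".join(item["url"] for item in items),
        "location": ", ".join(item["location"] for item in items),
    }
    for name, items in grouped.items()
]
print(result)
```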

Answer 3

Where res is:

[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]

You can use a defaultdict to process the data and unpack the result in a list comprehension:

from collections import defaultdict

result = defaultdict(lambda: defaultdict(list))

for items in res:
    result[items['name']]['location'].append(items['location'])
    result[items['name']]['url'].append(items['url'])

final = [
    {'name': name, **{inner_names: ' '.join(inner_values) for inner_names, inner_values in values.items()}}
    for name, values in result.items()
]

final is:

In [57]: final
Out[57]:
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

Answer 4

Following @Yaroslav Surzhikov's comment, here is a solution using itertools.groupby:

from itertools import groupby

dicts = [
    {"name":"A1", "url":"B1", "location":"C1"},
    {"name":"A1", "url":"B2", "location":"C2"},
    {"name":"A2", "url":"B3", "location":"C3"},
    {"name":"A2", "url":"B4", "location":"C4"},
]

def merge(dicts):
    new_list = []
    for key, group in groupby(dicts, lambda x: x['name']):
        new_item = {}
        new_item['name'] = key
        new_item['url'] = []
        new_item['location'] = []
        for item in group:
            new_item['url'].extend([item.get('url', '')])
            new_item['location'].extend([item.get('location', '')])
        new_item['url'] = ', '.join(new_item.get('url', ''))
        new_item['location'] = ', '.join(new_item.get('location', ''))
        new_list.append(new_item)
    return new_list

print(merge(dicts))
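
One caveat about the code above: itertools.groupby only groups consecutive items with equal keys, so it gives the intended result here because the sample list is already ordered by name. If the real list is not ordered, sorting by the same key first is the usual remedy. A small sketch of both cases follows; the names unsorted_dicts, by_name and merge_sorted are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

unsorted_dicts = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B4", "location": "C4"},
]

by_name = itemgetter("name")

# Without sorting, every run of equal names becomes its own group.
runs = [key for key, _ in groupby(unsorted_dicts, key=by_name)]
print(runs)


def merge_sorted(dicts):
    """Sort by name first so that groupby sees each name exactly once."""
    merged = []
    for name, group in groupby(sorted(dicts, key=by_name), key=by_name):
        items = list(group)
        merged.append({
            "name": name,
            "url": ", ".join(item["url"] for item in items),
            "location": ", ".join(item["location"] for item in items),
        })
    return merged


merged = merge_sorted(unsorted_dicts)
print(merged)
```

Since sorted is stable, values within each group keep their original relative order.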

Answer 5

Something like this? A small deviation: I prefer to store the urls and locations as lists inside resDict rather than appending to a str.

myDict = [
{"name":"A1", "url":"B1", "location":"C1"}, 
{"name":"A1", "url":"B2", "location":"C2"}, 
{"name":"A2", "url":"B3", "location":"C3"},
{"name":"A2", "url":"B4", "location":"C4"}
]

resDict = []

def getKeys(d):
    arr = []
    for row in d:
        arr.append(row["name"])
    ret = list(set(arr))
    return ret

def filteredDict(d, k):
    arr = []
    for row in d:
        if row["name"] == k:
            arr.append(row)
    return arr

def compressedDictRow(rowArr):
    urls = []
    locations = []
    name = rowArr[0]['name']

    for row in rowArr:
        urls.append(row['url'])
        locations.append(row['location'])
    return {"name":name,"urls":urls, "locations":locations}

keys = getKeys(myDict)

for key in keys:
    rowArr = filteredDict(myDict,key)
    row = compressedDictRow(rowArr)
    resDict.append(row)
print(resDict)

Output (printed on one line; the order of the groups may vary, since set is unordered):

[
    {'name': 'A2', 'urls': ['B3', 'B4'], 'locations': ['C3', 'C4']}, 
    {'name': 'A1', 'urls': ['B1', 'B2'], 'locations': ['C1', 'C2']}
]
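
The code above rescans the whole list once per distinct name, and building the keys through set loses the original order. With more than 2,000 dicts that is probably still fast enough, but a single pass is also possible. The following is only a sketch under the same assumptions as the answer (every row has name, url and location); the variable names my_dicts, grouped and res are illustrative:

```python
my_dicts = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]

# name -> merged row; plain dicts keep insertion order in Python 3.7+,
# so groups come out in first-seen order.
grouped = {}
for row in my_dicts:
    entry = grouped.setdefault(
        row["name"], {"name": row["name"], "urls": [], "locations": []}
    )
    entry["urls"].append(row["url"])
    entry["locations"].append(row["location"])

res = list(grouped.values())
print(res)
```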

Answer 6

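Here is a variant (it is hardly readable, and it feels like scratching my right side with my left hand, but at this point I don't know how to make it shorter), using:

[Python]: itertools - Functions creating iterators for efficient looping
- groupby
- accumulate
Comprehensions (list and dict)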

>>> import itertools
>>> import pprint
>>>
>>> pprint.pprint(initial_list)
[{'location': 'C1', 'name': 'A1', 'url': 'B1'},
 {'location': 'C2', 'name': 'A1', 'url': 'B2'},
 {'location': 'C3', 'name': 'A2', 'url': 'B3'},
 {'location': 'C4', 'name': 'A2', 'url': 'B4'}]
>>>
>>> NAME_KEY = "name"
>>>
>>> final_list = [list(itertools.accumulate(group_list, func=lambda x, y: {key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]]) for key in x}))[-1] \
...     for group_list in [list(group[1]) for group in itertools.groupby(sorted(initial_list, key=lambda x: x[NAME_KEY]), key=lambda x: x[NAME_KEY])]]
>>>
>>> pprint.pprint(final_list)
[{'location': 'C1 C2', 'name': 'A1', 'url': 'B1 B2'},
 {'location': 'C3 C4', 'name': 'A2', 'url': 'B3 B4'}]

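Rationale (from outer to inner):

Group the dictionaries in the initial list based on the value corresponding to the name key (itertools.groupby)
- A secondary operation needed for this to work properly is sorting the list on the same value before grouping (sorted)
For each such group of dictionaries, perform their "sum" (itertools.accumulate)
- The func argument "sums" 2 dictionaries, key by key:
  - If the key is name, just take the value from the 1st dictionary (it is the same for both dictionaries anyway)
  - Otherwise just add the 2 values (strings), with a space in between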

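Considerations:

The dictionaries must stay homogeneous (all of them must have the same structure (keys))
Only the name key is hardcoded (however, if you decide to add other keys whose values are not strings, you will have to adjust func too)
Can be split up for readability
Not sure about the lambdas (performance-wise)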
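
As an illustration of how the one-liner might be split up for readability, here is a sketch that replaces the lambdas with named functions. It is not the answer's exact code: it swaps itertools.accumulate(...)[-1] for functools.reduce, which returns only the final "sum" instead of building every intermediate one. The function names get_name and sum_dicts are made up for this example.

```python
from functools import reduce
from itertools import groupby

NAME_KEY = "name"

initial_list = [
    {"name": "A1", "url": "B1", "location": "C1"},
    {"name": "A1", "url": "B2", "location": "C2"},
    {"name": "A2", "url": "B3", "location": "C3"},
    {"name": "A2", "url": "B4", "location": "C4"},
]


def get_name(d):
    """Key function shared by sorted and groupby."""
    return d[NAME_KEY]


def sum_dicts(x, y):
    """'Sum' two dicts: keep the name, join the other values with a space."""
    return {
        key: x[key] if key == NAME_KEY else " ".join([x[key], y[key]])
        for key in x
    }


# Sort so equal names are adjacent, group them, then fold each group.
final_list = [
    reduce(sum_dicts, group)
    for _, group in groupby(sorted(initial_list, key=get_name), key=get_name)
]
print(final_list)
```

The homogeneity requirement still applies: sum_dicts iterates over the keys of the first dict and looks each one up in the second.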

Python: how to merge dicts in a list of dicts based on a value
