Question

I have a list representing the items purchased by customers:

purchases = [
   { 
       'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty' : 4
   }, 
   { 
       'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty' : 8
   },
   { 
       'id': 2, 'product': 'Item 2', 'price': 7.5, 'qty': 10
   }, 
   { 
       'id': 3, 'product': 'Item 3', 'price': 18, 'qty': 7
   }
]

Now I want the output to return the distinct products with their quantities aggregated:

result = [
   { 
       'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty' : 12 # 8 + 4
   }, 
   { 
       'id': 2, 'product': 'Item 2', 'price': 7.5, 'qty': 10
   }, 
   { 
       'id': 3, 'product': 'Item 3', 'price': 18, 'qty': 7
   }
]

The answers here never made sense to me: How to sum dict elements

Answer 1

In pandas this is simple: groupby with aggregate, then to_dict:

import pandas as pd

df = pd.DataFrame(purchases)
print(df)
   id  price product  qty
0   1   12.4  Item 1    4
1   1   12.4  Item 1    8
2   2    7.5  Item 2   10
3   3   18.0  Item 3    7

print(df.groupby('product', as_index=False)
         .agg({'id':'first','price':'first','qty':'sum'})
         .to_dict(orient='records'))

[{'qty': 12, 'product': 'Item 1', 'price': 12.4, 'id': 1}, 
 {'qty': 10, 'product': 'Item 2', 'price': 7.5, 'id': 2}, 
 {'qty': 7, 'product': 'Item 3', 'price': 18.0, 'id': 3}]

Or, if possible, group by all three elements:

print(df.groupby(['id','product', 'price'], as_index=False)['qty'].sum()
         .to_dict(orient='records'))
[{'qty': 12, 'product': 'Item 1', 'id': 1, 'price': 12.4}, 
 {'qty': 10, 'product': 'Item 2', 'id': 2, 'price': 7.5}, 
 {'qty': 7, 'product': 'Item 3', 'id': 3, 'price': 18.0}]

Pure Python solution with sorted and itertools.groupby:

from itertools import groupby
from operator import itemgetter

grouper = itemgetter("id", "product", "price")
result = []
for key, grp in groupby(sorted(purchases, key = grouper), grouper):
    temp_dict = dict(zip(["id", "product", "price"], key))
    temp_dict["qty"] = sum(item["qty"] for item in grp)
    result.append(temp_dict)

print(result)
[{'qty': 12, 'product': 'Item 1', 'id': 1, 'price': 12.4}, 
 {'qty': 10, 'product': 'Item 2', 'id': 2, 'price': 7.5}, 
 {'qty': 7, 'product': 'Item 3', 'id': 3, 'price': 18}]
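
The sorted call in the loop above matters because itertools.groupby only merges consecutive items with equal keys. Here is a minimal sketch of that behaviour. The aggregate helper and the reordered sample list are made up for illustration and are not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: the two 'Item 1' rows are deliberately not adjacent.
purchases = [
    {'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty': 4},
    {'id': 2, 'product': 'Item 2', 'price': 7.5, 'qty': 10},
    {'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty': 8},
]

grouper = itemgetter("id", "product", "price")

def aggregate(rows):
    """Sum qty over each consecutive run of rows that share the same key."""
    out = []
    for key, grp in groupby(rows, grouper):
        row = dict(zip(("id", "product", "price"), key))
        row["qty"] = sum(item["qty"] for item in grp)
        out.append(row)
    return out

# Without sorting, 'Item 1' forms two separate runs and is not merged.
unsorted_result = aggregate(purchases)

# After sorting by the same key, equal keys are adjacent and merge into one row.
sorted_result = aggregate(sorted(purchases, key=grouper))

print(len(unsorted_result), len(sorted_result))
```

If the input cannot be sorted, for example because the keys are not mutually comparable, a dictionary-based accumulator is probably the safer choice.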

EDIT, following a comment (product is now a nested dict):

purchases = [
   { 
       'id': 1, 'product': { 'id': 1, 'name': 'item 1' }, 'price': 12.4, 'qty' : 4
   }, 
   { 
       'id': 1, 'product': { 'id': 1, 'name': 'item 2' }, 'price': 12.4, 'qty' : 8
   },
   { 
       'id': 2, 'product':{ 'id': 2, 'name': 'item 3' }, 'price': 7.5, 'qty': 10
   }, 
   { 
       'id': 3, 'product': { 'id': 3, 'name': 'item 4' }, 'price': 18, 'qty': 7
   }
]

# pandas.io.json.json_normalize is deprecated since pandas 1.0; use pd.json_normalize
df = pd.json_normalize(purchases)
print(df)
   id  price  product.id product.name  qty
0   1   12.4           1       item 1    4
1   1   12.4           1       item 2    8
2   2    7.5           2       item 3   10
3   3   18.0           3       item 4    7

print(df.groupby(['id','product.id', 'price'], as_index=False)['qty'].sum()
         .to_dict(orient='records'))

[{'qty': 12.0, 'price': 12.4, 'id': 1.0, 'product.id': 1.0}, 
 {'qty': 10.0, 'price': 7.5, 'id': 2.0, 'product.id': 2.0}, 
 {'qty': 7.0, 'price': 18.0, 'id': 3.0, 'product.id': 3.0}]
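
For comparison, the nested case can also be handled without pandas. A dict is not hashable, so the nested product cannot go into a tuple key directly. The sketch below pulls out product['id'] instead. The names totals and nested_result are illustrative, and the 'product.id' output key is chosen only to mirror the flattened column name above:

```python
from collections import defaultdict

purchases = [
    {'id': 1, 'product': {'id': 1, 'name': 'item 1'}, 'price': 12.4, 'qty': 4},
    {'id': 1, 'product': {'id': 1, 'name': 'item 2'}, 'price': 12.4, 'qty': 8},
    {'id': 2, 'product': {'id': 2, 'name': 'item 3'}, 'price': 7.5, 'qty': 10},
    {'id': 3, 'product': {'id': 3, 'name': 'item 4'}, 'price': 18, 'qty': 7},
]

# Accumulate qty per (id, product id, price); only hashable values go into the key.
totals = defaultdict(int)
for p in purchases:
    key = (p['id'], p['product']['id'], p['price'])
    totals[key] += p['qty']

# Rebuild one record per key; dicts keep insertion order in Python 3.7+.
nested_result = [
    {'id': i, 'product.id': pid, 'price': price, 'qty': qty}
    for (i, pid, price), qty in totals.items()
]

print(nested_result)
```

As in the pandas version, the product name is dropped, because the first two rows share a product id but carry different names.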

Answer 2

Another solution, not the most elegant, but easier to understand:

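from collections import Counter

c = Counter()
# Pair each (id, product, price) key with the quantity of that purchase
some = [((x['id'], x['product'], x['price']), x['qty']) for x in purchases]
for x in some:
    c[x[0]] += x[1]  # sum quantities per key

# Turn the counter back into a list of dicts
result = [{'id': k[0], 'product': k[1], 'price': k[2], 'qty': v} for k, v in c.items()]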

I timed this solution against @jezrael's groupby solution:

100000 loops, best of 3: 9.03 µs per loop vs @jezrael's 100000 loops, best of 3: 12.2 µs per loop
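
A comparison like this could be reproduced roughly as follows. This is only a sketch: it assumes the itertools.groupby variant was the one being timed, which the answer does not state. The function names with_counter and with_groupby are made up here, and absolute timings will differ by machine and Python version.

```python
import timeit
from collections import Counter
from itertools import groupby
from operator import itemgetter

purchases = [
    {'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty': 4},
    {'id': 1, 'product': 'Item 1', 'price': 12.4, 'qty': 8},
    {'id': 2, 'product': 'Item 2', 'price': 7.5, 'qty': 10},
    {'id': 3, 'product': 'Item 3', 'price': 18, 'qty': 7},
]

def with_counter(rows):
    """Counter-based aggregation, keyed by (id, product, price)."""
    c = Counter()
    for x in rows:
        c[(x['id'], x['product'], x['price'])] += x['qty']
    return [{'id': k[0], 'product': k[1], 'price': k[2], 'qty': v}
            for k, v in c.items()]

def with_groupby(rows):
    """Sort-then-groupby aggregation on the same key."""
    grouper = itemgetter("id", "product", "price")
    out = []
    for key, grp in groupby(sorted(rows, key=grouper), grouper):
        row = dict(zip(("id", "product", "price"), key))
        row["qty"] = sum(item["qty"] for item in grp)
        out.append(row)
    return out

# Time 10,000 calls of each; only the relative difference is meaningful.
counter_time = timeit.timeit(lambda: with_counter(purchases), number=10_000)
groupby_time = timeit.timeit(lambda: with_groupby(purchases), number=10_000)
print(f"Counter: {counter_time:.4f}s, groupby: {groupby_time:.4f}s for 10,000 runs")
```

With only four input rows the gap is likely dominated by fixed overhead, so it would be worth re-running on a realistically sized list before drawing conclusions.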
