我有以下数据框:
import pandas as pd
df = pd.DataFrame({'id' :["c1","c1","c1","c2","c2","c3","c3","c3","c3","c4","c4","c5","c6","c6","c6","c7","c7"],'store' : ["first","second","second","first",
"second","first","third","fourth",
"fifth","second","fifth","first",
"first","second","third","fourth","fifth"],
'purchase': [10,10,10,20,20,30,30,30,30,40,40,50,60,60,60,70,70]})
分组后:
df_group= df.groupby(['id','store']).agg({'purchase': ["sum"]})
我想让每张卡的不同商店中的所有购买都出现在同一行中,例如:
id 1_store 1_sum 2_store 2_sum 3_store 3_sum 4_store 4_sum...
0 c1 first 10 second 20
1 C2 first 20 second 20
2 c3 fifth 30 first 30 fourth 30 third 30
我不想在商店中使用unstack,原因是商店太多,这将导致商店的列过多,并且大多数列为空。
如何获得以上结果? 谢谢
答案 0 :(得分:0)
需要创建一个cumcount
变量来获取列标签,然后这成为一个.pivot_table
问题:您在列上得到了MultiIndex
,我们可以将其折叠:>
df_group['idx'] = df_group.groupby(level=0).cumcount()+1
df_res = (df_group.reset_index()
.pivot_table(index='id',
columns='idx',
values=['store', 'purchase'],
aggfunc='first')
.sort_index(level=2, axis=1))
purchase store purchase store purchase store purchase store
sum sum sum sum
idx 1 1 2 2 3 3 4 4
id
c1 10.0 first 20.0 second NaN NaN NaN NaN
c2 20.0 first 20.0 second NaN NaN NaN NaN
c3 30.0 fifth 30.0 first 30.0 fourth 30.0 third
c4 40.0 fifth 40.0 second NaN NaN NaN NaN
c5 50.0 first NaN NaN NaN NaN NaN NaN
c6 60.0 first 60.0 second 60.0 third NaN NaN
c7 70.0 fifth 70.0 fourth NaN NaN NaN NaN
如果需要折叠列(可能是个好主意,因为它不再按列排序了):
df_res.columns = ['_'.join(map(str, [y for y in x[::-1] if y != ''])) for x in df_res.columns]
1_sum_purchase 1_store 2_sum_purchase 2_store 3_sum_purchase 3_store 4_sum_purchase 4_store
id
c1 10.0 first 20.0 second NaN NaN NaN NaN
c2 20.0 first 20.0 second NaN NaN NaN NaN
c3 30.0 fifth 30.0 first 30.0 fourth 30.0 third
c4 40.0 fifth 40.0 second NaN NaN NaN NaN
c5 50.0 first NaN NaN NaN NaN NaN NaN
c6 60.0 first 60.0 second 60.0 third NaN NaN
c7 70.0 fifth 70.0 fourth NaN NaN NaN NaN