如何将groupby结果放入同一行

时间:2019-02-21 01:39:19

标签: pandas dataframe

我有以下数据框:

import pandas as pd

df = pd.DataFrame({'id' :["c1","c1","c1","c2","c2","c3","c3","c3","c3","c4","c4","c5","c6","c6","c6","c7","c7"],'store'  : ["first","second","second","first",
                                "second","first","third","fourth",
                                "fifth","second","fifth","first",
                                "first","second","third","fourth","fifth"],
                      'purchase': [10,10,10,20,20,30,30,30,30,40,40,50,60,60,60,70,70]})

分组后:

df_group= df.groupby(['id','store']).agg({'purchase': ["sum"]}) 

Result of df_group

我想让每张卡的不同商店中的所有购买都出现在同一行中,例如:

  id 1_store 1_sum 2_store 2_sum 3_store 3_sum 4_store 4_sum...
0 c1 first   10    second  20
1 C2 first   20    second  20
2 c3 fifth   30    first   30    fourth  30    third   30

我不想在商店中使用unstack,原因是商店太多,这将导致商店的列过多,并且大多数列为空。

如何获得以上结果? 谢谢

1 个答案:

答案 0 :(得分:0)

需要创建一个cumcount变量来获取列标签,然后这成为一个.pivot_table问题:您在列上得到了MultiIndex,我们可以将其折叠:

df_group['idx'] = df_group.groupby(level=0).cumcount()+1

df_res = (df_group.reset_index()
              .pivot_table(index='id', 
                           columns='idx', 
                           values=['store', 'purchase'], 
                           aggfunc='first')
              .sort_index(level=2, axis=1))

输出:

    purchase  store purchase   store purchase   store purchase  store
         sum             sum              sum              sum       
idx        1      1        2       2        3       3        4      4
id                                                                   
c1      10.0  first     20.0  second      NaN     NaN      NaN    NaN
c2      20.0  first     20.0  second      NaN     NaN      NaN    NaN
c3      30.0  fifth     30.0   first     30.0  fourth     30.0  third
c4      40.0  fifth     40.0  second      NaN     NaN      NaN    NaN
c5      50.0  first      NaN     NaN      NaN     NaN      NaN    NaN
c6      60.0  first     60.0  second     60.0   third      NaN    NaN
c7      70.0  fifth     70.0  fourth      NaN     NaN      NaN    NaN

如果需要折叠列(可能是个好主意,因为它不再按列排序了):

df_res.columns = ['_'.join(map(str, [y for y in x[::-1] if y != ''])) for x in df_res.columns]

    1_sum_purchase 1_store  2_sum_purchase 2_store  3_sum_purchase 3_store  4_sum_purchase 4_store
id                                                                                                
c1            10.0   first            20.0  second             NaN     NaN             NaN     NaN
c2            20.0   first            20.0  second             NaN     NaN             NaN     NaN
c3            30.0   fifth            30.0   first            30.0  fourth            30.0   third
c4            40.0   fifth            40.0  second             NaN     NaN             NaN     NaN
c5            50.0   first             NaN     NaN             NaN     NaN             NaN     NaN
c6            60.0   first            60.0  second            60.0   third             NaN     NaN
c7            70.0   fifth            70.0  fourth             NaN     NaN             NaN     NaN