Aggregating with groupby into different columns in pandas

Time: 2017-06-14 10:57:27

Tags: python pandas calculated-columns pandas-groupby

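I have a data frame in pandas that contains IDs and delivery days (for example, the 7 days of the week).

I want to use the pandas groupby() function to create 7 separate columns, one per day (e.g. delivery_day_1, delivery_day_2, and so on), each holding the number of times that day occurs for each ID in the data frame. How can I do this?

Thanks.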

1 answer:

Answer 0 (score: 2)

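I think you first need groupby + size + unstack, or crosstab, to reshape the data.

Then, if necessary, add the missing weekdays with reindex (reindex_axis in old pandas versions; it was deprecated in 0.21 and removed in 1.0), and finally apply add_prefix.

Sample: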

import pandas as pd

df = pd.DataFrame({'subscription_id':[1,2,3,1], 'delivery_weekday':[1,1,2,1]})

print (df)
   delivery_weekday  subscription_id
0                 1                1
1                 1                2
2                 2                3
3                 1                1
# count rows per (subscription_id, delivery_weekday) pair, then pivot weekdays into columns
counts = df.groupby(['subscription_id','delivery_weekday']) \
           .size() \
           .unstack(fill_value=0) \
           .reindex(columns=range(1,8), fill_value=0) \
           .add_prefix('delivery_day_')

print (counts)
delivery_weekday  delivery_day_1  delivery_day_2  delivery_day_3  \
subscription_id                                                    
1                              2               0               0   
2                              1               0               0   
3                              0               1               0   

delivery_weekday  delivery_day_4  delivery_day_5  delivery_day_6  \
subscription_id                                                    
1                              0               0               0   
2                              0               0               0   
3                              0               0               0   

delivery_weekday  delivery_day_7  
subscription_id                   
1                              0  
2                              0  
3                              0  
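# alternative with crosstab, starting again from the raw sample data
df = pd.DataFrame({'subscription_id':[1,2,3,1], 'delivery_weekday':[1,1,2,1]})

counts = pd.crosstab(df['subscription_id'],df['delivery_weekday']) \
           .reindex(columns=range(1,8), fill_value=0) \
           .add_prefix('delivery_day_')
print (counts)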

delivery_weekday  delivery_day_1  delivery_day_2  delivery_day_3  \
subscription_id                                                    
1                              2               0               0   
2                              1               0               0   
3                              0               1               0   

delivery_weekday  delivery_day_4  delivery_day_5  delivery_day_6  \
subscription_id                                                    
1                              0               0               0   
2                              0               0               0   
3                              0               0               0   

delivery_weekday  delivery_day_7  
subscription_id                   
1                              0  
2                              0  
3                              0
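
As a quick sanity check, the two approaches above can be run side by side on the same sample data and compared. This is only a minimal sketch: the names `weekdays`, `by_groupby`, `by_crosstab` and `same` are made up for illustration, and it assumes a recent pandas version where `reindex(columns=...)` is used in place of `reindex_axis`.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'subscription_id': [1, 2, 3, 1],
                   'delivery_weekday': [1, 1, 2, 1]})

# Assumption: weekdays are coded 1..7, as in the sample.
weekdays = range(1, 8)

# Approach 1: groupby + size + unstack.
by_groupby = (df.groupby(['subscription_id', 'delivery_weekday'])
                .size()
                .unstack(fill_value=0)
                .reindex(columns=weekdays, fill_value=0)
                .add_prefix('delivery_day_'))

# Approach 2: crosstab.
by_crosstab = (pd.crosstab(df['subscription_id'], df['delivery_weekday'])
                 .reindex(columns=weekdays, fill_value=0)
                 .add_prefix('delivery_day_'))

# Compare the two count tables cell by cell.
same = bool((by_groupby.values == by_crosstab.values).all())
print(same)
```

Both tables keep the input `df` untouched, so either one can be joined back to other per-subscription data if needed.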