Group by with sum, and find the minimum date within each group

Time: 2017-11-13 18:00:48

Tags: python pandas dataframe

I have a table like the following.

                                           msno      date  num_25  num_50  num_75  num_985  num_100  num_unq
1  PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=  20151201       3       3       2        0        8       11   
2  PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=  20160628       0       0       1        1        1        3   
3  PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=  20170106       2       1       0        0       35       34
4  KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=  20150803       0       0       0        0       16       11   
5  KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=  20160527       4       3       0        2        2       11   
6  KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=  20160808      14       3       4        1       15       31 

I want to group the rows by msno, summing the num_ columns (num_25 through num_unq), and also find the earliest and latest date that appear for each msno.

df = df_user_logs_v2.drop('date', axis=1).groupby('msno', as_index=False).sum()

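The code above sums all the values, but I have to drop the date column to do it. I would like to keep the minimum and maximum date, as well as the number of rows in each group.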

Expected output for the first msno:

                                          msno  num_25_sum  num_50_sum  num_75_sum  num_985_sum  num_100_sum  num_unq_sum date_earliest date_latest count
1 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=           5           4           3            1           44           48      20151201    20170106     3

1 Answer:

Answer 0: (score: 0)

Let's try this:

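# Sum every column that comes after msno and date
d = dict((i,'sum') for i in df.columns[2:])
# Keep both the earliest and the latest date per msno
d['date'] = ['min','max']
# Count the rows in each group
d['msno'] = 'count'
df_out = df.groupby('msno').agg(d)
# Flatten the two-level column index into names such as date_min
df_out.columns = df_out.columns.map('_'.join)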

df_out

Output:

                                              msno_count  date_min  date_max  \
msno                                                                           
KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=           3  20150803  20160808   
PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=           3  20151201  20170106   

                                              num_75_sum  num_50_sum  \
msno                                                                   
KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=           4           6   
PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=           3           4   

                                              num_985_sum  num_25_sum  \
msno                                                                    
KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=            3          18   
PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=            1           5   

                                              num_100_sum  num_unq_sum  
msno                                                                    
KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=           33           53  
PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=           44           48  
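If the exact column names from the question (date_earliest, date_latest, count) matter, one possible variant is named aggregation. This is only a sketch, and it assumes pandas 0.25 or later, which is newer than this answer. The names `num_cols`, `spec` and `df_named` are made up for illustration, and the sample rows are copied from the question.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "msno": ["PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g="] * 3
            + ["KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8="] * 3,
    "date": [20151201, 20160628, 20170106, 20150803, 20160527, 20160808],
    "num_25": [3, 0, 2, 0, 4, 14],
    "num_50": [3, 0, 1, 0, 3, 3],
    "num_75": [2, 1, 0, 0, 0, 4],
    "num_985": [0, 1, 0, 0, 2, 1],
    "num_100": [8, 1, 35, 16, 2, 15],
    "num_unq": [11, 3, 34, 11, 11, 31],
})

# Hypothetical helper list: every column whose name starts with "num_"
num_cols = [c for c in df.columns if c.startswith("num_")]

# Named aggregation maps output_name -> (input_column, function)
spec = {f"{c}_sum": (c, "sum") for c in num_cols}
spec["date_earliest"] = ("date", "min")
spec["date_latest"] = ("date", "max")
# "count" here counts non-null dates, which equals the row count
# only when the date column has no missing values
spec["count"] = ("date", "count")

df_named = df.groupby("msno", as_index=False).agg(**spec)
print(df_named)
```

With as_index=False, msno stays an ordinary column, so no flattening step with map('_'.join) is needed afterwards.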
