Difference between dates in a Pandas DataFrame

Asked: 2017-10-17 19:59:36

Tags: python pandas datetime dataframe pandas-groupby

This is a follow-up to an earlier question, but now I need to find the difference between dates that are stored as 'YYYY-MM-DD'. Essentially, the difference between values in the count column is what we need, but normalized by the number of days between each row.

My dataframe is:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0

I would like to find the difference between each date after grouping by date+site+country+kind+ID tuples:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count,day_diff
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0,0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0,1
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0,1
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,0,1
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0,1
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0,1
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,4,2
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0,0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3,1
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,7,4
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,3,1
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,1,1

One option would be to convert the date column to a Pandas datetime column using pd.to_datetime() and then use the diff function, but that produces values of the form "x days", of type timedelta64. I would like to use this difference to find the daily average, so if this can be done in a single (or at least less painful) step, that would work well.

1 Answer:

Answer 0: (score: 2)

You can use the .dt.days accessor, which converts the timedelta64 values returned by diff() into plain day counts:

In [72]: df['date'] = pd.to_datetime(df['date'])

In [73]: df['day_diff'] = df.groupby(['site','country_code','kind','ID'])['date'] \
                            .diff().dt.days.fillna(0)

In [74]: df
Out[74]:
         date      site country_code  kind  ID  rank  votes  sessions  avg_score  count  day_diff
0  2017-03-20  website1           US     0  84   226    0.0      15.0   3.370812   53.0       0.0
1  2017-03-21  website1           US     0  84   214    0.0      15.0   3.370812   53.0       1.0
2  2017-03-22  website1           US     0  84   226    0.0      16.0   3.370812   53.0       1.0
3  2017-03-23  website1           US     0  84   234    0.0      16.0   3.369048   54.0       1.0
4  2017-03-24  website1           US     0  84   226    0.0      16.0   3.369048   54.0       1.0
5  2017-03-25  website1           US     0  84   212    0.0      16.0   3.369048   54.0       1.0
6  2017-03-27  website1           US     0  84   228    0.0      16.0   3.369048   58.0       2.0
7  2017-02-15  website2           AU     1  91   144    4.0     148.0   4.727272  521.0       0.0
8  2017-02-16  website2           AU     1  91   144    3.0     147.0   4.727272  524.0       1.0
9  2017-02-20  website2           AU     1  91   100    4.0     148.0   4.727272  531.0       4.0
10 2017-02-21  website2           AU     1  91   118    6.0     149.0   4.727272  533.0       1.0
11 2017-02-22  website2           AU     1  91   114    4.0     151.0   4.727272  534.0       1.0
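
The question also asks for a daily average, that is, the change in count divided by the number of days between rows. A minimal sketch of one way to do that is shown here, assuming the rows are already sorted by date within each group. The column names count_diff and daily_avg are hypothetical names chosen for illustration, and only a few rows and columns of the sample data are used to keep it short.

```python
import io

import pandas as pd

# A few rows from the question's data (columns trimmed for brevity).
csv_text = """date,site,country_code,kind,ID,count
2017-03-25,website1,US,0,84,54.0
2017-03-27,website1,US,0,84,58.0
2017-02-16,website2,AU,1,91,524.0
2017-02-20,website2,AU,1,91,531.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Group by everything except the date, as in the answer above.
keys = ["site", "country_code", "kind", "ID"]
grouped = df.groupby(keys)

# Days elapsed since the previous row of the same group (NaN for the first row).
df["day_diff"] = grouped["date"].diff().dt.days

# Change in count since the previous row of the same group
# (count_diff is a hypothetical column name).
df["count_diff"] = grouped["count"].diff()

# Change in count per day (daily_avg is a hypothetical column name).
# The first row of each group stays NaN because it has no predecessor.
df["daily_avg"] = df["count_diff"] / df["day_diff"]

print(df[["date", "site", "day_diff", "count_diff", "daily_avg"]])
```

Note that this sketch leaves the first row of each group as NaN rather than calling fillna(0) on day_diff, since a zero there would lead to a division by zero in the last step.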