Timedeltas as a column after groupby in pandas

Time: 2018-03-28 08:38:53

Tags: python pandas datetime dataframe pandas-groupby

Given the following data frame df:

import datetime

import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})

For each person (df.groupby('person')), I want to compute the time differences between that person's timestamps, which I would do with diff():

df.groupby('person').timestamp.diff()

This is only half of the solution, because the mapping back to the person is lost.

What would a solution look like?

2 answers:

Answer 0: (score: 2)

I think you should use

df.groupby('person').timestamp.transform(pd.Series.diff)

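Answer 1: (score: 1)

The problem is that diff does not aggregate values, so a possible solution is transform, which returns one value per original row: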

df['new'] = df.groupby('person').timestamp.transform(pd.Series.diff)
print(df)
   person           timestamp             new
0       1 2018-01-01 10:00:00             NaT
1       2 2018-01-01 10:00:00             NaT
2       2 2018-01-01 11:00:00 0 days 01:00:00
3       2 2018-01-02 11:00:00 1 days 00:00:00
4       3 2018-01-01 10:00:00             NaT
5       3 2018-01-02 11:00:00 1 days 01:00:00
6       3 2018-01-04 10:00:00 1 days 23:00:00
7       3 2018-01-05 12:00:00 1 days 02:00:00
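
As a self-contained check of the approach above, here is a minimal sketch that rebuilds the sample data and confirms the transformed Series shares the index of df, which is why it can be assigned as a new column next to person. The names via_transform, via_diff, same_index and same_values are introduced here for illustration only. The comparison with a plain grouped diff() is an assumption worth verifying on your pandas version, not something the answers state.

```python
import datetime

import pandas as pd

# Same sample data as in the question.
timestamps = [
    datetime.datetime(2018, 1, 1, 10),  # person 1
    datetime.datetime(2018, 1, 1, 10),  # person 2
    datetime.datetime(2018, 1, 1, 11),  # person 2
    datetime.datetime(2018, 1, 2, 11),  # person 2
    datetime.datetime(2018, 1, 1, 10),  # person 3
    datetime.datetime(2018, 1, 2, 11),  # person 3
    datetime.datetime(2018, 1, 4, 10),  # person 3
    datetime.datetime(2018, 1, 5, 12),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})

grouped = df.groupby('person')['timestamp']

# Approach from the answers: transform yields one value per input row.
via_transform = grouped.transform(pd.Series.diff)

# Assumption to verify: diff() on the grouped column is also row-aligned.
via_diff = grouped.diff()

# Because the result is indexed like df, it lines up with the person column.
df['new'] = via_transform

same_index = via_transform.index.equals(df.index)
same_values = via_transform.equals(via_diff)

print(df)
print('index matches df:', same_index)
print('transform and diff agree:', same_values)
```

The first row of each person is NaT because there is no earlier timestamp within that group to subtract.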