Given the following DataFrame df:
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})
For each person (df.groupby('person')), I want to compute the time differences between that person's consecutive timestamps, which I would do with diff().
df.groupby('person').timestamp.diff()
only gets me halfway, because the mapping back to the person is lost.
What would a solution look like?
Answer 0 (score: 2)
I think you should use
df.groupby('person').timestamp.transform(pd.Series.diff)
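To illustrate why this keeps the link to each person, here is a minimal sketch, assuming a reasonably recent pandas. The names `deltas` and `delta` are mine, not from the question. The transformed result carries the same index as df, so each difference can be placed next to the person it belongs to.

```python
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10),  # person 1
    datetime.datetime(2018, 1, 1, 10),  # person 2
    datetime.datetime(2018, 1, 1, 11),  # person 2
    datetime.datetime(2018, 1, 2, 11),  # person 2
    datetime.datetime(2018, 1, 1, 10),  # person 3
    datetime.datetime(2018, 1, 2, 11),  # person 3
    datetime.datetime(2018, 1, 4, 10),  # person 3
    datetime.datetime(2018, 1, 5, 12),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})

# transform applies Series.diff within each group and returns a Series
# that is aligned to the original rows of df
deltas = df.groupby('person').timestamp.transform(pd.Series.diff)

# Check that the result shares df's index
print(deltas.index.equals(df.index))

# Show each difference next to its person ('delta' is an illustrative name)
print(pd.concat([df['person'], deltas.rename('delta')], axis=1))
```

The first row of every group has no predecessor, so its difference is missing (NaT).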
Answer 1 (score: 1)
The problem is that diff does not return aggregated values, so a possible solution is transform:
df['new'] = df.groupby('person').timestamp.transform(pd.Series.diff)
print(df)
   person           timestamp             new
0       1 2018-01-01 10:00:00             NaT
1       2 2018-01-01 10:00:00             NaT
2       2 2018-01-01 11:00:00 0 days 01:00:00
3       2 2018-01-02 11:00:00 1 days 00:00:00
4       3 2018-01-01 10:00:00             NaT
5       3 2018-01-02 11:00:00 1 days 01:00:00
6       3 2018-01-04 10:00:00 1 days 23:00:00
7       3 2018-01-05 12:00:00 1 days 02:00:00
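As a cross-check, the sketch below compares the transform approach with calling diff() directly on the grouped column. It assumes a recent pandas, where, as far as I know, the grouped diff() also returns a Series aligned to the original index, so it can be assigned as a column as well. The names `via_transform` and `via_diff` are illustrative. The rows are sorted first, because the differences are taken in row order within each group.

```python
import pandas as pd

df = pd.DataFrame({
    'person': [1, 2, 2, 2, 3, 3, 3, 3],
    'timestamp': pd.to_datetime([
        '2018-01-01 10:00', '2018-01-01 10:00', '2018-01-01 11:00',
        '2018-01-02 11:00', '2018-01-01 10:00', '2018-01-02 11:00',
        '2018-01-04 10:00', '2018-01-05 12:00',
    ]),
})

# Differences follow row order inside each group, so sort beforehand
# (the sample data is already sorted; this guards against unsorted input)
df = df.sort_values(['person', 'timestamp'])

via_transform = df.groupby('person').timestamp.transform(pd.Series.diff)
via_diff = df.groupby('person').timestamp.diff()

# Compare the two results on the rows that have a predecessor
print((via_transform.dropna() == via_diff.dropna()).all())

# Either result can be attached to df, because both keep df's index
df['new'] = via_diff
print(df)
```

If the two results agree on your pandas version, the direct diff() call is the shorter spelling of the same operation.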