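How to find the harmonic mean speed from a pandas DataFrame

Time: 2017-02-07 15:46:51

Tags: python python-3.x sorting date pandas

I have a pandas DataFrame with one column of speeds in km/h and one column of timestamps: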

Date,                     Speed
2016-07-07 13:38:02.000,  50.718590
2016-07-18 11:28:00.000,   2.357645
2016-07-15 15:03:08.000,  14.652172
2016-07-18 06:53:00.000,  24.530390
...                       ...
2016-07-18 18:41:31.000,  31.761416
2016-07-14 05:28:42.187,   7.532758

What I want is the harmonic mean speed for each 15-minute slot of the day, pooled over all days:

Time,  Speed
00:00, 32
00:15, 10
00:30, 12
00:45, 41
01:00, 12
...
23:30, 30
23:45, 31
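
For reference, the harmonic mean of the n speed readings v_1, ..., v_n that fall into one 15-minute slot is the reciprocal of the arithmetic mean of their reciprocals. The symbols H, n and v_i are introduced here only for illustration:

```latex
\[
H = \left( \frac{1}{n} \sum_{i=1}^{n} \frac{1}{v_i} \right)^{-1}
  = \frac{n}{\sum_{i=1}^{n} \frac{1}{v_i}}
\]
```

As a made-up example, two readings of 30 km/h and 60 km/h give H = 2 / (1/30 + 1/60) = 40 km/h, which is lower than their arithmetic mean of 45 km/h.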

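My first attempt was to strip the date from each timestamp, set the result as the index, and then use TimeGrouper to compute the mean. (My DataFrame is called output.) The code is: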

output['Speed'] = output['Speed']**-1
output['Date'] = output['Date'].apply( lambda d : d.time() )
output = output.set_index(['Date'])
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1

But the code does not work; it raises this error:

 Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
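
One likely reason for this kind of error can be reproduced with a small sketch: once `.time()` is applied, the column holds plain `datetime.time` objects, so the index built from it is no longer a DatetimeIndex. The two-row frame below is a hypothetical stand-in for `output`, and the exact index type named in the message may differ depending on how the real frame was built.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the asker's `output`.
output = pd.DataFrame({
    "Date": pd.to_datetime(["2016-07-07 13:38:02", "2016-07-18 11:28:00"]),
    "Speed": [50.718590, 2.357645],
})

# Stripping the date leaves plain datetime.time objects ...
output["Date"] = output["Date"].apply(lambda d: d.time())
indexed = output.set_index("Date")

# ... so the resulting index is not a DatetimeIndex, and a time-based
# grouper has nothing it can bin on.
is_datetime_index = isinstance(indexed.index, pd.DatetimeIndex)
print(type(indexed.index).__name__, is_datetime_index)
```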

1 answer:

Answer 0: (score: 2)

I think what you want to do is normalize the dates and then resample:

In [177]:
df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))
df

Out[177]:
                 Date      Speed
0 2017-02-07 13:38:02  50.718590
1 2017-02-07 11:28:00   2.357645
2 2017-02-07 15:03:08  14.652172
3 2017-02-07 06:53:00  24.530390
4 2017-02-07 18:41:31  31.761416
5 2017-02-07 05:28:42   7.532758

Now all the dates are the same (by default, today's date), so you can group as you intended:

In [178]:
output = df.set_index('Date')
output['Speed'] = output['Speed']**-1   # invert first
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1   # invert the mean back
output

Out[178]:
                         Speed
Date                          
2017-02-07 05:15:00   7.532758
2017-02-07 05:30:00        NaN
2017-02-07 05:45:00        NaN
2017-02-07 06:00:00        NaN
2017-02-07 06:15:00        NaN
2017-02-07 06:30:00        NaN
2017-02-07 06:45:00  24.530390
2017-02-07 07:00:00        NaN
2017-02-07 07:15:00        NaN
2017-02-07 07:30:00        NaN
2017-02-07 07:45:00        NaN
2017-02-07 08:00:00        NaN
2017-02-07 08:15:00        NaN
2017-02-07 08:30:00        NaN
2017-02-07 08:45:00        NaN
2017-02-07 09:00:00        NaN
2017-02-07 09:15:00        NaN
2017-02-07 09:30:00        NaN
2017-02-07 09:45:00        NaN
2017-02-07 10:00:00        NaN
2017-02-07 10:15:00        NaN
2017-02-07 10:30:00        NaN
2017-02-07 10:45:00        NaN
2017-02-07 11:00:00        NaN
2017-02-07 11:15:00   2.357645
2017-02-07 11:30:00        NaN
2017-02-07 11:45:00        NaN
2017-02-07 12:00:00        NaN
2017-02-07 12:15:00        NaN
2017-02-07 12:30:00        NaN
2017-02-07 12:45:00        NaN
2017-02-07 13:00:00        NaN
2017-02-07 13:15:00        NaN
2017-02-07 13:30:00  50.718590
2017-02-07 13:45:00        NaN
2017-02-07 14:00:00        NaN
2017-02-07 14:15:00        NaN
2017-02-07 14:30:00        NaN
2017-02-07 14:45:00        NaN
2017-02-07 15:00:00  14.652172
2017-02-07 15:15:00        NaN
2017-02-07 15:30:00        NaN
2017-02-07 15:45:00        NaN
2017-02-07 16:00:00        NaN
2017-02-07 16:15:00        NaN
2017-02-07 16:30:00        NaN
2017-02-07 16:45:00        NaN
2017-02-07 17:00:00        NaN
2017-02-07 17:15:00        NaN
2017-02-07 17:30:00        NaN
2017-02-07 17:45:00        NaN
2017-02-07 18:00:00        NaN
2017-02-07 18:15:00        NaN
2017-02-07 18:30:00  31.761416

So this line:

df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))

uses dt.strftime to extract the time of day as a string, which to_datetime then turns into a datetime64 Series in which every row has the same date.
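
The same idea can also be sketched without rewriting the dates at all, by flooring each timestamp to its 15-minute slot and grouping on the time-of-day label. This is only a sketch under assumptions: the helper name `harmonic_mean_by_slot` and the sample readings are made up for illustration, and it assumes the Date column is already datetime64.

```python
import pandas as pd

def harmonic_mean_by_slot(df, freq="15min"):
    """Harmonic mean of Speed per time-of-day slot, pooled over all days."""
    # Label each row with the start of its slot, e.g. 13:38:02 -> "13:30".
    slot = df["Date"].dt.floor(freq).dt.strftime("%H:%M")
    # Harmonic mean = reciprocal of the mean of the reciprocals.
    inverse_mean = (1.0 / df["Speed"]).groupby(slot).mean()
    return (1.0 / inverse_mean).rename("Speed").rename_axis("Time")

# Made-up readings (not from the question) on two different days.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-07-07 13:31:00",
        "2016-07-18 13:40:00",
        "2016-07-18 06:53:00",
    ]),
    "Speed": [30.0, 60.0, 24.0],
})

result = harmonic_mean_by_slot(df)
print(result)
```

Slots with no readings are simply absent from the result rather than shown as NaN. Note also that pd.TimeGrouper was removed in later pandas releases; pd.Grouper(freq='15Min') is its replacement if the set_index and groupby approach above is preferred.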