Grouping and resampling time series data

Time: 2017-02-01 23:40:45

Tags: python pandas

Data:

ohlc_dict = {
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Last': 'last',
    'Volume': 'sum',
}

data['hod'] = [r.hour for r in data.index]

data.head(10)
Out[61]:

                    Open    High    Low    Last    Volume   hod dow
Timestamp                           
2014-05-08 08:00:00 136.230 136.290 136.190 136.290 7077    8   Thursday
2014-05-08 08:15:00 136.290 136.300 136.240 136.250 3881    8   Thursday
2014-05-08 08:30:00 136.240 136.270 136.230 136.230 2540    8   Thursday
2014-05-08 08:45:00 136.230 136.260 136.230 136.250 2293    8   Thursday
2014-05-08 09:00:00 136.250 136.360 136.240 136.360 15014   9   Thursday
2014-05-08 09:15:00 136.350 136.360 136.260 136.270 11697   9   Thursday
2014-05-08 09:30:00 136.270 136.270 136.190 136.200 15600   9   Thursday
2014-05-08 09:45:00 136.200 136.270 136.200 136.240 9025    9   Thursday
2014-05-08 10:00:00 136.240 136.270 136.240 136.260 7128    10  Thursday
2014-05-08 10:15:00 136.250 136.260 136.200 136.200 6100    10  Thursday

Question:

Both of the following are meant to change the time frame from 15-minute to 1-hour intervals:

Method 1:

data.loc['2016'].groupby('hod').Volume.mean().head()

hod
8     8452.597
9    16485.398
10   15619.626
11   14132.666
12   11470.058
Name: Volume, dtype: float64

Method 2:

df_h1 = data.resample('1h').agg(ohlc_dict).dropna()
df_h1['hod'] = [r.hour for r in df_h1.index]
df_h1.loc['2016'].groupby('hod')['Volume'].mean()

Timestamp
2014-05-08 08:00:00   15791.000
2014-05-08 09:00:00   51336.000
2014-05-08 10:00:00   28855.000
2014-05-08 11:00:00   56543.000
2014-05-08 12:00:00   25249.000
Name: Volume, dtype: float64

Only method 2 gives the correct output for the Volume data.
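A small sketch of where the gap comes from, using only the first eight rows of the sample output above (the frame name `sample` is made up for illustration). Averaging the 15-minute bars per `hod` gives a per-bar mean, while resampling with `'Volume': 'sum'` gives the total for each clock hour, so the two cannot agree:

```python
import pandas as pd

# First eight 15-minute Volume values copied from the sample output above.
idx = pd.date_range("2014-05-08 08:00", periods=8, freq="15min")
sample = pd.DataFrame(
    {"Volume": [7077, 3881, 2540, 2293, 15014, 11697, 15600, 9025]},
    index=idx,
)
sample["hod"] = sample.index.hour

# Method 1 style: mean of the individual 15-minute bars per hour of day.
mean_of_bars = sample.groupby("hod")["Volume"].mean()

# Method 2 style: total volume of each clock hour.
hourly_total = sample.resample("1h")["Volume"].sum()

print(mean_of_bars)   # hod 8 -> 3947.75, hod 9 -> 12834.0
print(hourly_total)   # 08:00 -> 15791, 09:00 -> 51336
```

The hourly totals 15791 and 51336 are the same values that appear for 08:00 and 09:00 in the method 2 output.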

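How can I change method 1 so that it gives the same Volume output as method 2, but using groupby instead of resample? I am not sure how to use ohlc_dict in method 1, and I think it is required.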
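One possible groupby-only route, offered as a sketch rather than a confirmed answer: group on each timestamp floored to the hour, so that the four 15-minute bars of a clock hour land in one group, and pass `ohlc_dict` to `agg` the same way as with `resample`. The synthetic frame `demo` and the names `hour_start`, `by_groupby` and `avg_by_hod` are assumptions made for illustration; `data` from the question would take the place of `demo`.

```python
import numpy as np
import pandas as pd

ohlc_dict = {"Open": "first", "High": "max", "Low": "min",
             "Last": "last", "Volume": "sum"}

# Synthetic 15-minute bars for two days, 08:00 to 11:45 (illustrative only).
idx = pd.date_range("2016-01-04 08:00", periods=16, freq="15min").append(
    pd.date_range("2016-01-05 08:00", periods=16, freq="15min"))
rng = np.random.default_rng(0)
price = 136 + rng.normal(0, 0.05, len(idx)).cumsum()
demo = pd.DataFrame({"Open": price, "High": price + 0.03, "Low": price - 0.03,
                     "Last": price + 0.01,
                     "Volume": rng.integers(1000, 20000, len(idx))}, index=idx)

# Reference: resample to 1 hour and drop the empty overnight hours.
by_resample = demo.resample("1h").agg(ohlc_dict).dropna()

# groupby version: the key is each bar's timestamp floored to the hour.
hour_start = demo.index.floor("h")
by_groupby = demo.groupby(hour_start).agg(ohlc_dict)

# Average hourly volume per hour of day, computed from the hourly bars.
avg_by_hod = by_groupby.groupby(by_groupby.index.hour)["Volume"].mean()
print(avg_by_hod)
```

Grouping on `hod` alone mixes bars from every date into one group per hour of day. The floored timestamp keeps the date in the key, so each hourly bar is built first and the per-hour-of-day mean is taken afterwards.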
