How to resample time series data using pandas

Time: 2018-07-09 15:25:04

Tags: python pandas

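I am trying to resample time series data from 15-minute intervals to weekly. It does not work as expected; I have read the documentation and many related questions, but I still don't understand what is going on.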

My code is as follows:

{{1}}

The original data looks like this:

               Date  Actual  Forecast  Demand
0  01/01/2017 00:00    1049    1011.0    2922
1  01/01/2017 00:15     961    1029.0    2892
2  01/01/2017 00:30     924    1048.0    2858
3  01/01/2017 00:45     852    1066.0    2745

After resampling, the data looks like this:

Date
2017-01-01    01/01/2017 00:0001/01/2017 00:1501/01/2017 00:...
2017-01-08    01/02/2017 00:0001/02/2017 00:1501/02/2017 00:...
2017-01-15    01/09/2017 00:0001/09/2017 00:1501/09/2017 00:...
2017-01-22    16/01/2017 00:0016/01/2017 00:1516/01/2017 00:...

I just want weekly sums of "Actual", "Forecast" and "Demand", each computed separately. Do you know what I am doing wrong?

1 answer:

Answer 0: (score: 2)

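You are calling resample on a pd.Series that contains only the Date variable, stored as strings, so pandas "sums" those strings by concatenating them within each weekly bin. Change this: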

Wind_Weekly = Wind['Date'].resample('W').sum() 

To this:

Wind_Weekly = Wind.resample('W').sum()
# This also works, and keeps the Date column out of the resulting sum.
# Several columns are selected with a list (note the double brackets).
Wind_Weekly = Wind.resample('W')[['Actual', 'Forecast', 'Demand']].sum()

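Calling Wind['Date'] returns a pd.Series that contains only the dates as they were before the conversion to datetime, that is, as strings. So the Actual, Forecast and Demand variables are never actually passed to the resample call.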

You can check this:

>>> type(Wind['Date'])
<class 'pandas.core.series.Series'>
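Putting the pieces together, here is a minimal sketch of the whole flow on the four sample rows from the question. It assumes the Date strings are day/month/year (the 16/01/2017 entries in the question's output suggest so) and that Date is moved into the index before resampling; the explicit format string is my assumption, not something shown in the question.

```python
import pandas as pd

# The four sample rows from the question; 'Date' starts out as plain strings.
Wind = pd.DataFrame({
    'Date': ['01/01/2017 00:00', '01/01/2017 00:15',
             '01/01/2017 00:30', '01/01/2017 00:45'],
    'Actual': [1049, 961, 924, 852],
    'Forecast': [1011.0, 1029.0, 1048.0, 1066.0],
    'Demand': [2922, 2892, 2858, 2745],
})

# Assumption: dates are day/month/year. An explicit format stops pandas from
# guessing month-first for ambiguous values such as 01/02/2017.
Wind['Date'] = pd.to_datetime(Wind['Date'], format='%d/%m/%Y %H:%M')

# resample needs a datetime-like index, so move 'Date' into the index.
Wind = Wind.set_index('Date')

# Weekly sums of the three numeric columns only.
Wind_Weekly = Wind.resample('W')[['Actual', 'Forecast', 'Demand']].sum()
print(Wind_Weekly)
```

All four rows land in a single bin labelled 2017-01-01, because 'W' means weeks ending on Sunday and 1 January 2017 was a Sunday.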

To test this, I reproduced your problem with the following code:

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2012', periods=100, freq='D')
df = pd.DataFrame( # Construct df with a datetime index and some numbers
    {'ones': np.ones(100), 'twos': np.full(100, 2), 'zeros': np.zeros(100)}, 
    index=rng
)
df['Date'] = rng.astype(str) # re-add index as a str

In the interpreter:

>>> df.resample('W').sum() # works out of the box
            ones  twos  zeros
2012-01-01   1.0     2    0.0
2012-01-08   7.0    14    0.0
2012-01-15   7.0    14    0.0
...

>>> df['Date'].resample('W').sum() # same result as you, only resample 'Date' column
2012-01-01                                           2012-01-01
2012-01-08    2012-01-022012-01-032012-01-042012-01-052012-0...
2012-01-15    2012-01-092012-01-102012-01-112012-01-122012-0...
2012-01-22    2012-01-162012-01-172012-01-182012-01-192012-0...
...
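If a string column like Date is still present in the frame, one way to be explicit about what gets summed is to keep only the numeric columns before resampling. The sketch below reuses the reproduction above. Whether a plain df.resample('W').sum() silently skips the string column or concatenates it may depend on the pandas version, so treat that as something to check on your own install.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2012', periods=100, freq='D')
df = pd.DataFrame(
    {'ones': np.ones(100), 'twos': np.full(100, 2), 'zeros': np.zeros(100)},
    index=rng,
)
df['Date'] = rng.astype(str)  # string column, as in the reproduction above

# Keep only numeric columns, then resample; 'Date' never reaches sum().
weekly = df.select_dtypes(include='number').resample('W').sum()
print(weekly.head(3))
```

The result has only the ones, twos and zeros columns, regardless of how the installed pandas treats non-numeric columns in sum().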