使用移动窗口处理pandas数据帧

时间:2017-10-26 08:02:43

标签: python-3.x pandas

以下面的数据帧为例:

                     a
2011-01-01 00:00:00  9
2011-01-01 01:00:00  4
2011-01-01 02:00:00  2
2011-01-01 03:00:00  5
2011-01-01 04:00:00  3
2011-01-01 05:00:00  7
2011-01-01 06:00:00  8
2011-01-01 07:00:00  4

我想使用移动窗口处理“a”列的块,使用下面的函数autoCorr()相应地生成“b”列。

def autoCorr(x):
    return np.corrcoef(x[:-1], x[1:])[0,1]

对于长度为5的移动窗口,最终结果应为:

                     a       b
2011-01-01 00:00:00  9  0.0000
2011-01-01 01:00:00  4  0.0000
2011-01-01 02:00:00  2  0.0000
2011-01-01 03:00:00  5  0.0000
2011-01-01 04:00:00  3 -0.0877    # autoCorr([9,4,2,5,3])
2011-01-01 05:00:00  7 -0.6404    # autoCorr([4,2,5,3,7])
2011-01-01 06:00:00  8  0.2880    # autoCorr([2,5,3,7,8])
2011-01-01 07:00:00  4 -0.1580    # autoCorr([5,3,7,8,4])

1 个答案:

答案 0 :(得分:1)

使用Rolling.apply + fillna

df = df.rolling(5).apply(autoCorr).fillna(0)
print (df)
                            a
2011-01-01 00:00:00  0.000000
2011-01-01 01:00:00  0.000000
2011-01-01 02:00:00  0.000000
2011-01-01 03:00:00  0.000000
2011-01-01 04:00:00 -0.087706
2011-01-01 05:00:00 -0.640445
2011-01-01 06:00:00  0.288136
2011-01-01 07:00:00 -0.157877
df = df.rolling(5).apply(lambda x: pd.Series(x).autocorr(1)).fillna(0)
print (df)

                            a
2011-01-01 00:00:00  0.000000
2011-01-01 01:00:00  0.000000
2011-01-01 02:00:00  0.000000
2011-01-01 03:00:00  0.000000
2011-01-01 04:00:00 -0.087706
2011-01-01 05:00:00 -0.640445
2011-01-01 06:00:00  0.288136
2011-01-01 07:00:00 -0.157877

编辑:

df1 = df.join(df.rolling(5).apply(autoCorr).fillna(0)['a'].rename('b'))
print (df1)
                     a         b
2011-01-01 00:00:00  9  0.000000
2011-01-01 01:00:00  4  0.000000
2011-01-01 02:00:00  2  0.000000
2011-01-01 03:00:00  5  0.000000
2011-01-01 04:00:00  3 -0.087706
2011-01-01 05:00:00  7 -0.640445
2011-01-01 06:00:00  8  0.288136
2011-01-01 07:00:00  4 -0.157877
相关问题