How do I reverse a rolling sum?

Time: 2017-03-29 18:42:53

Tags: python pandas

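I computed a rolling sum on a grouped DataFrame, but it runs in the wrong direction: I need the sum over past dates, and I get the sum over future dates.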

What am I doing wrong here?

I import the data and sort it by dimension and date (I have also tried removing the date sort):

import pandas as pd

df = pd.read_csv('Input.csv', parse_dates=True)
df.sort_values(['Dimension','Date'])
print(df)

Then I create a new column; the groupby result is a MultiIndex Series holding the rolling sum for each group:

new_column = df.groupby('Dimension').Value1.apply(
    lambda x: x.rolling(window=3).sum())

Then I reset the index so that it matches the original:

df['Sum_Value1'] = new_column.reset_index(level=0, drop=True)
print(df)

I also tried reversing the index before the calculation, but that failed too.

Input:

Dimension,Date,Value1,Value2
1,4/30/2002,10,20
1,1/31/2002,10,20
1,10/31/2001,10,20
1,7/31/2001,10,20
1,4/30/2001,10,20
1,1/31/2001,10,20
1,10/31/2000,10,20
2,4/30/2002,10,20
2,1/31/2002,10,20
2,10/31/2001,10,20
2,7/31/2001,10,20
2,4/30/2001,10,20
2,1/31/2001,10,20
2,10/31/2000,10,20
3,4/30/2002,10,20
3,1/31/2002,10,20
3,10/31/2001,10,20
3,7/31/2001,10,20
3,1/31/2001,10,20
3,10/31/2000,10,20

Output:

    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20         NaN
1           1   1/31/2002      10      20         NaN
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20        30.0
6           1  10/31/2000      10      20        30.0
7           2   4/30/2002      10      20         NaN
8           2   1/31/2002      10      20         NaN
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20        30.0
13          2  10/31/2000      10      20        30.0

Desired output:

    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20        30.0
1           1   1/31/2002      10      20        30.0
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20         NaN
6           1  10/31/2000      10      20         NaN
7           2   4/30/2002      10      20        30.0
8           2   1/31/2002      10      20        30.0
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20         NaN
13          2  10/31/2000      10      20         NaN

5 answers:

Answer 0: (score: 4)

You need a backward sum, so reverse your series before summing it:

lambda x: x[::-1].rolling(window=3).sum()
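To make the effect concrete, here is a minimal sketch of that lambda inside a groupby. The small DataFrame is made up for illustration, and using `transform` plus a second `[::-1]` to restore the row order is my own choice, not something the original snippet prescribes.

```python
import pandas as pd

# Hypothetical data: two groups of four rows each.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2, 2],
    "Value1": [1, 2, 3, 4, 10, 20, 30, 40],
})

# Reverse each group, take the usual trailing rolling sum, then reverse back.
# In the original row order, each row now sums itself and the next two rows.
df["Sum_Value1"] = df.groupby("Dimension")["Value1"].transform(
    lambda x: x[::-1].rolling(window=3).sum()[::-1]
)
print(df)
```

The last two rows of each group come out as NaN because fewer than three rows remain from that point on.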

Answer 1: (score: 2)

Rolling backward is the same as rolling forward and then shifting the result by window - 1 rows:

x.rolling(window=3).sum().shift(-2)
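As a quick check of that equivalence, the sketch below compares the shifted result with the reverse-then-roll approach on a made-up series. It assumes a fixed, count-based window with the default `min_periods`; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
window = 3

# Trailing rolling sum, moved up by window - 1 rows.
shifted = s.rolling(window=window).sum().shift(-(window - 1))

# Reverse, roll, and reverse back into the original order.
reversed_sum = s[::-1].rolling(window=window).sum()[::-1]

print(shifted.tolist())       # both hold 6, 9, 12 followed by two NaN
print(reversed_sum.tolist())
```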

Answer 2: (score: 1)

You can shift the result by window - 1 to get a left-aligned result:

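# transform keeps the original row index, so the result can be assigned directly
# (in recent pandas versions, apply prepends the group key to the index)
df["sum_value1"] = (df.groupby('Dimension').Value1
                      .transform(lambda x: x.rolling(window=3).sum().shift(-2)))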

Answer 3: (score: 0)

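def reverse_rolling(series, window, func):
    # Reverse the series so that a trailing window looks forward in the original order.
    reversed_series = series.iloc[::-1]
    # min_periods=1: incomplete windows still produce a value instead of NaN.
    rolled = reversed_series.rolling(window, min_periods=1).apply(func)
    # Reverse again to restore the original order; the index labels travel with the values.
    return rolled.iloc[::-1]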
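One thing to be aware of with this helper: it passes `min_periods=1`, so the trailing rows get partial sums instead of the NaN values shown in the desired output. The sketch below reproduces that behaviour with plain pandas calls on a made-up series, without the helper itself.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Same reverse-roll-reverse idea, but with min_periods=1:
# the last rows are summed over whatever values remain.
partial = s[::-1].rolling(3, min_periods=1).sum()[::-1]

# With the default min_periods (equal to the window size), those rows are NaN.
strict = s[::-1].rolling(3).sum()[::-1]

print(partial.tolist())
print(strict.tolist())
```

If the NaN tail is what you want, leaving `min_periods` at its default (or setting it equal to the window) should give it.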

Answer 4: (score: 0)

You can use FixedForwardWindowIndexer:

import pandas as pd

from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.read_csv(r'C:\Users\xxxx\python\data.txt')

indexer = FixedForwardWindowIndexer(window_size=3)

df1 = df.join(df.groupby('Dimension')['Value1'].rolling(indexer, min_periods=3).sum().to_frame().reset_index(), rsuffix='_sum')

del df1['Dimension_sum']
del df1['level_1']

df1

Input:

    Dimension        Date  Value1  Value2
0           1   4/30/2002      10      20
1           1   1/31/2002      10      20
2           1  10/31/2001      10      20
3           1   7/31/2001      10      20
4           1   4/30/2001      10      20
5           1   1/31/2001      10      20
6           1  10/31/2000      10      20
7           2   4/30/2002      10      20
8           2   1/31/2002      10      20
9           2  10/31/2001      10      20
10          2   7/31/2001      10      20
11          2   4/30/2001      10      20
12          2   1/31/2001      10      20
13          2  10/31/2000      10      20
14          3   4/30/2002      10      20
15          3   1/31/2002      10      20
16          3  10/31/2001      10      20
17          3   7/31/2001      10      20
18          3   1/31/2001      10      20
19          3  10/31/2000      10      20

Output:

    Dimension        Date  Value1  Value2  Value1_sum
0           1   4/30/2002      10      20        30.0
1           1   1/31/2002      10      20        30.0
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20         NaN
6           1  10/31/2000      10      20         NaN
7           2   4/30/2002      10      20        30.0
8           2   1/31/2002      10      20        30.0
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20         NaN
13          2  10/31/2000      10      20         NaN
14          3   4/30/2002      10      20        30.0
15          3   1/31/2002      10      20        30.0
16          3  10/31/2001      10      20        30.0
17          3   7/31/2001      10      20        30.0
18          3   1/31/2001      10      20         NaN
19          3  10/31/2000      10      20         NaN
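If the join and the column cleanup feel heavy, a shorter variant might look like the sketch below. It assumes a pandas version that ships `FixedForwardWindowIndexer`, uses a made-up DataFrame instead of the CSV, and swaps the join for `groupby(...).transform`, which is my substitution rather than part of the answer above.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Hypothetical data standing in for the CSV file.
frame = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2, 2],
    "Value1": [1, 2, 3, 4, 10, 20, 30, 40],
})

# Each window covers the current row and the two rows after it.
indexer = FixedForwardWindowIndexer(window_size=3)

# transform returns a Series aligned with the original rows,
# so no reset_index or column deletion is needed afterwards.
frame["Value1_sum"] = frame.groupby("Dimension")["Value1"].transform(
    lambda x: x.rolling(indexer, min_periods=3).sum()
)
print(frame)
```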