Conditional rolling sum

Time: 2016-10-31 11:52:55

Tags: r dataframe rolling-sum

This is the data frame:

Date        ID cost Value
15/12/2016   1  yes    200
15/10/2016   1  yes    100
15/9/2016    1  yes    55
15/04/2016   1  yes    1000
15/12/2016   2  yes    300
15/10/2016   2  yes    200
15/9/2016    2  yes    100
15/04/2016   2  yes    1000
15/12/2016   3  no     300
15/10/2016   3  no     200
15/9/2016    3  no     100
15/04/2016   3  no     1000
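
To make the example reproducible, the table above can be loaded into R roughly as follows. This is a minimal sketch: the object name `dt` is an assumption, chosen only because the answer below operates on an object of that name, and the dates are kept as character strings exactly as shown.

```r
# Read the example table from inline text; 'dt' is an assumed object name
dt <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Date        ID cost Value
15/12/2016   1  yes    200
15/10/2016   1  yes    100
15/9/2016    1  yes    55
15/04/2016   1  yes    1000
15/12/2016   2  yes    300
15/10/2016   2  yes    200
15/9/2016    2  yes    100
15/04/2016   2  yes    1000
15/12/2016   3  no     300
15/10/2016   3  no     200
15/9/2016    3  no     100
15/04/2016   3  no     1000
")

# Inspect the column types: Date and cost are character, ID and Value are integer
str(dt)
```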

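I want to compute a 3-month rolling sum of Value for each ID, but only for IDs whose cost is "yes". Note that the example has only 3 IDs, but my real data has n of them.

The output should be: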

Date        ID  Value  Rolling_Sum
15/12/2016   1   200   355
15/10/2016   1   100   155
15/9/2016    1   55    55
15/04/2016   1   1000  1000
15/12/2016   2   300   600
15/10/2016   2   200   300
15/9/2016    2   100   100
15/04/2016   2   1000  1000

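I have seen many examples in other questions. One of my biggest problems is that the dates are not consecutive, so the gap between one observation and the next varies.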

Thanks

1 answer:

Answer 0: (score: 2)

You can use the foverlaps function from the data.table package:

library(data.table)
library(lubridate)

# convert the data to a 'data.table' (assumes the question's data frame is stored in 'dt')
setDT(dt)
# convert the Date column to Date class
dt[, Date := as.Date(Date, '%d/%m/%Y')]
# duplicate Date as 'bdate' so each row is a zero-length interval [bdate, Date] for 'foverlaps'
dt[, bdate := Date]
# create a reference 'data.table' in which each row spans the 3 months up to its Date
dtc <- copy(dt)[, bdate := Date %m-% months(3)]
# set the keys for the reference data.table (needed for the 'foverlaps' function)
setkey(dtc, ID, bdate, Date)
# find the 'yes' rows that fall within each window, then sum their values per ID and window end date
foverlaps(dt[cost=='yes'], dtc, type = 'within')[, .(val = sum(i.Value)), by = .(ID, Date)]

which gives:

   ID       Date  val
1:  1 2016-12-15  355
2:  1 2016-10-15  155
3:  1 2016-09-15   55
4:  1 2016-04-15 1000
5:  2 2016-12-15  600
6:  2 2016-10-15  300
7:  2 2016-09-15  100
8:  2 2016-04-15 1000
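
As a cross-check on the output above, the same sums can be reproduced without any packages. The following is a minimal base R sketch, not the answer's method: the names `df`, `yes` and `window_start` are hypothetical, and the window start is computed with `seq()` on dates, which is only safe here because every date falls on the 15th (month-end dates would need more care).

```r
# Rebuild the question's data; 'df' is an assumed name for this sketch
df <- data.frame(
  Date  = rep(c("15/12/2016", "15/10/2016", "15/9/2016", "15/04/2016"), times = 3),
  ID    = rep(1:3, each = 4),
  cost  = rep(c("yes", "yes", "no"), each = 4),
  Value = c(200, 100, 55, 1000, 300, 200, 100, 1000, 300, 200, 100, 1000),
  stringsAsFactors = FALSE
)
df$Date <- as.Date(df$Date, format = "%d/%m/%Y")

# Keep only the rows whose cost is "yes"
yes <- df[df$cost == "yes", ]

# Start of the 3-month window that ends at date 'd'
window_start <- function(d) seq(d, by = "-3 months", length.out = 2)[2]

# For each row, sum Value over rows of the same ID dated within [start, Date]
yes$Rolling_Sum <- vapply(seq_len(nrow(yes)), function(i) {
  lo <- window_start(yes$Date[i])
  in_window <- yes$ID == yes$ID[i] & yes$Date >= lo & yes$Date <= yes$Date[i]
  sum(yes$Value[in_window])
}, numeric(1))

print(yes[, c("Date", "ID", "Value", "Rolling_Sum")])
```

Because each row is compared against every other row, this check scales quadratically with the number of rows, so it is suited to verifying small samples rather than replacing the keyed overlap join on a large table.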