Rolling n-period mean by group with data.table

Time: 2019-04-16 17:44:23

Tags: r dplyr data.table

Suppose I have the following data.table:

library(data.table)
# 3 accounts x 366 days; ordervolume is random, so printed values will differ between runs
foo <- data.table(accountid = c(rep(1, 366), rep(2, 366), rep(3, 366)), 
                  orderday = as.POSIXct(rep(seq(as.Date("2018/1/1"), as.Date("2019/1/1"), "days"), 3)),
                  ordervolume = rep(rnorm(366), 3))

> foo
      accountid            orderday ordervolume
   1:         1 2017-12-31 16:00:00  -1.1675551
   2:         1 2018-01-01 16:00:00   0.7074944
   3:         1 2018-01-02 16:00:00  -1.4654386
   4:         1 2018-01-03 16:00:00   0.5341484
   5:         1 2018-01-04 16:00:00   0.8196739
  ---                                          
1094:         3 2018-12-27 16:00:00  -0.4877347
1095:         3 2018-12-28 16:00:00   1.3994610
1096:         3 2018-12-29 16:00:00  -1.6502108
1097:         3 2018-12-30 16:00:00   0.5593474
1098:         3 2018-12-31 16:00:00   1.0878634

I want a column that holds the rolling n-period mean of ordervolume, computed separately for each accountid. Is there an efficient way to do this with data.table or dplyr?
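One possible approach, sketched here as an illustration rather than a confirmed answer, uses data.table's own `frollmean()` (added in data.table 1.12.0) inside a grouped `:=`. The toy table and the window length `k = 3` are hypothetical stand-ins for the question's `foo` and its n.

```r
library(data.table)  # frollmean() requires data.table >= 1.12.0

# Hypothetical small stand-in for foo: 3 accounts x 10 days of random volumes
set.seed(42)
foo <- data.table(accountid = rep(1:3, each = 10),
                  orderday = rep(seq(as.Date("2018-01-01"), by = "day", length.out = 10), 3),
                  ordervolume = rnorm(30))

k <- 3  # window length (the question's n; 30 in the later attempt)

# frollmean() returns a vector as long as its input, so it fits each group;
# the first k - 1 rows of every account are NA (fill = NA is the default)
foo[, ordervolume_ma := frollmean(ordervolume, n = k, align = "right"), by = accountid]
```

Because the result has the same length as each group, the assignment never hits the RHS-length error shown later in the question.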

Note: in this example orderday is sorted in ascending order, but in my real data the rows may be in random order. Does that matter?
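Order does matter for functions like `frollmean()` or `zoo::rollmean()`, because they average the previous k *rows*, not the previous k *dates*. A minimal sketch, assuming one row per account per day (the table names and `k = 3` are hypothetical), shows that sorting with `setorder()` before the grouped rolling mean makes shuffled input give the same result as sorted input:

```r
library(data.table)

set.seed(1)
k <- 3  # hypothetical window length for the demo

sorted_dt <- data.table(accountid = rep(1:2, each = 6),
                        orderday = rep(seq(as.Date("2018-01-01"), by = "day", length.out = 6), 2),
                        ordervolume = rnorm(12))

# Same rows in random order, mimicking unsorted real data
shuffled_dt <- sorted_dt[sample(.N)]

# Rolling functions work on row positions, so sort by account, then day, in place
setorder(shuffled_dt, accountid, orderday)
shuffled_dt[, ma := frollmean(ordervolume, n = k), by = accountid]

sorted_dt[, ma := frollmean(ordervolume, n = k), by = accountid]
```

If some days are missing for an account, a row-based window no longer equals a calendar window of k days; that case would need the data filled out to one row per day first.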

Edit:

I tried the following:

library(zoo)  # rollmean() comes from the zoo package
foo[, ordervolume30ma := rollmean(ordervolume, k = 30, align = "right"), by = .(accountid)]

but I got this error:

Error in `[.data.table`(foo, , `:=`(ordervolume30ma, rollmean(ordervolume,  : 
  Supplied 337 items to be assigned to group 1 of size 366 in column 'ordervolume30ma'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

What am I doing wrong?
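The number in the error points to the cause: without a `fill` argument, `zoo::rollmean()` returns only the complete windows, i.e. 366 - 30 + 1 = 337 values for a 366-row group, which `:=` cannot assign to 366 rows. A minimal sketch of one fix, assuming zoo's `fill` argument, pads the missing leading positions with `NA` so each group's result has full length (the random data below is a hypothetical stand-in for `foo`):

```r
library(data.table)
library(zoo)

set.seed(7)
x <- rnorm(366)
length(rollmean(x, k = 30, align = "right"))             # 337: only complete windows
length(rollmean(x, k = 30, fill = NA, align = "right"))  # 366: first 29 values are NA

# Hypothetical data shaped like the question's foo
foo <- data.table(accountid = rep(1:3, each = 366),
                  orderday = rep(seq(as.Date("2018-01-01"), as.Date("2019-01-01"), by = "day"), 3),
                  ordervolume = rnorm(3 * 366))

# With fill = NA the RHS length matches each group's size
foo[, ordervolume30ma := rollmean(ordervolume, k = 30, fill = NA, align = "right"),
    by = accountid]
```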
