Can't figure out a way to carry forward the last good value

Date: 2019-06-25 17:01:23

Tags: r zoo

This is a bit hard to describe, but I'll give it a shot. Say I have the following zoo object:

a <- read.zoo(data.frame(
    date = as.Date('2011-1-1') + 0:59,
    closest.idx = c(rep(1, 20), rep(2, 20), rep(3, 20)),
    is.good = c(rep(1, 20), rep(1, 20), rep(0, 20)),
    val = c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.7, 14))
), FUN = as.Date)
           closest.idx is.good val
2011-01-01          1       1 0.2
2011-01-02          1       1 0.2
2011-01-03          1       1 0.2
2011-01-04          1       1 0.2
2011-01-05          1       1 0.2
2011-01-06          1       1 0.2
2011-01-07          1       1 0.3
2011-01-08          1       1 0.3
2011-01-09          1       1 0.3
2011-01-10          1       1 0.3
...

I want to carry forward the last good "val". The rules are:

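  1. The first 6 rows of each group should never be changed, whatever their values.
  2. For the remaining rows, if is.good = 0, change val to last.good.val.
  3. The last good value is a val where is.good = 1 and that appears in row 7 or later of its group.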

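Note #1: Don't assume a group has exactly 20 rows; it can have any number of rows.

Note #2: You can assume the first 6 rows of each group should never be touched.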
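One way to respect Note #1 without hardcoding group sizes is to compute each row's position inside its closest.idx group. A minimal base-R sketch, assuming the grouping column is a plain vector; `grp` below is a made-up toy vector with uneven group sizes, not data from the question:

```r
# Hypothetical toy grouping column with uneven group sizes (not from the question)
grp <- c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3)

# ave() applies FUN within each group and returns the results in the original
# row order, so seq_along gives each row's position inside its own group
row_in_group <- ave(seq_along(grp), grp, FUN = seq_along)

# Rows 1-6 of every group are protected (rule 1); only later rows may change
can_change <- row_in_group >= 7

print(row_in_group)
print(can_change)
```

Only the 7th and 8th rows of the 8-row group end up with `can_change == TRUE`; the 3-row and 2-row groups are left entirely alone.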

So in this example:

2011-01-01 - 2011-01-06 will have a val of 0.2 (is.good = 1, but within the first 6 rows of the group, so not a last.good.val)
2011-01-07 - 2011-01-20 will have a val of 0.3 (is.good = 1, last.good.val = 0.3)
2011-01-21 - 2011-01-26 will have a val of 0.4 (is.good = 1, last.good.val is still 0.3; within the first 6 rows of the group, so unchanged and not a last.good.val)
2011-01-27 - 2011-02-09 will have a val of 0.5 (is.good = 1, last.good.val = 0.5)
2011-02-10 - 2011-02-15 will have a val of 0.6 (within the first 6 rows of the group, so unaffected)
2011-02-16 - 2011-03-01 will have a val of 0.5 (is.good = 0 in this group, so they take the last good value, 0.5)
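The six ranges above translate directly into the target val column, which is handy for checking any candidate solution. A sketch, assuming the 60-row sample data from the question (`expected_val` and `dates` are just illustrative names):

```r
# Expected val for the 60-day example, built from the six date ranges above
expected_val <- c(
  rep(0.2, 6),   # 2011-01-01 to 2011-01-06: group 1, rows 1-6, untouched
  rep(0.3, 14),  # 2011-01-07 to 2011-01-20: group 1, is.good = 1, kept
  rep(0.4, 6),   # 2011-01-21 to 2011-01-26: group 2, rows 1-6, untouched
  rep(0.5, 14),  # 2011-01-27 to 2011-02-09: group 2, is.good = 1, kept
  rep(0.6, 6),   # 2011-02-10 to 2011-02-15: group 3, rows 1-6, untouched
  rep(0.5, 14)   # 2011-02-16 to 2011-03-01: group 3, is.good = 0, last good value 0.5
)
dates <- as.Date("2011-01-01") + 0:59

# Show the first row of each range plus the last row
print(data.frame(date = dates, val = expected_val)[c(1, 7, 21, 27, 41, 47, 60), ])
```

After running the base + zoo solution on the data.frame version of the data, `stopifnot(isTRUE(all.equal(as.numeric(a$val), expected_val)))` should pass (`as.numeric` drops the names that `unlist()` attaches).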

So I want my output to look like this:

           closest.idx is.good val
2011-01-01          1       1 0.2
2011-01-02          1       1 0.2
2011-01-03          1       1 0.2
2011-01-04          1       1 0.2
2011-01-05          1       1 0.2
2011-01-06          1       1 0.2
2011-01-07          1       1 0.3
2011-01-08          1       1 0.3
2011-01-09          1       1 0.3
...
2011-01-21          2       1 0.4
2011-01-22          2       1 0.4
2011-01-23          2       1 0.4
2011-01-24          2       1 0.4
2011-01-25          2       1 0.4
2011-01-26          2       1 0.4
2011-01-27          2       1 0.5
2011-01-28          2       1 0.5
2011-01-29          2       1 0.5
2011-01-30          2       1 0.5
2011-01-31          2       1 0.5
...
2011-02-10          3       0 0.6
2011-02-11          3       0 0.6
2011-02-12          3       0 0.6
2011-02-13          3       0 0.6
2011-02-14          3       0 0.6
2011-02-15          3       0 0.6
2011-02-16          3       0 0.5    <- notice these changed to last good value
2011-02-17          3       0 0.5
2011-02-18          3       0 0.5
...
  

Note: I'd prefer a base-R solution, but solutions using other packages are also of interest.

1 answer:

Answer 0 (score: 1)

The following approaches all do essentially the same thing:

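  • Add a column val_tofill in which every value that is not good is replaced by NA
  • Forward-fill val_tofill using one of the many available methods; see e.g. Replacing NAs with latest non-NA value
  • Overwrite val with val_tofill wherever the row is not one of the first six rows of its group (grouped by closest.idx)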
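Since the question prefers base R, here is a sketch of the same three steps without zoo. Assumptions: `locf_base` is a hypothetical helper (not part of zoo or of the answer) that forward-fills by taking `cummax()` over the positions of non-NA values, and step 1 reads rule 3 strictly, so only is.good = 1 values from row 7 onward of their group count as good. The answer's code masks on is.good alone; for the sample data both readings give the same output.

```r
# Sample data from the question (same columns, as a plain data.frame)
a <- data.frame(
  date = as.Date("2011-01-01") + 0:59,
  closest.idx = rep(1:3, each = 20),
  is.good = c(rep(1, 40), rep(0, 20)),
  val = c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.7, 14))
)

# Hypothetical helper: base-R last-observation-carried-forward.
# Leading NAs stay NA because there is nothing to carry forward yet.
locf_base <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA so far (0 if none)
  x[replace(idx, idx == 0, NA)]            # indexing with NA yields NA
}

# Position of each row within its closest.idx group (any group size)
row_in_group <- ave(seq_along(a$val), a$closest.idx, FUN = seq_along)

# Step 1: keep only good values; here they must also sit in row 7 or later (rule 3)
val_tofill <- ifelse(a$is.good > 0 & row_in_group >= 7, a$val, NA)
# Step 2: forward-fill the last good value
val_tofill <- locf_base(val_tofill)
# Step 3: leave rows 1-6 of each group alone, overwrite the rest
a$val <- ifelse(row_in_group < 7, a$val, val_tofill)

print(a[c(1, 7, 21, 27, 41, 47, 60), ])
```

Rows 7+ with is.good = 1 simply get their own val back, so only the is.good = 0 rows actually change. If a group's rows 7+ are all bad and no earlier good value exists, those rows come out as NA; whether that is acceptable depends on the real data.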

Initial data

a <- data.frame(
    date=as.Date('2011-1-1') + 0:59, 
    closest.idx=c(rep(1,20), rep(2, 20), rep(3, 20)), 
    is.good=c(rep(1,20), rep(1,20), rep(0, 20)), 
    val=c(rep(.2, 6), rep(.3, 14), rep(.4, 6), rep(.5, 14), rep(.6, 6), rep(.7, 14))
)

base + zoo::na.locf

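# Keep only good values; everything else becomes NA and is then carried forward.
# na.rm = FALSE keeps leading NAs so the result has the same length as a$val.
a$val_tofill <- zoo::na.locf(ifelse(a$is.good > 0, a$val, NA), na.rm = FALSE)
# Within each closest.idx group, keep rows 1-6 and use val_tofill from row 7 on.
# by() + unlist() returns results group by group, so a must be sorted by closest.idx.
a$val <- unlist(
    by(a, INDICES = a$closest.idx,
        FUN = function(x) ifelse(seq_len(nrow(x)) < 7, x$val, x$val_tofill)
    )
)
a$val_tofill <- NULL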
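One caveat with `by()` + `unlist()`: the results come back group by group, so they only line up with the rows of `a` if `a` is already sorted by closest.idx. A sketch of step 3 using `ave()` instead, which returns values in the original row order; the interleaved toy data frame `d` is made up for illustration and already contains a val_tofill column:

```r
# Hypothetical toy data: two groups of 8 rows each, deliberately interleaved
d <- data.frame(
  closest.idx = rep(c(1, 2), times = 8),
  val         = (1:16) / 10,
  val_tofill  = rep(c(9, 8), times = 8)
)

# Position of each row within its group, in the original row order
row_in_group <- ave(seq_len(nrow(d)), d$closest.idx, FUN = seq_along)
# Step 3: keep rows 1-6 of each group, take val_tofill from row 7 on
ave_version <- ifelse(row_in_group < 7, d$val, d$val_tofill)

# The by() + unlist() form concatenates group 1 then group 2,
# so its output is not aligned with the interleaved rows of d
by_version <- unlist(by(d, INDICES = d$closest.idx,
  FUN = function(x) ifelse(seq_len(nrow(x)) < 7, x$val, x$val_tofill)))

print(ave_version)
print(unname(by_version))
```

On the question's sorted sample data both forms agree; `ave()` just removes the ordering assumption.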

dplyr + tidyr::fill

library(tidyverse)

mutate(a, val_tofill = ifelse(is.good > 0, val, NA)) %>%
    fill(val_tofill, .direction = "down") %>%
    group_by(closest.idx) %>%
    mutate(val = ifelse(row_number() < 7, val, val_tofill)) %>%
    ungroup() %>%
    select(-val_tofill)

data.table + zoo::na.locf

library(data.table)

setDT(a)  # converts a to a data.table in place
a[, val_tofill := zoo::na.locf(ifelse(is.good > 0, val, NA), na.rm = FALSE)][,
    val := ifelse(seq_len(.N) < 7, val, val_tofill),
    by = closest.idx
]
a[, val_tofill := NULL]