Update a column based on previous row values

Posted: 2016-06-30 14:36:40

Tags: r dplyr

I have this data frame, df1:

    User|Date|Index|
    a   |1   |1    |
    a   |1   |2    |
    a   |1   |3    |
    a   |1   |0    |
    a   |1   |5    |
    a   |1   |6    |
    a   |2   |0    |
    b   |4   |1    |
    b   |4   |2    |
    b   |4   |3    |

I want to update the Index column in the following way:

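  1. Group the data by User and Date;
  2. Assume the rows are already sorted correctly;
  3. Scan the Index column: when a 0 is found, change it to 1, then rewrite each following row as the previous row's value plus 1, until another 0 is found (which restarts the count at 1).
  4. I've narrowed it down to this, but I'm not sure how to complete the replace() part: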

        df1 %>%
          group_by(User, Date) %>%
          mutate(Index = replace(Index, ))  # incomplete: not sure what the other replace() arguments should be
    

    Can anyone help me?

    Edit: The data frame above is a simplification. Here is the code for the real data:

        df1 <-structure(list(User = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,3), 
        Date = c(16864, 16864, 16864, 16864, 16864, 16879, 16879,16879, 16879, 16879, 16879, 16879, 16879, 16879), 
        Index = c(16,17, 0, 19, 20, 1, 2, 3, 0, 5, 0, 0, 8, 9)), 
        class = "data.frame", .Names = c("User","Date", "Index"), row.names = c(NA, -14L))
    

    This is what it currently looks like:

        User|Date    |Index|
        2   |16864   |16   |
        2   |16864   |17   |
        2   |16864   |0    |
        2   |16864   |19   |
        2   |16864   |20   |
        3   |16879   |1    |
        3   |16879   |2    |
        3   |16879   |3    |
        3   |16879   |0    |
        3   |16879   |5    |
        3   |16879   |0    |
        3   |16879   |0    |
        3   |16879   |8    |
        3   |16879   |9    |
    

    The desired output is:

        User|Date    |Index|
        2   |16864   |16   |
        2   |16864   |17   |
        2   |16864   |1    |
        2   |16864   |2    |
        2   |16864   |3    |
        3   |16879   |1    |
        3   |16879   |2    |
        3   |16879   |3    |
        3   |16879   |1    |
        3   |16879   |2    |
        3   |16879   |1    |
        3   |16879   |1    |
        3   |16879   |2    |
        3   |16879   |3    |
    
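Stated precisely, the rule in step 3 amounts to the loop below, applied to each User/Date group's Index values in row order. This is a minimal, unoptimised reference sketch, not a dplyr solution: `reset_index` is a hypothetical helper name, base R's ave() does the grouping so it runs without dplyr, and it assumes (as the desired output shows) that rows before a group's first 0 keep their original values.

```r
# Hypothetical reference helper (not from the original post): walk one
# group's Index values in row order. A 0 restarts the count at 1, every
# later non-zero row becomes previous + 1, and rows before the first 0
# keep their original values.
reset_index <- function(x) {
  seen_zero <- FALSE
  for (i in seq_along(x)) {
    if (x[i] == 0) {
      x[i] <- 1
      seen_zero <- TRUE
    } else if (seen_zero) {
      x[i] <- x[i - 1] + 1
    }
  }
  x
}

# Same data as the dput() output in the edit, written out directly.
df1 <- data.frame(
  User  = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  Date  = c(rep(16864, 5), rep(16879, 9)),
  Index = c(16, 17, 0, 19, 20, 1, 2, 3, 0, 5, 0, 0, 8, 9)
)

# ave() applies reset_index() separately to each User/Date group and
# returns the results in the original row order.
new_index <- ave(df1$Index, df1$User, df1$Date, FUN = reset_index)
print(new_index)
```

The result matches the Index column of the desired output. Inside a dplyr pipeline the same helper could be called as mutate(Index = reset_index(Index)) after group_by(User, Date); the loop is easy to check by eye but will be slower than a vectorised version on large data.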

1 answer:

Answer 0 (score: 3)

There is probably a smarter way to do this, but here is my attempt using a custom function:

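    library(dplyr)

    myfun <- function(x) {
      indx <- which(x == 0L)                 # positions of the zeros
      if (length(indx) == 0L) return(x)      # no zero in this group: leave it unchanged
      c(x[seq_len(indx[1L] - 1L)],           # rows before the first zero are kept as is
        sequence(c(diff(indx), length(x) - last(indx) + 1L)))  # count 1, 2, ... from each zero
    }

    df1 %>%
      group_by(User, Date) %>%
      mutate(Index = myfun(Index))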

    # Source: local data frame [14 x 3]
    # Groups: User, Date [2]
    #     User  Date Index
    #    (dbl) (dbl) (dbl)
    # 1      2 16864    16
    # 2      2 16864    17
    # 3      2 16864     1
    # 4      2 16864     2
    # 5      2 16864     3
    # 6      3 16879     1
    # 7      3 16879     2
    # 8      3 16879     3
    # 9      3 16879     1
    # 10     3 16879     2
    # 11     3 16879     1
    # 12     3 16879     1
    # 13     3 16879     2
    # 14     3 16879     3
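To see why myfun works, here are its intermediate values for the User 3 group, computed step by step in plain base R. `indx[length(indx)]` stands in for dplyr's `last(indx)` so the snippet runs without dplyr, the prefix is taken with seq_len(), which gives the same positions as 1:(indx[1] - 1) here, and the names other than `indx` are introduced only for illustration. which() finds the zeros, diff() plus the tail length gives the size of each run that starts at a zero, and sequence() produces a 1, 2, ... count for each run.

```r
# Index values of the User 3 / Date 16879 group from the question
x <- c(1, 2, 3, 0, 5, 0, 0, 8, 9)

indx <- which(x == 0)                  # positions of the zeros: 4 6 7
run_lengths <- c(diff(indx),           # rows from each zero up to the next zero: 2 1
                 length(x) - indx[length(indx)] + 1)  # last zero to the end: 3
prefix <- x[seq_len(indx[1] - 1)]      # rows before the first zero, kept as is: 1 2 3
restarted <- sequence(run_lengths)     # a 1, 2, ... count per run: 1 2 1 1 2 3

result <- c(prefix, restarted)
print(result)                          # values: 1 2 3 1 2 1 1 2 3
```

Because the run lengths sum to the number of rows from the first zero onward, the prefix and the restarted counts together always have the same length as the group, which is what mutate() requires.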