data.table: flag rows before/after a symbol within each group

Time: 2015-08-22 15:50:09

Tags: r data.table

Feel free to edit this title to make it more understandable/generalizable...

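I have a data.table object in which 3 columns form the groups (id, id2, pol_loc). Within each group are row observations, and one row per group holds either an asterisk or NA. I want to efficiently build an indicator column that marks each row's position relative to the asterisk in its group: 1 for rows before it, 0 for rows after it, and NA for the asterisk row itself. This is what the data table looks like: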

    id id2 pol_loc non_pol cluster_tag
 1:  1   1       3      do          NA
 2:  1   1       3     you          NA
 3:  1   1       3       *          NA
 4:  1   1       3      it          NA
 -------------------------------------
 5:  1   2       3     but           4
 6:  1   2       3       i          NA
 7:  1   2       3       *          NA
 8:  1   2       3  really           2
 9:  1   2       3     bad          NA
 -------------------------------------
10:  1   2       5     but           4
11:  1   2       5       i          NA
12:  1   2       5    hate          NA
13:  1   2       5  really           2
14:  1   2       5       *          NA
15:  1   2       5    dogs          NA
 -------------------------------------
16:  2   1       4       i          NA
17:  2   1       4      am          NA
18:  2   1       4     the          NA
19:  2   1       4       *          NA
20:  2   1       4  friend          NA
 -------------------------------------
21:  3   1       4      do          NA
22:  3   1       4     you          NA
23:  3   1       4  really           2
24:  3   1       4       *          NA
 -------------------------------------
25:  3   2      NA      NA          NA
    id id2 pol_loc non_pol cluster_tag

Desired output

Here is the desired output:

    id id2 pol_loc non_pol cluster_tag   before
 1:  1   1       3      do          NA        1
 2:  1   1       3     you          NA        1
 3:  1   1       3       *          NA       NA
 4:  1   1       3      it          NA        0
 ----------------------------------------------
 5:  1   2       3     but           4        1
 6:  1   2       3       i          NA        1
 7:  1   2       3       *          NA       NA
 8:  1   2       3  really           2        0
 9:  1   2       3     bad          NA        0
 ----------------------------------------------
10:  1   2       5     but           4        1
11:  1   2       5       i          NA        1
12:  1   2       5    hate          NA        1
13:  1   2       5  really           2        1
14:  1   2       5       *          NA       NA
15:  1   2       5    dogs          NA        0
 ----------------------------------------------
16:  2   1       4       i          NA        1
17:  2   1       4      am          NA        1
18:  2   1       4     the          NA        1
19:  2   1       4       *          NA       NA
20:  2   1       4  friend          NA        0
 ----------------------------------------------
21:  3   1       4      do          NA        1
22:  3   1       4     you          NA        1
23:  3   1       4  really           2        1
24:  3   1       4       *          NA       NA
 ----------------------------------------------
25:  3   2      NA      NA          NA       NA
    id id2 pol_loc non_pol cluster_tag   before

MWE

dat <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), 
    id2 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), pol_loc = c(3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, NA), non_pol = c("do", "you", 
    "*", "it", "but", "i", "*", "really", "bad", "but", "i", 
    "hate", "really", "*", "dogs", "i", "am", "the", "*", "friend", 
    "do", "you", "really", "*", NA), cluster_tag = c(NA, NA, 
    NA, NA, "4", NA, NA, "2", NA, "4", NA, NA, "2", NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, "2", NA, NA)), row.names = c(NA, 
-25L), class = "data.frame", .Names = c("id", "id2", "pol_loc", 
"non_pol", "cluster_tag"))

library(data.table)

setDT(dat)

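EDIT: If it is easier or more efficient, the NA on the asterisk row could become 0 or 1 instead; it makes no difference to me. My guess is that this would be more efficient.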

1 Answer:

Answer 0: (score: 5)

Try

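# Within each group, 1 - cumsum(...) is 1 before the asterisk and 0 from it onward;
# the chained call then sets the asterisk row itself to NA
dat[, before := 1 - cumsum(non_pol == "*"), by = .(id, id2, pol_loc)][non_pol == "*", before := NA]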
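
To see why this works, it may help to trace the same arithmetic on a single group outside of data.table. The following is only a minimal base R sketch, assuming one group's values copied from rows 5 to 9 of the example data; the names `hit` and `before` are illustrative local variables, not part of the answer.

```r
# One group's non_pol values (rows 5-9 of the example data)
non_pol <- c("but", "i", "*", "really", "bad")

# TRUE only at the asterisk
hit <- non_pol == "*"

# The running sum is 0 before the asterisk and 1 from it onward,
# so subtracting it from 1 gives 1 before and 0 from the asterisk onward
before <- 1 - cumsum(hit)

# Set the asterisk row itself to NA, as the chained call does
before[which(hit)] <- NA

print(before)
```

If the last assignment is skipped, the asterisk row simply keeps the value 0, which matches the question's edit saying the NA could be 0 or 1. A group whose non_pol is NA (row 25) ends up NA on its own, because comparing NA with "*" yields NA and that carries through cumsum.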