Ifelse statement with nested data.table statements

Time: 2016-06-13 18:22:20

Tags: r if-statement data.table dplyr

Here is my data frame.

library(data.table)
     dt <- fread('
 Name     Video   Webinar Meeting Conference  Level  NextStep
  John       1         0        0       0      1     Webinar,Meeting,Conference
  John       1         1        0       0      1     Meeting,Conference
  John       1         1        1       0      2     Conference      
  Tom        0         0        1       0      1     Webinar,Conference,Video
  Tom        0         0        1       1      2     Webinar,Video   
  Kyle       0         0        0       1      2     Webinar,Meeting,Video

                                ')

I am creating the nextstep column with

dt[, nextstep := paste0(names(.SD)[.SD==0], collapse = ','), 1:nrow(dt), .SDcols = 2:5][]

based on the solution here: Making a character string with column names with zero values
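To see what that expression does row by row, here is a minimal sketch on a made-up table. The name `toy` and its values are assumptions for illustration only; the technique is the same grouping-by-row-number idea as above.

```r
library(data.table)

# Hypothetical toy table with the same 0/1 layout as the question's data
toy <- data.table(
  Name    = c("A", "B"),
  Video   = c(1L, 0L),
  Webinar = c(0L, 0L),
  Meeting = c(0L, 1L)
)

# Grouping by row number makes .SD a one-row table of the chosen columns;
# .SD == 0 is then a 1 x 3 logical matrix that selects the matching column names
toy[, nextstep := paste0(names(.SD)[.SD == 0], collapse = ","),
    by = seq_len(nrow(toy)),
    .SDcols = c("Video", "Webinar", "Meeting")]

print(toy)
```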

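Now I want to change the order in which the elements appear in the nextstep column depending on the "Level" field. For example, for Level 1 I want Conference to appear before Webinar and Meeting. For Level 2 I want Video to always appear last. Here is my attempt.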

 dt<-dt[, NextStep := ifelse(Level1=="Level0",
(paste0(names(.SD)[.SD==0], collapse = ';'), 1:nrow(dt), .SDcols = c(5,2,3,4)),
      ifelse(EngagementLevel1=="Level2",
(paste0(names(.SD)[.SD==0], collapse = ';'), 1:nrow(dt), .SDcols = c(3,4,5,2))))]

I just want to reorder the elements in the 'nextstep' field based on the 'Level' field. I sincerely appreciate your help!

1 answer:

Answer 0: (score: 4)

Well, you could store your preferred orderings somewhere:

levelmap = data.table(Level = 1:2, ord = list(
    c("Conference", "Webinar", "Meeting", "Video"), 
    c("Webinar", "Meeting", "Conference", "Video")
))

Then use your earlier approach:

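# DT is the question's table (named dt there); r is a row id to group by
DT[, r := .I]
# for each level, list the zero-valued columns in that level's preferred order
for (ii in seq(nrow(levelmap)))
    DT[ Level == levelmap$Level[ii], 
      ns := paste0(names(.SD)[.SD==0], collapse = ',')
    , by = r, .SDcols = levelmap$ord[[ii]] ][]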
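The loop works because the columns of `.SD` come out in the order given to `.SDcols`, so `names(.SD)` already follows the preferred ordering. A small sketch of just that step, where the one-row table `toy` and the vector `ord` are made up for illustration:

```r
library(data.table)

# Hypothetical one-row table; Video and Conference are still open (0)
toy <- data.table(Video = 0L, Webinar = 1L, Conference = 0L)

# Preferred order, deliberately different from the column order in toy
ord <- c("Conference", "Webinar", "Video")

# .SD carries its columns in the order of .SDcols, so the pasted names
# follow ord rather than the table's own column order
ns_toy <- toy[, paste0(names(.SD)[.SD == 0], collapse = ","), .SDcols = ord]

print(ns_toy)
```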

But really, I don't think you should be doing this at all (neither in this question nor in the previous one). It is a messy way to handle data.

A comment on tidy data. To clarify what I mean, I'd suggest reviewing Hadley Wickham's paper on tidy data. Here, tidy data might look like this:

myDT = melt(
  DT[, !"NextStep", with=FALSE][, Seq := 1:.N, by=Name], 
  id.vars = c("Name", "Seq", "Level"))

    Name Seq Level   variable value
 1: John   1     1      Video     1
 2: John   2     1      Video     1
 3: John   3     2      Video     1
 4:  Tom   1     1      Video     0
 5:  Tom   2     2      Video     0
 6: Kyle   1     2      Video     0
 7: John   1     1    Webinar     0
 8: John   2     1    Webinar     1
 9: John   3     2    Webinar     1
10:  Tom   1     1    Webinar     0
11:  Tom   2     2    Webinar     0
12: Kyle   1     2    Webinar     0
13: John   1     1    Meeting     0
14: John   2     1    Meeting     0
15: John   3     2    Meeting     1
16:  Tom   1     1    Meeting     1
17:  Tom   2     2    Meeting     1
18: Kyle   1     2    Meeting     0
19: John   1     1 Conference     0
20: John   2     1 Conference     0
21: John   3     2 Conference     0
22:  Tom   1     1 Conference     0
23:  Tom   2     2 Conference     1
24: Kyle   1     2 Conference     1
    Name Seq Level   variable value

Or you might even drop all the rows with a zero, or all those with a one, since either set is fairly redundant given the other.
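A minimal sketch of such a filter, keeping only the rows where value is 0 (the steps still open). The table `long` is a made-up stand-in with the same shape as myDT, not the real data:

```r
library(data.table)

# Hypothetical long-format table shaped like myDT above
long <- data.table(
  Name     = c("John", "John", "Tom", "Tom"),
  Seq      = c(1L, 1L, 1L, 1L),
  variable = c("Video", "Webinar", "Video", "Webinar"),
  value    = c(1L, 0L, 0L, 0L)
)

# Keep only the zero rows; value is then constant, so it is left out of the result
todo <- long[value == 0, .(Name, Seq, variable)]

print(todo)
```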

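The idea is that this would be the main data you use for any analysis or for building any summary table. In your case, the goal is a summary table (as far as I can tell), like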

library(magrittr)
res = myDT[levelmap, on="Level"][, .( NextStep = 
  variable[value == 0] %>% factor(levels = ord[[1]]) %>% sort %>% toString
), keyby=.(Name, Seq, Level)]

   Name Seq Level                     NextStep
1: John   1     1 Conference, Webinar, Meeting
2: John   2     1          Conference, Meeting
3: John   3     2                   Conference
4: Kyle   1     2      Webinar, Meeting, Video
5:  Tom   1     1   Conference, Webinar, Video
6:  Tom   2     2               Webinar, Video
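The factor / sort / toString chain in that j expression can be tried on a single vector. In this sketch, `items` is a made-up example of the zero-valued variables for one group, and `ord` copies the Level 1 entry of levelmap:

```r
# Unordered items, as they might come out of the long table for one group
items <- c("Webinar", "Meeting", "Conference")

# Preferred order for this level (the Level 1 entry of levelmap)
ord <- c("Conference", "Webinar", "Meeting", "Video")

# factor() records the preferred order as its levels, sort() orders by level,
# and toString() joins the labels with ", "
next_step <- toString(sort(factor(items, levels = ord)))

print(next_step)
```

Levels that do not occur in `items` (here Video) simply do not show up in the result.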

If you really want the 0/1 columns, you can also include them with dcast (which reshapes the data from long to wide):

cbind(
  res, 
  dcast(myDT, Name + Seq ~ variable, value.var="value")[, !c("Name", "Seq"), with=FALSE])
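The cbind above relies on both tables having their rows in the same order. As a possible alternative, not part of the original answer, the wide table could be joined on its key columns instead. The tables `res_toy` and `long_toy` below are made-up stand-ins for res and myDT:

```r
library(data.table)

# Hypothetical stand-ins for res and myDT
res_toy <- data.table(
  Name     = c("John", "Tom"),
  Seq      = c(1L, 1L),
  NextStep = c("Webinar", "Video")
)
long_toy <- data.table(
  Name     = c("Tom", "Tom", "John", "John"),
  Seq      = 1L,
  variable = c("Video", "Webinar", "Video", "Webinar"),
  value    = c(0L, 1L, 1L, 0L)
)

# Long to wide: one 0/1 column per variable
wide_toy <- dcast(long_toy, Name + Seq ~ variable, value.var = "value")

# Join on the key columns rather than relying on matching row order
out <- merge(res_toy, wide_toy, by = c("Name", "Seq"))

print(out)
```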