Summarise a column based on the contents of other columns

Date: 2019-07-16 10:15:36

Tags: r dplyr

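I'm trying to write code that compares two columns in the same data frame and creates a summary with a new column saying whether each ID was registered before its review took place.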

Here is my data frame:

tt <- structure(list(ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43",
                      "P44", "P44"),Type = c("Pre-Initial", "Review", "Review", "Initial", "Review", "Initial", "Review", "Pre-Initial", "Review"),
               Registered = c("Yes", "", "", "No", "", "Yes", "", "No", "")),
          class = "data.frame", row.names = c(NA, -9L))

The result I'm trying to achieve:

ID  Outcome
P40 Yes
P42 No
P43 Yes
P44 No

Here is the code I tried, but it returns No for every ID:

tt %>% group_by(ID) %>%
    summarise(outcome = c("No", "Yes")[all(Registered == "Yes" & Type == "Review") + 1])
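Why the attempt returns No for every ID: on every Review row, Registered is the empty string "", so Registered == "Yes" & Type == "Review" is never TRUE on any single row, and all() over it is always FALSE. The sketch below (assuming dplyr is installed; n_match is only an illustrative column name) counts, per ID, the rows where both conditions hold at once:

```r
library(dplyr)

# Same data as in the question, written with data.frame() for readability
tt <- data.frame(
  ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43", "P44", "P44"),
  Type = c("Pre-Initial", "Review", "Review", "Initial", "Review",
           "Initial", "Review", "Pre-Initial", "Review"),
  Registered = c("Yes", "", "", "No", "", "Yes", "", "No", ""),
  stringsAsFactors = FALSE
)

# Count rows where Registered is "Yes" AND Type is "Review" on the same row
check <- tt %>%
  group_by(ID) %>%
  summarise(n_match = sum(Registered == "Yes" & Type == "Review"))

# Every count is 0: the "Yes" and the "Review" always sit on different rows
print(check)
```

So the condition has to relate different rows of the same ID (for example by row order), not test both columns on one row.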

3 answers:

Answer 0 (score: 2)

You could try:

library(dplyr)

tt %>%
  group_by(ID) %>%
  summarise(
    # "Yes" if any Review row is preceded by exactly one "Yes" in Registered
    Outcome = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") == 1) + 1]
  )

Output:

# A tibble: 4 x 2
  ID    Outcome
  <chr> <chr>  
1 P40   Yes    
2 P42   No     
3 P43   Yes    
4 P44   No  

Note that this assumes "Yes" occurs in Registered only once per ID. Otherwise, just replace cumsum(Registered == "Yes") == 1 with cumsum(Registered == "Yes") >= 1.
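A small sketch of why the stricter test matters, using a hypothetical ID P45 (not in the question's data) that has two "Yes" rows before its review. The column names strict and relaxed are illustrative only:

```r
library(dplyr)

# Hypothetical ID with two "Yes" entries before its review
tt2 <- data.frame(
  ID = c("P45", "P45", "P45"),
  Type = c("Pre-Initial", "Initial", "Review"),
  Registered = c("Yes", "Yes", ""),
  stringsAsFactors = FALSE
)

res <- tt2 %>%
  group_by(ID) %>%
  summarise(
    # cumsum() is already 2 on the Review row, so "== 1" misses it
    strict = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") == 1) + 1],
    # ">= 1" only asks whether at least one "Yes" has been seen so far
    relaxed = c("No", "Yes")[any(Type == "Review" & cumsum(Registered == "Yes") >= 1) + 1]
  )

print(res)
```

For P45 the strict version yields "No" and the relaxed version "Yes"; on the question's data, where each ID has at most one "Yes", both give the same result.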

Answer 1 (score: 2)

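Another dplyr variant: it returns "No" if there is no "Yes" value in Registered; otherwise it compares the index of that value's occurrence with the index of "Review", and assigns the value accordingly.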
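The code for this answer did not survive in the text above. What follows is only a minimal sketch of the idea as described, not the answerer's original code. The helper registered_before_review() is hypothetical; it uses base R match() to find the first position of "Yes" in Registered and of "Review" in Type within each ID, and it assumes that an ID with no Review row should get "No":

```r
library(dplyr)

tt <- data.frame(
  ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43", "P44", "P44"),
  Type = c("Pre-Initial", "Review", "Review", "Initial", "Review",
           "Initial", "Review", "Pre-Initial", "Review"),
  Registered = c("Yes", "", "", "No", "", "Yes", "", "No", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Yes" if the first "Yes" comes before the first "Review"
registered_before_review <- function(type, registered) {
  first_yes <- match("Yes", registered)   # NA when the ID has no "Yes"
  first_review <- match("Review", type)   # NA when the ID has no review
  if (is.na(first_yes) || is.na(first_review)) return("No")
  if (first_yes < first_review) "Yes" else "No"
}

res <- tt %>%
  group_by(ID) %>%
  summarise(Outcome = registered_before_review(Type, Registered))

print(res)
```

Because match() returns only the first position, this version is unaffected by repeated "Yes" values, at the cost of relying on the rows within each ID being in chronological order.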

Answer 2 (score: 0)

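I'm not sure what your expected result is, but from your description it sounds like the Type == 'Review' rows don't matter at all: you just need to remove them, then drop the Type column (and rename the Registered column):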

library(dplyr)

tt %>%
    filter(Type != 'Review') %>%            # keep only the non-review rows
    select(- Type, Outcome = Registered)    # drop Type, rename Registered
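This approach relies on each ID having exactly one non-Review row, which holds for the sample data. A sketch that runs it on the question's data and checks that assumption explicitly with base R anyDuplicated():

```r
library(dplyr)

tt <- data.frame(
  ID = c("P40", "P40", "P40", "P42", "P42", "P43", "P43", "P44", "P44"),
  Type = c("Pre-Initial", "Review", "Review", "Initial", "Review",
           "Initial", "Review", "Pre-Initial", "Review"),
  Registered = c("Yes", "", "", "No", "", "Yes", "", "No", ""),
  stringsAsFactors = FALSE
)

res <- tt %>%
  filter(Type != "Review") %>%
  select(-Type, Outcome = Registered)

# Fails loudly if some ID has more than one non-Review row
stopifnot(!anyDuplicated(res$ID))

print(res)
```

If that check fails on real data, an ID would appear more than once in the result, and one of the group-wise summaries from the other answers is a better fit.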