How to assign a variable based on a grouping variable?

Posted: 2019-10-11 17:06:56

Tags: r dplyr

I have the following data frame:

filenumber<-c('510-1','510-1','510-2','510-3','510-3')
Year<-c('2017','2018','2018','2018','2019')
outcome<-c('Accepted',"Completed","Accepted","Accepted","Completed")

df<-data.frame(filenumber,Year,outcome)

For each filenumber that has an Accepted outcome, I want to label its rows "cohort-" followed by the Year in which it was Accepted. My attempt:

library(dplyr)
df %>%
  group_by(filenumber) %>%
  mutate(cohort = case_when(Year == '2017' & outcome == 'Accepted' ~ 'cohort-2017',
                            Year == '2018' & outcome == 'Accepted' ~ 'cohort-2018'))

 filenumber Year  outcome   cohort     
 510-1      2017  Accepted  cohort-2017
 510-1      2018  Completed NA         
 510-2      2018  Accepted  cohort-2018
 510-3      2018  Accepted  cohort-2018
 510-3      2019  Completed NA     

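However, this only labels the Accepted rows. I want the cohort label to apply to every row of a filenumber that has Accepted as an outcome, so that I get: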

 filenumber Year  outcome   cohort     
 510-1      2017  Accepted  cohort-2017
 510-1      2018  Completed cohort-2017         
 510-2      2018  Accepted  cohort-2018
 510-3      2018  Accepted  cohort-2018
 510-3      2019  Completed cohort-2018     

How can I do this?

1 answer:

Answer 0 (score: 0)

We can use `fill` from tidyr to carry the label down to the remaining rows of each group:

library(dplyr)
library(tidyr)
df %>%
  group_by(filenumber) %>%
  mutate(cohort = case_when(Year == '2017' & outcome == 'Accepted' ~ 'cohort-2017',
                            Year == '2018' & outcome == 'Accepted' ~ 'cohort-2018')) %>%
  # copy the last non-missing cohort label down within each filenumber
  fill(cohort)
# A tibble: 5 x 4
# Groups:   filenumber [3]
#  filenumber Year  outcome   cohort     
#  <fct>      <fct> <fct>     <chr>      
#1 510-1      2017  Accepted  cohort-2017
#2 510-1      2018  Completed cohort-2017
#3 510-2      2018  Accepted  cohort-2018
#4 510-3      2018  Accepted  cohort-2018
#5 510-3      2019  Completed cohort-2018
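`fill()` copies the last non-missing value downward by default, so the code above relies on the Accepted row coming first within each filenumber, and on the only relevant years being the hard-coded 2017 and 2018. A minimal sketch that relaxes both assumptions: it builds the label from the Accepted row's own Year with `paste0()`, and uses `fill(..., .direction = "downup")` (available in tidyr 1.0.0 and later) so rows before the Accepted row are filled too. The `stringsAsFactors = FALSE` argument is an assumption added here to keep the columns as character.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE keeps character columns
# (this is already the default from R 4.0 onward)
df <- data.frame(
  filenumber = c('510-1', '510-1', '510-2', '510-3', '510-3'),
  Year       = c('2017', '2018', '2018', '2018', '2019'),
  outcome    = c('Accepted', 'Completed', 'Accepted', 'Accepted', 'Completed'),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(filenumber) %>%
  # label only the Accepted row, using its own Year instead of hard-coded years;
  # rows matching no condition get NA from case_when()
  mutate(cohort = case_when(outcome == 'Accepted' ~ paste0('cohort-', Year))) %>%
  # fill the NAs downward, then upward, within each filenumber
  fill(cohort, .direction = "downup") %>%
  ungroup()

res
```

Because `fill()` respects the grouping, a label never leaks from one filenumber into the next.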

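This can also be simplified. After grouping by 'filenumber', `match` the string 'Accepted' against 'outcome' to get the numeric index of the Accepted row, use that index to subset 'Year', and paste 'cohort-' in front of it (here with `str_c`) to create the 'cohort' column: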

library(stringr)
df %>% 
    group_by(filenumber) %>% 
    mutate(cohort = str_c('cohort-', Year[match('Accepted', outcome)]))
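To see what the `match()` step does inside each group, here it is on plain vectors standing in for a single filenumber (the vectors below are illustrative, mirroring filenumber 510-3). If a group has no Accepted row, `match()` returns `NA`, and `str_c()` then returns `NA` rather than the string `"cohort-NA"` that `paste0()` would produce.

```r
library(stringr)

# Rows belonging to one filenumber (here mirroring 510-3)
Year    <- c('2018', '2019')
outcome <- c('Accepted', 'Completed')

# Position of the first 'Accepted' outcome in the group
idx <- match('Accepted', outcome)            # 1

# Subsetting Year by that position gives a single value; inside mutate()
# it is recycled across every row of the group
cohort <- str_c('cohort-', Year[idx])        # "cohort-2018"

# A group with no Accepted row: match() gives NA, and so does str_c()
no_match <- str_c('cohort-', Year[match('Accepted', c('Completed', 'Completed'))])
```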