Create a new column showing partial string matches in dplyr

Posted: 2018-08-13 11:44:57

Tags: r dplyr

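I'm trying to create a new column that shows whether the strings in two columns of my data frame match, i.e. whether the `target` string appears anywhere inside the `transcript` string. This question is almost what I'm asking, but instead of building a filter condition, I want a new column that shows whether there is a match (TRUE or FALSE).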

Here is an example data frame:

 transcript        target
 he saw the dog    saw
 she gave them it  gave
 watch out for     danger
 real bravery      brave

I want to create a new column that shows whether there is any match between the two:

 transcript        target    match
 he saw the dog    saw        T
 she gave them it  gave       T
 watch out for     danger     F
 real bravery      brave      T

I'd prefer to use dplyr, but I'm open to other suggestions!

3 answers:

Answer 0 (score: 3)

Using stringr::str_detect, we can check whether transcript contains target:

library(stringr)
library(dplyr)

df %>%
  mutate_if(is.factor, as.character) %>%          # only needed if transcript/target are factors; skip if they are already character
  mutate(match = str_detect(transcript, target))  # TRUE when transcript contains target (target is read as a regex)


         transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
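One caveat worth checking against your own data: str_detect() interprets target as a regular expression, so a target containing metacharacters such as `+`, `.` or `(` may not match literally. Wrapping the pattern in stringr::fixed() makes the comparison literal. A minimal sketch on a hypothetical two-row data frame (df_special and its contents are made up for illustration):

```r
library(dplyr)
library(stringr)

# Hypothetical data: the second target contains the regex metacharacter "+"
df_special <- data.frame(
  transcript = c("he saw the dog", "x = a+b"),
  target     = c("saw", "a+b"),
  stringsAsFactors = FALSE
)

res <- df_special %>%
  mutate(
    match_regex = str_detect(transcript, target),        # "a+b" means "one or more a, then b": no match
    match_fixed = str_detect(transcript, fixed(target))  # "a+b" is matched as a literal substring
  )
print(res)
```

Both forms are vectorised over transcript and target, so each row is compared with its own target; fixed() only changes how the pattern is interpreted.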

Answer 1 (score: 2)

You asked for a dplyr approach, but here is also a base R approach using grepl:

df1$match <- mapply(grepl, df1$target, df1$transcript)

df1
        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
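grepl() is not vectorised over its pattern argument (only the first element of pattern is used, and R warns about it), which is why this answer pairs each target with its own transcript through mapply(). A minimal base R sketch of the same idea, assuming you want literal rather than regex matching (fixed = TRUE) and an unnamed result (USE.NAMES = FALSE); df1 is rebuilt from the question's example data so the snippet runs on its own:

```r
df1 <- data.frame(
  transcript = c("he saw the dog", "she gave them it", "watch out for", "real bravery"),
  target     = c("saw", "gave", "danger", "brave"),
  stringsAsFactors = FALSE
)

# mapply() calls grepl(target[i], transcript[i]) for each row i.
# MoreArgs passes fixed = TRUE to every call, so target is matched literally;
# USE.NAMES = FALSE stops mapply() from naming the result after target.
df1$match <- mapply(grepl, df1$target, df1$transcript,
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
print(df1)
```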

Using grepl inside a dplyr mutate statement:

df1 %>% 
  mutate(match = mapply(grepl, target, transcript))

        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE

Answer 2 (score: 1)

One option is to use dplyr::rowwise() together with grepl to create the match column, as follows:

library(dplyr)

df %>% rowwise() %>%
  mutate(match  = grepl(target,transcript)) %>%
  as.data.frame()

#         transcript target match
# 1   he saw the dog    saw  TRUE
# 2 she gave them it   gave  TRUE
# 3    watch out for danger FALSE
# 4     real bravery  brave  TRUE

Data:

df <- read.table(text = 
"transcript        target
'he saw the dog'    saw
'she gave them it'  gave
'watch out for'     danger
'real bravery'      brave",
header = TRUE, stringsAsFactors = FALSE)
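As a further alternative to rowwise(), purrr::map2_lgl() can walk over the two columns in parallel inside mutate() and return a plain logical vector, so no row-wise grouping has to be undone afterwards. This sketch assumes the purrr package is installed and uses fixed = TRUE to match target literally; drop that argument if you need regex matching:

```r
library(dplyr)
library(purrr)

df <- data.frame(
  transcript = c("he saw the dog", "she gave them it", "watch out for", "real bravery"),
  target     = c("saw", "gave", "danger", "brave"),
  stringsAsFactors = FALSE
)

# map2_lgl() calls the lambda once per row with .x = transcript, .y = target
# and requires each call to return a single TRUE/FALSE.
res <- df %>%
  mutate(match = map2_lgl(transcript, target, ~ grepl(.y, .x, fixed = TRUE)))
print(res)
```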