Partial string matching between two columns in R

Time: 2020-03-25 20:45:05

Tags: r string-matching

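I'm trying to check whether the emails in a list are correct. My idea was to do a partial string match between the Email and Name columns and return a logical vector (TRUE / FALSE) in a new column.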

In the example below, only rows 3 and 5 have correct emails, so those rows should come out as TRUE. I tried the following, but neither attempt worked:

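# Attempt 1: agrepl() returns one TRUE/FALSE per email, and every pass of the
# loop overwrites Test$Match, so only the last surname's result survives
for (i in Test$LastName) {
  Test$Match <- agrepl(i, Test$Email, ignore.case = TRUE)
}

# Attempt 2: %in% tests whole-string equality, so an email never equals a surname
Test$Email %in% Test$LastName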

Any other suggestions are welcome too. Thanks!


3 Answers:

Answer 0 (score: 2)

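A base R option is to combine grepl with mapply, which checks each row's email for either that row's first name or its last name: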

Test <- within(Test, Match <- mapply(grepl,paste(FirstNmae,LastName,sep = "|"),Email,ignore.case = TRUE))

which gives

> Test
  FirstNmae LastName                    Email Match
1    Audrey      Low         T.Rose@gmail.com FALSE
2     Tammy     Rose          A.Low@gmail.com FALSE
3    Stacey     Lock     stacy.lock@gmail.com  TRUE
4    Judson   Porter beth.mccormick@gmail.com FALSE
5    Kellie     Sims         k.sims@gmail.com  TRUE

Data

Test <- data.frame(FirstNmae = c("Audrey","Tammy","Stacey","Judson","Kellie"),
                 LastName = c("Low","Rose","Lock","Porter","Sims"),
                 Email = c("T.Rose@gmail.com","A.Low@gmail.com","stacy.lock@gmail.com","beth.mccormick@gmail.com","k.sims@gmail.com"))
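Because grepl treats each name as a regular expression, a name containing a metacharacter such as a period could match more than intended. The following is a minimal sketch of the same row-wise idea with literal matching instead. The helper `contains_name` is hypothetical, the first-name column is spelled `FirstName` here, and it assumes a case-insensitive substring check is what "correct email" means. Since grepl ignores `ignore.case` when `fixed = TRUE`, both strings are lower-cased first.

```r
Test <- data.frame(
  FirstName = c("Audrey", "Tammy", "Stacey", "Judson", "Kellie"),
  LastName  = c("Low", "Rose", "Lock", "Porter", "Sims"),
  Email     = c("T.Rose@gmail.com", "A.Low@gmail.com", "stacy.lock@gmail.com",
                "beth.mccormick@gmail.com", "k.sims@gmail.com"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when `name` occurs literally (ignoring case) in `email`.
# fixed = TRUE turns off regex interpretation, so lower-case both sides instead
# of relying on ignore.case.
contains_name <- function(name, email) {
  grepl(tolower(name), tolower(email), fixed = TRUE)
}

# Row by row: first name OR last name must appear in that row's email
Test$Match <- mapply(function(first, last, email) {
  contains_name(first, email) || contains_name(last, email)
}, Test$FirstName, Test$LastName, Test$Email, USE.NAMES = FALSE)

Test$Match
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Literal vs regex: "St." is not in the address, but the regex "St." matches "sta"
contains_name("St.", "stacy.lock@gmail.com")              # FALSE
grepl("St.", "stacy.lock@gmail.com", ignore.case = TRUE)  # TRUE
```

The difference only shows up for names with characters like `.`, `(` or `+`. For plain alphabetic names both approaches give the same result.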

Answer 1 (score: 1)

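Try something like this? You were almost there; you just need to store the TRUE / FALSE results in a vector. I use sapply to loop over the row indices and compare the corresponding columns of each row. sapply collects the results into a vector, so you can use it directly as a TRUE / FALSE index: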

test = data.frame(FirstName=c("Audrey","Tammy","Stacey","Judson","Kellie"),
                  LastName=c("Low","Rose","Lock","Porter","Sims"),
                  Email=c("T.Rose@gmail.com","A.Low@gmail.com","stacy.lock@gmail.com","beth.mccormick@gmail.com","k.sims@gmail.com"))

# For each row i, fuzzy-match that row's last name against that row's email
matches = sapply(1:nrow(test), function(i) agrepl(test$LastName[i], test$Email[i]))

# Keep only the rows whose email matched
test[matches,]

  FirstName LastName                Email
3    Stacey     Lock stacy.lock@gmail.com
5    Kellie     Sims     k.sims@gmail.com
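agrepl does approximate matching. With its default `max.distance = 0.1`, a four-letter surname is allowed one edit, which is why `Lock` matches `stacy.lock` here even without `ignore.case`. The same tolerance also lets a misspelled address through. The sketch below shows this with a hypothetical misspelled email, `k.sams@gmail.com` (not from the question). It then reruns the row loop with `vapply`, which guarantees exactly one logical per row, and `max.distance = 0` for exact, case-insensitive matching. Whether fuzzy matching is wanted depends on what counts as a "correct" email, which is an assumption here.

```r
test <- data.frame(
  FirstName = c("Audrey", "Tammy", "Stacey", "Judson", "Kellie"),
  LastName  = c("Low", "Rose", "Lock", "Porter", "Sims"),
  Email     = c("T.Rose@gmail.com", "A.Low@gmail.com", "stacy.lock@gmail.com",
                "beth.mccormick@gmail.com", "k.sims@gmail.com"),
  stringsAsFactors = FALSE
)

# Hypothetical misspelled address, not part of the question's data
typo <- "k.sams@gmail.com"
agrepl("Sims", typo, ignore.case = TRUE)                    # TRUE: one substitution tolerated
agrepl("Sims", typo, max.distance = 0, ignore.case = TRUE)  # FALSE: exact match required

# Row-wise check; vapply stops with an error if a row does not yield one logical
matches <- vapply(seq_len(nrow(test)), function(i) {
  agrepl(test$LastName[i], test$Email[i], max.distance = 0, ignore.case = TRUE)
}, logical(1))

test[matches, ]
```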

Answer 2 (score: 1)

Try this:

library(dplyr)

DF <- data.frame(FirstName = c("Audrey","Tammy","Stacey","Judson","Kellie"),
                 LastName = c("Low","Rose","Lock","Porter","Sims"),
                 Email = c("T.Rose@gmail.com","A.Low@gmail.com","stacy.lock@gmail.com","beth.mccormick@gmail.com","k.sims@gmail.com"))

# rowwise() makes grepl see one LastName/Email pair at a time
DF %>% 
  rowwise() %>%
  mutate(isMatch = grepl(LastName, Email, ignore.case = TRUE))

Output:

  FirstName LastName Email                    isMatch    
  <fct>     <fct>    <fct>                    <lgl>
1 Audrey    Low      T.Rose@gmail.com         FALSE
2 Tammy     Rose     A.Low@gmail.com          FALSE
3 Stacey    Lock     stacy.lock@gmail.com     TRUE 
4 Judson    Porter   beth.mccormick@gmail.com FALSE
5 Kellie    Sims     k.sims@gmail.com         TRUE
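rowwise() works, but it leaves the result as a rowwise data frame, so later dplyr verbs keep operating per row until you call ungroup(). As an alternative sketch, grepl can be applied pair-wise inside an ordinary mutate() with base R's mapply. This assumes dplyr is installed and, as in the answer, that checking the last name alone is sufficient.

```r
library(dplyr)

DF <- data.frame(
  FirstName = c("Audrey", "Tammy", "Stacey", "Judson", "Kellie"),
  LastName  = c("Low", "Rose", "Lock", "Porter", "Sims"),
  Email     = c("T.Rose@gmail.com", "A.Low@gmail.com", "stacy.lock@gmail.com",
                "beth.mccormick@gmail.com", "k.sims@gmail.com"),
  stringsAsFactors = FALSE
)

# mapply calls grepl(LastName[i], Email[i], ignore.case = TRUE) for each row,
# so no rowwise() grouping is needed and DF stays an ordinary data frame
DF <- DF %>%
  mutate(isMatch = mapply(grepl, LastName, Email,
                          MoreArgs = list(ignore.case = TRUE),
                          USE.NAMES = FALSE))

DF$isMatch
# [1] FALSE FALSE  TRUE FALSE  TRUE
```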