Using grepl to exclude multiple words from a vector

Time: 2015-05-20 09:55:27

Tags: r

Here is some sample data:

exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")

main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

When the words are identical (an exact match), the following works, but note that azet.com is not excluded. For that, agrepl() could be used.

main.data[!(main.data %in% exclude.words)]
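To make the problem concrete, here is a minimal sketch of what the exact-match filter keeps on the sample data. The name `kept` is introduced only for illustration and is not part of the question.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# %in% compares whole elements, so "azet.com" is not considered equal to "azet"
kept <- main.data[!(main.data %in% exclude.words)]
# kept is c("registration", "azet.com", "dna")
print(kept)
```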

So how can agrepl be used with two vectors?

main.data[!agrepl(main.data, exclude.words)]

3 answers:

Answer 0: (score: 1)

main.data[!as.logical(rowSums(sapply(exclude.words, function(x) agrepl(x, main.data))))]
# [1] "registration" "dna"


# Clarification: one column per excluded word, one row per element of main.data
sapply(exclude.words, function(x) agrepl(x, main.data))
#       zoznam  azet dovera   joj  alza telecom google  post   sme
#  [1,]   TRUE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE
#  [2,]  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE
#  [3,]  FALSE  TRUE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE
#  [4,]  FALSE  TRUE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE
#  [5,]  FALSE FALSE   TRUE FALSE FALSE   FALSE  FALSE FALSE FALSE
#  [6,]  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE FALSE
#  [7,]  FALSE FALSE  FALSE  TRUE FALSE   FALSE  FALSE FALSE FALSE
#  [8,]  FALSE FALSE  FALSE FALSE  TRUE   FALSE  FALSE FALSE FALSE
#  [9,]  FALSE FALSE  FALSE FALSE FALSE    TRUE  FALSE FALSE FALSE
# [10,]  FALSE FALSE  FALSE FALSE FALSE   FALSE   TRUE FALSE FALSE
# [11,]  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE  TRUE FALSE
# [12,]  FALSE FALSE  FALSE FALSE FALSE   FALSE  FALSE FALSE  TRUE
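The same idea, collapsing a per-word match matrix into one logical value per element, can be sketched with literal substring matching. This is an illustration under a stated assumption: it swaps agrepl for grepl(..., fixed = TRUE) so that the outcome does not depend on agrepl's max.distance setting. The names `hits` and `kept` are introduced here.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# One column per excluded word, one row per element of main.data.
# ASSUMPTION: literal substring matching instead of the answer's approximate agrepl().
hits <- sapply(exclude.words, function(w) grepl(w, main.data, fixed = TRUE))

# Keep only the elements that no excluded word matched.
kept <- main.data[rowSums(hits) == 0]
# kept is c("registration", "dna")
print(kept)
```

Here rowSums(hits) == 0 plays the same role as !as.logical(rowSums(...)) in the answer.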

Answer 1: (score: 1)

As mentioned in the comments, you can use:

main.data[!grepl(paste(exclude.words, collapse = "|"), main.data)]

This excludes every element of main.data that partially or fully matches a word in exclude.words.

paste(exclude.words, collapse = "|")

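joins the elements of exclude.words into a single string separated by "|" (the regex OR operator), which grepl accepts as a single pattern. You therefore do not need to loop over the individual words.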
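A short sketch of the pattern this builds, plus one caveat: the joined string is interpreted as a regular expression, so a word containing a metacharacter such as "." would not be matched literally. Quoting each word with \Q...\E under perl = TRUE is one possible workaround. That part is an addition here, not something the answer proposes, and `extra.words`, `literal.pattern` and `literal.hits` are hypothetical names.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# Build one alternation pattern from all excluded words
pattern <- paste(exclude.words, collapse = "|")
# pattern is "zoznam|azet|dovera|joj|alza|telecom|google|post|sme"

kept <- main.data[!grepl(pattern, main.data)]
# kept is c("registration", "dna")

# ASSUMPTION (not in the answer): a hypothetical word containing a regex metacharacter.
# \Q...\E makes the enclosed text literal when perl = TRUE.
extra.words <- c("azet.com")
literal.pattern <- paste0("\\Q", extra.words, "\\E", collapse = "|")
literal.hits <- grepl(literal.pattern, c("azet.com", "azetXcom"), perl = TRUE)
# literal.hits is c(TRUE, FALSE): the "." no longer matches an arbitrary character
```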

Answer 2: (score: 1)

You can use this functional programming approach:

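library(functional)  # provides Compose()

# One filter per excluded word; each drops the elements that match that word
funcs <- lapply(exclude.words, function(u) function(x) x[!grepl(u, x)])

# Chain all filters into a single function and apply it to main.data
Reduce(Compose, funcs)(main.data)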
#[1] "registration" "dna"
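If depending on the functional package is not desirable, a similar chain of filters can be sketched in base R by letting Reduce carry the shrinking vector as its accumulator. This is an alternative formulation, not the answerer's code, and `drop_matches` is a hypothetical helper name.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# Hypothetical helper: remove from x every element that matches `word`
drop_matches <- function(x, word) x[!grepl(word, x)]

# Start from main.data (the init argument) and apply one excluded word at a time
kept <- Reduce(drop_matches, exclude.words, main.data)
# kept is c("registration", "dna")
print(kept)
```

Because each step only removes elements, the order in which the words are applied does not change the final result.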