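Replace all instances of words in one vector with words specified in a second vector

Time: 2017-06-29 16:34:40

Tags: r

I am trying to find an efficient way to remove, from a vector of input phrases, every instance of each word listed in a second "removal" vector.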

 vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses')
 vectorOfPhrases <- c('the cat and the monkey walked around the block', 'the wolf and the mouses ate lunch with the monkey', 'this should remain unmodified')
 remove_strings <- function(a, b) { stringr::str_replace_all(a,b, '')}
 remove_strings(vectorOfPhrases, vectorOfWordsToRemove)

The output I would like is:

vectorOfPhrases <- c('the and the walked around the block', 'the and the ate lunch with the', 'this should remain unmodified')

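That is, every instance of every word in the vector vectorOfWordsToRemove should be removed from vectorOfPhrases.

I could do this with a for loop, but that is quite slow, and it seems there should be a vectorized way to do it efficiently.

Thanks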

2 answers:

Answer 0: (score: 1)

First, I create a vector of empty strings to use as the replacements:

vectorOfNothing <- rep('', 4)

Then use the qdap library to replace the vector of patterns with the vector of replacements:

library(qdap)
vectorOfPhrases <- qdap::mgsub(vectorOfWordsToRemove, 
                               vectorOfNothing, 
                               vectorOfPhrases)

> vectorOfPhrases
[1] "the and the walked around the block" "the and the ate lunch with the"     
[3] "this should remain unmodified"
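For comparison, here is a minimal base R sketch of the same removal without qdap. It is not part of the answer above: it swaps in a different technique, joining the words into a single alternation pattern and calling gsub() once. It assumes the words contain no regex metacharacters, and the names words_to_remove, phrases, stripped and cleaned are made up for illustration.

```r
# Same data as in the question, under illustrative names.
words_to_remove <- c("cat", "monkey", "wolf", "mouses")
phrases <- c("the cat and the monkey walked around the block",
             "the wolf and the mouses ate lunch with the monkey",
             "this should remain unmodified")

# Build one pattern, \b(cat|monkey|wolf|mouses)\b, so that only whole
# words match. Assumes the words contain no regex metacharacters.
pattern <- paste0("\\b(", paste(words_to_remove, collapse = "|"), ")\\b")

# gsub() is vectorized over its text argument, so no explicit loop is needed.
stripped <- gsub(pattern, "", phrases, perl = TRUE)

# Removing a word leaves its surrounding spaces behind: collapse runs of
# spaces into one and trim leading/trailing whitespace.
cleaned <- trimws(gsub(" +", " ", stripped))

print(cleaned)
```

The word boundaries keep a word such as "cat" from being stripped out of a longer word like "concatenate"; drop them if substring removal is what you actually want.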

Answer 1: (score: 1)

You can use gsubfn():

library(gsubfn)
replaceStrings <- as.list(rep("", 4))
newPhrases <- gsubfn("\\S+", setNames(replaceStrings, vectorOfWordsToRemove), vectorOfPhrases)

> newPhrases
[1] "the and the walked around the block" "the and the ate lunch with the"     
[3] "this should remain unmodified"
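The gsubfn() call above works token by token: each run of non-whitespace characters is matched and looked up among the names of the replacement list. A rough base R sketch of that token-by-token idea follows. It is not how gsubfn is implemented, only an illustration under the assumption that words are separated by single spaces; the names words_to_remove, phrases, tokens and kept are hypothetical.

```r
# Same data as in the question, under illustrative names.
words_to_remove <- c("cat", "monkey", "wolf", "mouses")
phrases <- c("the cat and the monkey walked around the block",
             "the wolf and the mouses ate lunch with the monkey",
             "this should remain unmodified")

# Split every phrase into tokens on single spaces (an assumption about
# the input; punctuation attached to a word stays part of the token).
tokens <- strsplit(phrases, " ", fixed = TRUE)

# Keep only tokens that are not in the removal vector, then glue each
# phrase back together. vapply() guarantees one string per phrase.
kept <- vapply(
  tokens,
  function(tok) paste(tok[!tok %in% words_to_remove], collapse = " "),
  character(1)
)

print(kept)
```

Because whole tokens are compared with %in%, a word followed directly by punctuation (for example "cat,") would not be removed by this sketch.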