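Avoiding a loop in string replacement?

Time: 2013-02-15 22:40:53

Tags: r

I have data, a character vector (eventually I'll collapse it, so I don't care whether it stays a vector or is treated as a single string), a vector of patterns, and a vector of replacements. I want each pattern in the data to be replaced by its respective replacement. I got it done with stringr and a for loop, but is there a more R-like way?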

require(stringr)
start_string <- sample(letters[1:10], 10)
my_pattern <- c("a", "b", "c", "z")
my_replacement <- c("[this was an a]", "[this was a b]", "[this was a c]", "[no z!]")
str_replace(start_string, pattern = my_pattern, replacement = my_replacement)
# bad lengths, doesn't work

str_replace(paste0(start_string, collapse = ""),
    pattern = my_pattern, replacement = my_replacement)
# vector output, not what I want in this case

my_result <- start_string
for (i in seq_along(my_pattern)) {
    my_result <- str_replace(my_result,
        pattern = my_pattern[i], replacement = my_replacement[i])
}
> my_result
 [1] "[this was a c]"  "[this was an a]" "e"               "g"               "h"               "[this was a b]" 
 [7] "d"               "j"               "f"               "i"   

# This is what I want, but is there a better way?

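In my case, I know each pattern will occur at most once, but not every pattern will occur. I know I could use str_replace_all if a pattern might occur more than once; I'd like the solution to offer that option too. I'd also like a solution that uses my_pattern and my_replacement, so that it can be part of a function that takes those vectors as arguments.

2 Answers:

Answer 0: (score: 3)

I'll bet there's another way to do this, but my first thought was gsubfn: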

my_repl <- function(x){
    switch(x,a = "[this was an a]",
             b = "[this was a b]",
             c = "[this was a c]",
             z = "[this was a z]")
}

library(gsubfn)    
start_string <- sample(letters[1:10], 10)
gsubfn("a|b|c|z",my_repl,x = start_string)

If the patterns you are searching for are acceptable, valid names for list elements, this will also work:

names(my_replacement) <- my_pattern
gsubfn("a|b|c|z",as.list(my_replacement),start_string)

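Edit

But frankly, if I really had to do this a lot in my own code, I'd probably just do the for loop thing, wrapped in a function. Here's a simple version that uses sub and gsub rather than the functions from stringr: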

vsub <- function(pattern,replacement,x,all = TRUE,...){
  FUN <- if (all) gsub else sub
  for (i in seq_len(min(length(pattern),length(replacement)))){
    x <- FUN(pattern = pattern[i],replacement = replacement[i],x,...)
  }
  x
}

vsub(my_pattern,my_replacement,start_string)
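For readers who want the same behavior without an explicit for loop, the following is a minimal sketch of a fold over the pattern indices using base R's Reduce. The name vsub_reduce is hypothetical and not part of the original answer. It is still a sequence of sub/gsub calls under the hood, so it is no faster and no less order-dependent than vsub; it only changes the style. Extra arguments such as fixed = TRUE are forwarded through `...` to sub/gsub.

```r
# Hypothetical fold-based variant of vsub (illustrative sketch only).
vsub_reduce <- function(pattern, replacement, x, all = TRUE, ...) {
  stopifnot(length(pattern) == length(replacement))
  FUN <- if (all) gsub else sub
  # Start from x and apply one replacement per step, in order.
  Reduce(function(acc, i) FUN(pattern[i], replacement[i], acc, ...),
         seq_along(pattern), x)
}

res_basic <- vsub_reduce(c("a", "b"), c("[A]", "[B]"), c("a", "xb", "c"))
# fixed = TRUE treats "." as a literal dot instead of "any character"
res_fixed <- vsub_reduce(".", "!", "a.b", fixed = TRUE)
# all = FALSE replaces only the first occurrence in each element
res_first <- vsub_reduce("a", "X", "aa", all = FALSE)
```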

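But of course, one reason there's no well-known built-in function for this is probably that sequential replacements like this can be pretty fragile, because they depend so heavily on order: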

vsub(rev(my_pattern),rev(my_replacement),start_string)
 [1] "i"                                          "[this w[this was an a]s [this was an a] c]"
 [3] "[this was an a]"                            "g"                                         
 [5] "j"                                          "d"                                         
 [7] "f"                                          "[this w[this was an a]s [this was an a] b]"
 [9] "h"                                          "e"      
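The order dependence can be reproduced on a two-element input using only base R. This is a small sketch; apply_in_order is a hypothetical helper that mirrors the loop in vsub with gsub. Replacing "a" first is harmless here because its replacement text contains no "c", whereas replacing "c" first produces text containing the letter "a", which the later "a" step then rewrites.

```r
# Hypothetical helper: apply the replacements one after another, in the given order.
apply_in_order <- function(pattern, replacement, x) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
  }
  x
}

pat <- c("a", "c")
rep_text <- c("[this was an a]", "[this was a c]")
x <- c("a", "c")

forward  <- apply_in_order(pat, rep_text, x)
backward <- apply_in_order(rev(pat), rev(rep_text), x)

# TRUE would mean the order did not matter; here it is FALSE.
same_result <- identical(forward, backward)
```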

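Answer 1: (score: 1)

Here is an option based on gregexpr, regmatches, and regmatches<-. Note that there are limits on the length of regular expression that can be matched, so this won't work if you try it with too many long patterns.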

replaceSubstrings <- function(patterns, replacements, X) {
    pat <- paste(patterns, collapse="|")
    m <- gregexpr(pat, X)
    regmatches(X, m) <- 
        lapply(regmatches(X,m),
               function(XX) replacements[match(XX, patterns)])
    X
}

## Try it out
patterns <- c("cat", "dog")
replacements <- c("tiger", "coyote")
sentences <- c("A cat", "Two dogs", "Raining cats and dogs")
replaceSubstrings(patterns, replacements, sentences)
## [1] "A tiger"                    "Two coyotes"               
## [3] "Raining tigers and coyotes"
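Because the lookup uses match() on the matched text, the approach above effectively assumes the patterns are literal strings. The following is a hedged sketch of a variant under that assumption; replace_literals is a hypothetical name, and the escaping step assumes plain ASCII patterns. It escapes non-word characters, puts longer patterns first so that "cats" is tried before "cat", and uses perl = TRUE, where alternatives are tried left to right. Since everything happens in one pass, replacement text is never rescanned, so even swapping two strings works.

```r
# Hypothetical single-pass replacement for literal (non-regex) patterns.
replace_literals <- function(patterns, replacements, x) {
  stopifnot(length(patterns) == length(replacements), length(patterns) > 0)
  # Longer patterns first, so they win over their own prefixes.
  ord <- order(nchar(patterns), decreasing = TRUE)
  patterns <- patterns[ord]
  replacements <- replacements[ord]
  # Backslash-escape every non-word character so it is matched literally.
  escaped <- gsub("(\\W)", "\\\\\\1", patterns, perl = TRUE)
  m <- gregexpr(paste(escaped, collapse = "|"), x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(hit) replacements[match(hit, patterns)])
  x
}

res_mixed <- replace_literals(c("a.b", "cat", "cats"),
                              c("X", "tiger", "lions"),
                              c("a.b axb", "cats and cat"))
# Swapping two strings: a sequential approach would chain the replacements.
res_swap <- replace_literals(c("a", "b"), c("b", "a"), "ab")
```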