Identifying strings within a vector using R

Time: 2017-04-01 20:41:02

Tags: r string

Can anyone help me with this problem?

I have a vector like this:

vec1 <- c("10F/I/V", "33F", "36I", "54A/L/M/S/T/V", "62V", "82A/C/F/G", "84V", "90M")

And another one like this:

vec2 <- c("10F", "10L", "10I", "33G", "47A", "54A", "54T", "62V")

I want to compute a result equal to 3, because vec2 contains the strings "10F" and "10I", which both belong to the same element "10F/I/V"; it also contains "54A" and "54T", which both belong to "54A/L/M/S/T/V"; and it contains "62V".

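Thanks!

1 Answer:

Answer 0: (score: 0)

You can expand the original vector vec1 to explicitly build strings such as 54A, 54L, 54M, and so on. Once you have these strings, you can search vec2 for each of them, and an element of vec1 counts as a "match" if any of its expanded strings is found.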

vec1 <- c("10F/I/V", "33F", "36I", "54A/L/M/S/T/V", "62V", "82A/C/F/G", "84V", "90M")

vec2 <- c("10F", "10L", "10I", "33G", "47A", "54A", "54T", "62V")


out1 <- mapply(x = gsub("(^\\d+)(.*$)", "\\1", vec1),
               y = strsplit(gsub("\\d+", "", vec1), "/"),
               FUN = function(x, y) {
                 # expands the vector to make comparisons easy using regex
                 xy <- paste(x, y, sep = "")
                 # finds individual combination
                 m <- sapply(xy, FUN = grepl, vec2)
                 # finds if strings appears in original element (element is a column)
                 apply(m, MARGIN = 2, any)  
               })

> sapply(out1, any)
   10    33    36    54    62    82    84    90 
 TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE 
> sum(sapply(out1, any))
[1] 3
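
One caveat about the code above: grepl does regular-expression substring matching, so a pattern like "10F" would also match a longer string such as "110F" if vec2 ever contained one. The following is a minimal alternative sketch, not the answer's original method, that performs the same expansion but compares with %in% for exact equality. The names pos, letters_by_pos, expanded and hit are introduced here purely for illustration.

```r
vec1 <- c("10F/I/V", "33F", "36I", "54A/L/M/S/T/V", "62V", "82A/C/F/G", "84V", "90M")
vec2 <- c("10F", "10L", "10I", "33G", "47A", "54A", "54T", "62V")

# Split each element of vec1 into its numeric prefix and its "/"-separated letters
pos <- sub("^(\\d+).*$", "\\1", vec1)
letters_by_pos <- strsplit(sub("^\\d+", "", vec1), "/", fixed = TRUE)

# Expand e.g. "10F/I/V" into c("10F", "10I", "10V"); the list is named by prefix
expanded <- Map(paste0, pos, letters_by_pos)

# An element of vec1 counts if any of its expanded strings equals an element of vec2
hit <- vapply(expanded, function(s) any(s %in% vec2), logical(1))

# Number of vec1 elements with at least one exact match in vec2
sum(hit)
```

With the example vectors from the question, hit is TRUE for the elements with prefixes 10, 54 and 62, so the sum is 3, in agreement with the output shown above.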