How to extract the numbers from strings in a data frame

Asked: 2019-12-22 12:58:24

Tags: r dataframe split

I would be grateful for any help with my problem. My dataset is a bit unusual; it looks like this.

            1                  2
1   [34, 67], [17, 76]       [17, 76], , , , , , 

I want to get rid of the "[", the "]" and the extra "," and make a vector of numbers.

Ideally, the result should look like one of the following

            1               2
1   "[34, 67]", "[17, 76]"     "[17, 76]"

          1               2
1   "34, 67", "17, 76"     "17, 76"

I tried the following

a=trimws(df[1,1])
a=unlist(strsplit(a, split=", "))

but it returns "[34", "67]", "[17", "76]". Is there a simple way to do this?

Here is a sample that I got from dput():

structure(list(rse1e = structure(c(3L, 7L), .Label = c("", ", , , , , , ", 
"[118, 25], [17, 76], [56, 56], [34, 67], , , ", "[17, 76], , , , , , ", 
"[34, 67], [118, 25], [17, 76], [0, 84], [84, 42], [56, 56], [151, 8]", 
"[34, 67], [168, 0], , , , , ", "[56, 56], [0, 84], [34, 67], [168, 0], [151, 8], , ", 
"[56, 56], [118, 25], [0, 84], , , , ", "{\"ImportId\":\"rse1e\"}", 
"rse1e"), class = "factor"), rse2e = structure(6:7, .Label = c("", 
", , , , , , , ", "[0, 54], [173, 11], [22, 49], [108, 27], [86, 32], [43, 43], [130, 22], [216, 0]", 
"[108, 27], [0, 54], , , , , , ", "[151, 16], [216, 0], [108, 27], , , , , ", 
"[22, 49], [108, 27], [86, 32], [151, 16], , , , ", "[43, 43], [108, 27], [173, 11], [130, 22], [0, 54], , , ", 
"[86, 32], , , , , , , ", "{\"ImportId\":\"rse2e\"}", "rse2e"
), class = "factor")), row.names = 15:16, class = "data.frame")

3 Answers:

Answer 0: (score: 1)

I'm not quite sure what your data looks like, but you can remove the brackets and split on | like this:

f <- "1 [34, 67], [17, 76] | [17, 76]"
f
[1] "1 [34, 67], [17, 76] | [17, 76]"
# remove the brackets and keep the result in a
a <- gsub("\\[|\\]", "", f)
a
[1] "1 34, 67, 17, 76 | 17, 76"
# split after each |, keeping it; unlist() is needed because strsplit() returns a list
unlist(strsplit(a, "(?<=[|])", perl = TRUE))
[1] "1 34, 67, 17, 76 |" " 17, 76"

If you don't want to keep the | delimiter in the result, you can do the following:

unlist(strsplit(a, "[|]", perl = TRUE))
[1] "1 34, 67, 17, 76 " " 17, 76"

Answer 1: (score: 0)

You can try

df[]<-trimws(gsub("\\[|\\]|,","",as.matrix(df)))

which gives

> df
                          rse1e                           rse2e
15     118 25 17 76 56 56 34 67       22 49 108 27 86 32 151 16
16 56 56 0 84 34 67 168 0 151 8 43 43 108 27 173 11 130 22 0 54

Edit: splitting the string into its bracketed groups

> s <- "[34, 67], [118, 25], [17, 76], [0, 84], [84, 42], [56, 56], [151, 8]"
> unlist(regmatches(s,gregexpr("\\[.*?\\]",s)))
[1] "[34, 67]"  "[118, 25]" "[17, 76]"  "[0, 84]"   "[84, 42]"  "[56, 56]"  "[151, 8]" 
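
Since the question asks for numeric vectors, the bracket-matching split above can be taken one step further. This is only a minimal sketch: the helper name parse_pairs is hypothetical (it does not come from the question or from any package), and it assumes every group has the form "[a, b]" with plain numbers inside.

```r
# Hypothetical helper (not from the original thread): turn "[a, b], [c, d], , ,"
# into a list of numeric vectors, one per bracketed group.
parse_pairs <- function(s) {
  s <- as.character(s)                                        # factor -> character
  groups <- unlist(regmatches(s, gregexpr("\\[.*?\\]", s)))   # one element per [..] group
  inner <- gsub("\\[|\\]", "", groups)                        # drop the brackets
  lapply(strsplit(inner, ", *"), as.numeric)                  # split on commas, convert
}

s <- "[34, 67], [17, 76], , , , , "
pairs <- parse_pairs(s)
pairs[[1]]   # first pair as a numeric vector
```

The trailing empty ", , ," entries are ignored because only text inside brackets is matched; a string with no bracketed groups yields an empty list.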

Answer 2: (score: 0)

We can also replace every run of non-digit characters with a single space.

df[] <- trimws(gsub('\\D+', ' ', unlist(df)))

To get the output in separate columns, we can use cSplit

splitstackshape::cSplit(df, names(df), sep = " ")
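
If a list of numeric vectors is preferred over extra columns, a base R alternative might look like the sketch below. It assumes the cells contain only non-negative integers; the small df here is a stand-in built from two values in the dput() sample, not the asker's full data.

```r
# Small stand-in for the asker's data (values copied from the dput() sample)
df <- data.frame(
  rse1e = c("[118, 25], [17, 76], [56, 56], [34, 67], , , ",
            "[56, 56], [0, 84], [34, 67], [168, 0], [151, 8], , "),
  stringsAsFactors = TRUE
)

# For every column, pull out each run of digits and convert it to numeric
nums <- lapply(df, function(col) {
  col <- as.character(col)                                   # factor -> character
  lapply(regmatches(col, gregexpr("[0-9]+", col)), as.numeric)
})

nums$rse1e[[1]]   # numeric vector for the first row of rse1e
```

Each cell becomes one numeric vector, so rows with different numbers of pairs do not need padding.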