Extract vectors from a strsplit list without using a loop

Time: 2012-01-24 23:59:39

Tags: r

Consider the following vector:

 [1] "1-1694429" "2-1546669" "3-928598"  "4-834486"  "5-802353"  "6-659439"  "7-552850"  "8-516804"  "9-364061"
[10] "10-354181" "11-335154" "12-257915" "13-251310" "14-232313" "15-217628" "16-216569"

I am trying to produce two vectors, each containing the values obtained by splitting every element of the vector on the separator "-".

I used:

f <- function(s) strsplit(s, "-")
cc<-sapply(names.reads, f)
  

> head(cc)
$`1-1694429`
[1] "1"       "1694429"

$`2-1546669`
[1] "2"       "1546669"

I know I can access the pieces like this:

> cc[[1]][1]
[1] "1"

> cc[[1]][2]
[1] "1694429"

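I would like to have two vectors, one containing the values stored in cc[[i]][1] and the other those stored in cc[[i]][2]. Can I do this without using a loop? (I have more than 1 million elements.)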

5 answers:

Answer 0: (score: 20)

Following mathematical.coffee's suggestion, the code below avoids both a loop and sapply:

names.reads <- c("1-1694429", "2-1546669", "3-928598", "4-834486", "5-802353",
              "6-659439",  "7-552850",  "8-516804", "9-364061", "10-354181",
              "11-335154", "12-257915", "13-251310", "14-232313", "15-217628",
              "16-216569")

cc       <- strsplit(names.reads,'-')
part1    <- unlist(cc)[2*(1:length(names.reads))-1]
part2    <- unlist(cc)[2*(1:length(names.reads))  ]

which produces

> part1
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16"
> part2
 [1] "1694429" "1546669" "928598"  "834486"  "802353"  "659439"  "552850" 
 [8] "516804"  "364061"  "354181"  "335154"  "257915"  "251310"  "232313" 
[15] "217628"  "216569"

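Note that this does require every original value to be in the expected format, that is, to split into exactly two pieces.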
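The format requirement can be checked before indexing. The following is only a sketch, assuming every well-formed element splits into exactly two pieces; `lengths()` is base R from version 3.2.0 onward, and the malformed value "oops" and the names `ok` and `m` are made up for illustration:

```r
# Example input with one deliberately malformed element ("oops")
names.reads <- c("1-1694429", "2-1546669", "3-928598", "oops")

cc <- strsplit(names.reads, "-", fixed = TRUE)

# TRUE where an element split into exactly two pieces
ok <- lengths(cc) == 2

# Fill a 2-column matrix row by row from the well-formed elements only
m <- matrix(unlist(cc[ok]), ncol = 2, byrow = TRUE)
part1 <- m[, 1]
part2 <- m[, 2]
```

Elements flagged FALSE in `ok` can then be inspected separately instead of silently shifting the odd/even indexing.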

Answer 1: (score: 6)

Another approach:

names.reads <- c("1-1694429", "2-1546669", "3-928598", "4-834486", "5-802353",
              "6-659439",  "7-552850",  "8-516804", "9-364061", "10-354181",
              "11-335154", "12-257915", "13-251310", "14-232313", "15-217628",
              "16-216569")

library(reshape2)
colsplit(string=names.reads, pattern="-", names=c("Part1", "Part2"))

   Part1   Part2
1      1 1694429
2      2 1546669
3      3  928598
4      4  834486
5      5  802353
6      6  659439
7      7  552850
8      8  516804
9      9  364061
10    10  354181
11    11  335154
12    12  257915
13    13  251310
14    14  232313
15    15  217628
16    16  216569

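Answer 2: (score: 6)

Using sapply() (for completeness):

y <- c("1-1694429", "2-1546669", "3-928598", "4-834486", "5-802353",
       "6-659439", "7-552850", "8-516804", "9-364061", "10-354181",
       "11-335154", "12-257915", "13-251310", "14-232313", "15-217628",
       "16-216569")

# Each element splits into two pieces, so sapply() simplifies to a 2-row matrix
x <- sapply(y, function(x) strsplit(x, "-")[[1]], USE.NAMES=FALSE)

a <- x[1,]   # first pieces
b <- x[2,]   # second pieces

As @Bird pointed out in the comments, the USE.NAMES argument can be used to avoid names in the resulting vectors.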
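A related base R variant is `vapply()`, which declares the expected type and length of each result. This is a sketch of that alternative rather than part of the answer above; the name `pieces` is made up for illustration:

```r
y <- c("1-1694429", "2-1546669", "3-928598")

# Split once, then pull out each position with a type-checked apply
pieces <- strsplit(y, "-", fixed = TRUE)

# vapply() stops with an error if a result is not a single character string
a <- vapply(pieces, function(p) p[1], character(1))
b <- vapply(pieces, function(p) p[2], character(1))
```

Because `pieces` is an unnamed list, the results carry no names. An element with no second piece would come back as NA rather than raise an error, since `p[2]` is still a single character value.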

Answer 3: (score: 3)

Or, using the purrr package:

Part 1:

> map(strsplit(names.reads, "-"), ~.x[1]) %>% unlist()
[1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13"
[14] "14" "15" "16"

Part 2:

> map(strsplit(names.reads, "-"), ~.x[2]) %>% unlist()
[1] "1694429" "1546669" "928598"  "834486"  "802353"  "659439" 
[7] "552850"  "516804"  "364061"  "354181"  "335154"  "257915" 
[13] "251310"  "232313"  "217628"  "216569" 
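purrr also has typed variants of `map()`. The sketch below assumes the purrr package is installed and that every element splits into two pieces; passing an integer to `map_chr()` extracts by position and returns a character vector directly, so neither `unlist()` nor the pipe is needed:

```r
library(purrr)

names.reads <- c("1-1694429", "2-1546669", "3-928598")
pieces <- strsplit(names.reads, "-", fixed = TRUE)

# An integer in place of a function means "extract the element at this position"
part1 <- map_chr(pieces, 1)
part2 <- map_chr(pieces, 2)
```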

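Answer 4: (score: 2)

I wanted to solve a similar problem and came across this post. Adding my solution, even though I am arriving far in the future! (Code copied from Henry.)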

names.reads <- c("1-1694429", "2-1546669", "3-928598", "4-834486", "5-802353",
          "6-659439",  "7-552850",  "8-516804", "9-364061", "10-354181",
          "11-335154", "12-257915", "13-251310", "14-232313", "15-217628",
          "16-216569")

require(plyr)
cc <- ldply(strsplit(names.reads, '-'))
cc$V1;cc$V2

This produces a data frame whose columns are the vectors made up of the nth element of each item in the list.
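If plyr is not available, a similar data frame can be built in base R. This is a sketch of an alternative, not the answer's method, and it assumes every element splits into the same number of pieces; the name `m` is made up for illustration:

```r
names.reads <- c("1-1694429", "2-1546669", "3-928598")

# Stack the split pieces into a character matrix, one row per input element
m <- do.call(rbind, strsplit(names.reads, "-", fixed = TRUE))

# A matrix without column names becomes a data frame with columns V1, V2
cc <- as.data.frame(m, stringsAsFactors = FALSE)
cc$V1
cc$V2
```

With elements of unequal length, `rbind()` recycles the shorter ones and emits a warning, so the format check still matters. On very long inputs this call may be slower than indexing into `unlist()`.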