Split column elements into 3 separate columns (R)

Time: 2014-08-08 00:06:54

Tags: r

I have a data frame (theData) in which the values of one column are pipe-delimited:

Col1  Col2     Col3 
1     colors   red|green|purple
1     colors   red|pink|yellow
1     colors   yellow|mauve|purple
1     colors   red|green|orange
1     colors   red|yellow|purple
1     colors   red|green|purple

I would like to split Col3 into additional columns, like this:

Col1     Col2        Col3                    Col4      Col5  
1       colors      red                     green     purple
1       colors      red                     pink      yellow
1       colors      yellow                  mauve     purple
1       colors      red                     green     orange
1       colors      red                     yellow    purple
1       colors      red                     green     purple

I tried the following:

str_split_fixed(as.character(theData$Col3), "|", 3)

But this doesn't work.

4 answers:

Answer 0 (score: 2)

My cSplit function handles this kind of problem easily.

cSplit(theData, "Col3", "|")
#    Col1   Col2 Col3_1 Col3_2 Col3_3
# 1:    1 colors    red  green purple
# 2:    1 colors    red   pink yellow
# 3:    1 colors yellow  mauve purple
# 4:    1 colors    red  green orange
# 5:    1 colors    red yellow purple
# 6:    1 colors    red  green purple

The result is a data.table, because the function uses the "data.table" package for the efficiency it offers, especially on larger datasets.

Answer 1 (score: 1)

You could also try colsplit from the reshape package:

library(reshape)
cbind(theData[, 1:2],
      colsplit(theData$Col3, "[|]", names = c("Col3", "Col4", "Col5")))
#   Col1   Col2   Col3   Col4   Col5
# 1    1 colors    red  green purple
# 2    1 colors    red   pink yellow
# 3    1 colors yellow  mauve purple
# 4    1 colors    red  green orange
# 5    1 colors    red yellow purple
# 6    1 colors    red  green purple

Or just use read.table:

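# as.character() guards against Col3 being a factor, which text= does not accept
cbind(theData[, 1:2],
      setNames(read.table(text = as.character(theData$Col3), sep = "|",
                          header = FALSE, stringsAsFactors = FALSE),
               paste0("Col", 3:5)))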
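
For anyone who wants to run the read.table approach end to end, here is a minimal base R sketch. The way theData is built is an assumption: the question only shows the printed table, not the code that created it.

```r
# Rebuild the example data. The original post only shows the printed table,
# so this construction (column types included) is an assumption.
theData <- data.frame(
  Col1 = rep(1, 6),
  Col2 = rep("colors", 6),
  Col3 = c("red|green|purple", "red|pink|yellow", "yellow|mauve|purple",
           "red|green|orange", "red|yellow|purple", "red|green|purple"),
  stringsAsFactors = FALSE
)

# Treat Col3 as the lines of a pipe-delimited file, one line per row.
parts <- read.table(text = as.character(theData$Col3), sep = "|",
                    header = FALSE, stringsAsFactors = FALSE)
names(parts) <- paste0("Col", 3:5)

# Keep Col1 and Col2, then attach the three new columns in place of Col3.
result <- cbind(theData[, 1:2], parts)
print(result)
```

Because sep is taken as a literal separator character here rather than a regular expression, the pipe needs no escaping in this approach.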

Answer 2 (score: 1)

Adding one more option, in case you like it: Hadley's tidyr package. The code is very clean.

> library(tidyr)
> test <- data.frame(Col3 = c("red|green|purple", "red|pink|yellow"))
> test
Source: local data frame [2 x 1]

              Col3
1 red|green|purple
2  red|pink|yellow

> test %>% separate(Col3, c("A", "B", "C"), sep = "\\|")
Source: local data frame [2 x 3]

    A     B      C
1 red green purple
2 red  pink yellow

Answer 3 (score: 0)

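You just need to wrap | in [], or escape it with \\|, because an unescaped | is the alternation operator in a regular expression rather than a literal pipe. This looks like a job for mapply.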

> m <- mapply(strsplit, dat$Col3, split = "[|]", USE.NAMES = FALSE)
> setNames(cbind(dat[-3], do.call(rbind, m)), paste0("Col", 1:5))
#   Col1   Col2   Col3   Col4   Col5
# 1    1 colors    red  green purple
# 2    1 colors    red   pink yellow
# 3    1 colors yellow  mauve purple
# 4    1 colors    red  green orange
# 5    1 colors    red yellow purple
# 6    1 colors    red  green purple

Your str_split_fixed attempt needs only a small change:

> library(stringr)
> cbind(dat[-3], str_split_fixed(dat$Col3, "[|]", 3))
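
To illustrate the escaping point in this answer, here is a small base R sketch on a single string. It uses strsplit rather than str_split_fixed so that it runs without extra packages. The fixed = TRUE variant is an additional option not mentioned in the answer.

```r
x <- "red|green|purple"

# Unescaped "|" means "empty pattern OR empty pattern", so it matches the
# empty string and the input is broken up between characters.
bad <- strsplit(x, "|")[[1]]

# Three ways to split on a literal pipe instead.
by_class  <- strsplit(x, "[|]")[[1]]              # character class
by_escape <- strsplit(x, "\\|")[[1]]              # backslash escape
by_fixed  <- strsplit(x, "|", fixed = TRUE)[[1]]  # no regex interpretation

print(bad)
print(by_class)
```

All three corrected calls return the same three pieces, which is why swapping "|" for "[|]" is enough to repair the original attempt.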