R: Convert a data frame to pseudo-CSV

Time: 2015-05-17 08:37:05

Tags: r transformation

Suppose we have a two-column data frame like this:

A  1
A  2
A  4
A  5
B  2
B  13
C  1
C  3
C  6
C  18
D  8
E  2
E  112
...

Is there a quick way in R to convert it into a two-column data frame like this?

A  1;2;4;5
B  2;13
C  1;3;6;18
D  8
E  2;112

And how can it be converted back to the first structure?

Thanks

2 answers:

Answer 0: (score: 1)

A base R option (from @David Arenburg's comment)

res1 <- aggregate(Col2 ~ Col1, df1, paste, collapse = ";")

Or using data.table

library(data.table)
# Note: setDT() converts df1 to a data.table by reference
res2 <- setDT(df1)[, list(Col2=paste(Col2, collapse=";")), Col1]

Or dplyr

library(dplyr)
res3 <- df1 %>%
           group_by(Col1) %>%
           summarise(Col2= paste(Col2, collapse=";") )

Update

To convert the output back to the original structure

library(splitstackshape)
cSplit(res2, "Col2", sep = ";", direction = "long")
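
If adding splitstackshape is not an option, the reversal can probably also be done in base R. The following is only a sketch and not part of the original answer: `res` is a hypothetical stand-in for the collapsed result (same shape as res1/res2/res3), rebuilt here so the block runs on its own, and `parts` and `long` are illustrative names.

```r
# Hypothetical stand-in for the collapsed result (assumption, not from the answer)
res <- data.frame(Col1 = c("A", "B", "C", "D", "E"),
                  Col2 = c("1;2;4;5", "2;13", "1;3;6;18", "8", "2;112"),
                  stringsAsFactors = FALSE)

# Split each collapsed string into its pieces (one character vector per row)
parts <- strsplit(res$Col2, ";", fixed = TRUE)

# Repeat each key once per piece and flatten the pieces into a single column
long <- data.frame(Col1 = rep(res$Col1, lengths(parts)),
                   Col2 = as.integer(unlist(parts)),
                   stringsAsFactors = FALSE)
long
```

The `as.integer()` call assumes every piece is a whole number, as in the example data; drop it if the values should stay character.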

Data

df1 <- structure(list(Col1 = c("A", "A", "A", "A", "B", "B", "C", "C", 
"C", "C", "D", "E", "E"), Col2 = c(1L, 2L, 4L, 5L, 2L, 13L, 1L, 
3L, 6L, 18L, 8L, 2L, 112L)), .Names = c("Col1", "Col2"),
 class =     "data.frame", row.names = c(NA, -13L))

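Answer 1: (score: 0)

Use aggregate() with paste() and collapse = ";" to concatenate V2. To convert the result back to the original structure, strsplit() is used inside lapply() to split V2; do.call() is only there to bind the resulting list of data frames together in order.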

df <- read.table(header = FALSE, text = "
A  1
A  2
A  4
A  5
B  2
B  13
C  1
C  3
C  6
C  18
D  8
E  2
E  112")

# Collapse every column per group; [,-2] drops the redundant collapsed V1 column
df1 <- aggregate(df, by = list(df$V1), FUN = function(x) paste(x, collapse = ";"))[,-2]
names(df1) <- c("V1", "V2")
df1
#  V1       V2
#1  A  1;2;4;5
#2  B     2;13
#3  C 1;3;6;18
#4  D        8
#5  E    2;112

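df <- do.call(rbind, lapply(unique(df1$V1), function(x) {
  # strsplit() returns a list; [[1]] extracts the character vector of pieces
  pieces <- strsplit(df1[df1$V1 == x, "V2"], ";")[[1]]
  # Convert back to integer so V2 has the same type as in the input
  data.frame(V1 = x, V2 = as.integer(pieces))
}))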
df
#   V1  V2
#1   A   1
#2   A   2
#3   A   4
#4   A   5
#5   B   2
#6   B  13
#7   C   1
#8   C   3
#9   C   6
#10  C  18
#11  D   8
#12  E   2
#13  E 112
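
As a possible alternative to the lapply()/do.call() loop above, base R's stack() can flatten a named list in one call. This is a sketch under assumptions rather than the answerer's method: `collapsed` is a hypothetical copy of the aggregated df1 shown above, and `pieces`, `stacked` and `long` are illustrative names.

```r
# Hypothetical copy of the aggregated data frame printed above (assumption)
collapsed <- data.frame(V1 = c("A", "B", "C", "D", "E"),
                        V2 = c("1;2;4;5", "2;13", "1;3;6;18", "8", "2;112"),
                        stringsAsFactors = FALSE)

# Name each split vector by its key so stack() can recover the grouping
pieces <- setNames(strsplit(collapsed$V2, ";", fixed = TRUE), collapsed$V1)

# stack() returns two columns: values (the pieces) and ind (the list names)
stacked <- stack(pieces)

# Reorder and convert the columns back to the original types
long <- data.frame(V1 = as.character(stacked$ind),
                   V2 = as.integer(as.character(stacked$values)),
                   stringsAsFactors = FALSE)
long
```

This relies on the keys in V1 being unique, since they are used as list names.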