Suppose we have a two-column data frame like this:
A 1
A 2
A 4
A 5
B 2
B 13
C 1
C 3
C 6
C 18
D 8
E 2
E 112
...
Is there a quick way in R to convert it into a two-column data frame like this?
A 1;2;4;5
B 2;13
C 1;3;6;18
D 8
E 2;112
And how can it be converted back to the first structure?
Thanks
Answer 0: (score: 1)
A base R option (from @David Arenburg's comment)
res1 <- aggregate(Col2 ~ Col1, df1, paste, collapse = ";")
Or using data.table
library(data.table)
res2 <- setDT(df1)[, list(Col2=paste(Col2, collapse=";")), Col1]
Or dplyr
library(dplyr)
res3 <- df1 %>%
group_by(Col1) %>%
summarise(Col2= paste(Col2, collapse=";") )
To convert the output back to the original structure
library(splitstackshape)
cSplit(res2, 'Col2', ';', 'long')
Data
df1 <- structure(list(Col1 = c("A", "A", "A", "A", "B", "B", "C", "C",
"C", "C", "D", "E", "E"), Col2 = c(1L, 2L, 4L, 5L, 2L, 13L, 1L,
3L, 6L, 18L, 8L, 2L, 112L)), .Names = c("Col1", "Col2"),
class = "data.frame", row.names = c(NA, -13L))
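If installing splitstackshape is not an option, the reverse step can also be sketched in base R. The following is a minimal sketch, not the answerer's method: it assumes the same Col1/Col2 column names as above, and the names parts and long are introduced here for illustration only. Note that aggregate() returns groups in sorted order, so the round trip reproduces the original row order only when the input is already sorted by Col1, as it is in this example.

```r
# Same example data as in the answer above
df1 <- data.frame(
  Col1 = c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "E", "E"),
  Col2 = c(1L, 2L, 4L, 5L, 2L, 13L, 1L, 3L, 6L, 18L, 8L, 2L, 112L),
  stringsAsFactors = FALSE
)

# Collapse: one row per Col1, values joined with ";"
res1 <- aggregate(Col2 ~ Col1, df1, paste, collapse = ";")

# Split back: strsplit() gives one character vector per row of res1
parts <- strsplit(res1$Col2, ";", fixed = TRUE)

long <- data.frame(
  Col1 = rep(res1$Col1, lengths(parts)),  # repeat each key once per value
  Col2 = as.integer(unlist(parts)),       # strsplit() returns character
  stringsAsFactors = FALSE
)

# Round-trip check against the original columns
stopifnot(identical(long$Col1, df1$Col1),
          identical(long$Col2, df1$Col2))
```

Using rep() with lengths() avoids building one small data frame per group, which tends to matter once the number of groups gets large.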
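Answer 1: (score: 0)
aggregate() is used with paste() and collapse = ";" to concatenate V2. To get back to the original structure, strsplit() is used inside lapply() to split V2; do.call() is only there to bind the resulting list of data frames together in order.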
df <- read.table(header = FALSE, text = "
A 1
A 2
A 4
A 5
B 2
B 13
C 1
C 3
C 6
C 18
D 8
E 2
E 112")
df1 <- aggregate(df, by = list(df$V1), FUN = function(x) paste(x, collapse = ";"))[,-2]
names(df1) <- c("V1", "V2")
df1
# V1 V2
#1 A 1;2;4;5
#2 B 2;13
#3 C 1;3;6;18
#4 D 8
#5 E 2;112
df <- do.call(rbind, lapply(unique(df1$V1), function(x) {
df <- data.frame(x, strsplit(df1[df1$V1 == x, 2], ";"))
names(df) <- c("V1", "V2")
df
}))
df
# V1 V2
#1 A 1
#2 A 2
#3 A 4
#4 A 5
#5 B 2
#6 B 13
#7 C 1
#8 C 3
#9 C 6
#10 C 18
#11 D 8
#12 E 2
#13 E 112
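One detail the printed output does not show: strsplit() returns strings, so V2 comes back as character rather than integer. A minimal sketch of restoring the type follows. It assumes the collapsed frame has the V1/V2 layout shown above (typed in directly here to keep the example self-contained) and that every value in V2 is a whole number.

```r
# Collapsed frame in the same layout as df1 above
df1 <- data.frame(
  V1 = c("A", "B", "C", "D", "E"),
  V2 = c("1;2;4;5", "2;13", "1;3;6;18", "8", "2;112"),
  stringsAsFactors = FALSE
)

# Same split-and-bind idea as in the answer: one small data frame per key
df <- do.call(rbind, lapply(unique(df1$V1), function(x) {
  data.frame(V1 = x,
             V2 = strsplit(df1[df1$V1 == x, 2], ";")[[1]],
             stringsAsFactors = FALSE)
}))

class(df$V2)                 # "character", because strsplit() returns strings
df$V2 <- as.integer(df$V2)   # restore the integer type
```

If the values could be non-integer, as.numeric() would be the safer conversion.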