Hello, I would like to aggregate several columns.
d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1",
"k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L,
1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L,
1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L,
1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L,
1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L,
1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L,
1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"),
SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889,
11.5555555555556, 3.35978835978836)), .Names = c("Gene",
"phylum", "class", "order", "family", "genus", "species", "SampleA",
"SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
), class = "data.frame")
This is the input data frame to be aggregated:
Gene phylum class order family genus species SampleA SampleB
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0 0
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0 0
k141_70_3 Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown 0 0
SampleCtrl
3.99
11.56
3.36
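In the end, I want a single row for each unique combination of the taxonomy columns, with the sample columns summed. In this case it would look like this (the Gene column can be dropped):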
phylum class order family genus species SampleA SampleB SampleCtrl
Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0 0 15.54
Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown 0 0 3.36
Note that this is a very simplified example. My original data frame has 20 samples and more than 500 species.
Answer 0: (score: 0)
Here is a dplyr solution:
library(dplyr)
d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise_if(is.numeric, sum)
Groups: phylum, class, order, family, genus [?]
phylum class order family genus species SampleA SampleB SampleCtrl
<fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0 0 15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown 0 0 3.35979
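In dplyr 1.0.0 and later, the scoped verbs such as summarise_if() are superseded by across(). The following is a minimal sketch of the same aggregation in that style, assuming dplyr >= 1.0.0 is installed. The data frame is rebuilt here with data.frame() only so that the snippet is self-contained; it carries the same values as the question's dput() output.

```r
library(dplyr)

# Same values as the question's example data (rebuilt for a self-contained snippet)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes",
  class = "Bacillales",
  order = "Bacilli",
  family = "Bacillaceae",
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0,
  SampleB = 0,
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836),
  stringsAsFactors = TRUE
)

res <- d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  # Sum every numeric column within each group; Gene is neither a grouping
  # column nor numeric, so it is dropped from the result
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(res)
```

Because the numeric columns are selected by type, this should scale to the 20 sample columns mentioned in the question without listing them by name.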
Answer 1: (score: 0)
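Assuming that the sample columns are numeric and the other columns are not, and that the desired aggregation is to sum each sample column, grouped by the other columns (except Gene):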
j <- which(sapply(d, is.numeric))
aggregate(d[j], d[-c(1, j)], sum)
giving:
phylum class order family genus species SampleA
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown 0
SampleB SampleCtrl
1 0 15.544444
2 0 3.359788
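The same grouping can also be written with the formula interface of aggregate(), which some readers find easier to follow because the grouping columns are named explicitly. This is only a sketch of an alternative, not part of the original answer. The data frame is rebuilt with data.frame() to keep the snippet self-contained.

```r
# Same values as the question's example data (rebuilt for a self-contained snippet)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes",
  class = "Bacillales",
  order = "Bacilli",
  family = "Bacillaceae",
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0,
  SampleB = 0,
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836),
  stringsAsFactors = TRUE
)

# d[-1] drops Gene; the "." on the left-hand side stands for every remaining
# column that is not named on the right-hand side (here, the sample columns)
res <- aggregate(. ~ phylum + class + order + family + genus + species,
                 data = d[-1], FUN = sum)

print(res)
```

One difference to keep in mind: the formula method defaults to na.action = na.omit, so rows containing any NA are dropped before summing. If the real data contain missing values, it is worth checking that this is the intended behaviour.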
Alternatively, if the sample columns all have Sample in their names and the other columns do not, use this instead of the first line above:
j <- grep("Sample", names(d))
Or, if neither of the two assumptions above holds but we know that the sample columns are the last 3 columns:
j <- seq(to = ncol(d), length.out = 3)
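As a quick sanity check, the three ways of choosing j can be compared on the example data. The sketch below rebuilds the data frame with data.frame() so that it runs on its own; the names j_numeric, j_name and j_last are introduced here only for illustration.

```r
# Same values as the question's example data (rebuilt for a self-contained snippet)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes",
  class = "Bacillales",
  order = "Bacilli",
  family = "Bacillaceae",
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0,
  SampleB = 0,
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836),
  stringsAsFactors = TRUE
)

j_numeric <- which(sapply(d, is.numeric))     # select by column type
j_name    <- grep("Sample", names(d))         # select by column name
j_last    <- seq(to = ncol(d), length.out = 3) # select by position (last 3)

# On this example all three approaches pick columns 8, 9 and 10
stopifnot(all(j_numeric == j_name), all(j_name == j_last))

# Any of them can then be used in the aggregate() call
res <- aggregate(d[j_name], d[-c(1, j_name)], sum)
print(res)
```

On real data the three selections may differ (for instance, if a non-sample column is numeric), so it is worth picking whichever assumption actually holds for the full table.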
Update: Fixed, and added two alternatives.