Aggregate columns by character

Time: 2017-04-18 11:59:49

Tags: r aggregate taxonomy

Hello, I would like to aggregate several columns.

d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1", 
"k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L, 
1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L, 
1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L, 
1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L, 
1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L, 
1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L, 
1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"), 
    SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889, 
    11.5555555555556, 3.35978835978836)), .Names = c("Gene", 
"phylum", "class", "order", "family", "genus", "species", "SampleA", 
"SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
), class = "data.frame")

This gives the input data frame to be aggregated:

   Gene     phylum      class   order      family    genus           species SampleA SampleB
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0
k141_70_3 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0
  SampleCtrl
  3.99
 11.56
  3.36

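In the end, what I want is a single row for each distinct taxonomy, with the sample columns summed. In this case it would look like this (the Gene column can be dropped).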

        phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
    Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      15.54
    Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0       3.36

Note that this is a very simple example. My original data frame has 20 samples and more than 500 species.

2 answers:

Answer 0 (score: 0)

Here is a dplyr solution:

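library(dplyr)

d %>%
  # group by every taxonomic rank; Gene is not a grouping column, so it is dropped
  group_by(phylum, class, order, family, genus, species) %>%
  # sum each numeric (sample) column within each group
  summarise_if(is.numeric, sum)

which gives:
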
Groups: phylum, class, order, family, genus [?]

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979
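
For readers on a newer dplyr (1.0.0 or later, an assumption about your installation), the same grouped sum can be sketched with across(), which newer releases recommend over the scoped summarise_if(). The compact data.frame() call below is a hypothetical re-creation of the question's d, with plain character columns instead of factors; res is just an illustrative name.

```r
library(dplyr)

# Hypothetical re-creation of the example data from the question
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes",
  class = "Bacillales",
  order = "Bacilli",
  family = "Bacillaceae",
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0,
  SampleB = 0,
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

res <- d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  # sum every numeric column per group; .groups = "drop" returns an ungrouped tibble
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(res)
```

Because Gene is neither a grouping column nor numeric, it does not appear in the result, which matches the output shown above.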

Answer 1 (score: 0)

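Assuming that the sample columns are numeric and the other columns are not, and that the desired aggregation is to sum each sample column grouped by the other columns (except Gene):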

j <- which(sapply(d, is.numeric))
aggregate(d[j], d[-c(1, j)], sum)

giving:

      phylum      class   order      family    genus           species SampleA
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0
  SampleB SampleCtrl
1       0  15.544444
2       0   3.359788
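
As a side note, the same aggregation can also be sketched with the formula interface of aggregate(), which avoids computing j at all. This is an alternative spelling, not the method used above: the dot on the left-hand side stands for every column not named on the right, so Gene has to be removed first. The data.frame() call is a hypothetical re-creation of the question's d, and res_formula and res_df are illustrative names.

```r
# Hypothetical re-creation of the example data from the question
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes",
  class = "Bacillales",
  order = "Bacilli",
  family = "Bacillaceae",
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0,
  SampleB = 0,
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

# Formula interface: "." means all remaining columns (here the three sample columns)
res_formula <- aggregate(
  . ~ phylum + class + order + family + genus + species,
  data = d[-1],  # drop Gene so it is not treated as a value column
  FUN = sum
)

# Data frame interface, as in the answer above, for comparison
j <- which(sapply(d, is.numeric))
res_df <- aggregate(d[j], d[-c(1, j)], sum)

print(res_formula)
```

One difference to keep in mind: the formula method uses na.action = na.omit by default, so rows with an NA in any column are dropped before summing, whereas the data frame method passes NA sample values through to sum.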

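Another possibility, if the sample columns all have Sample in their names and the other columns do not, is to use this in place of the first line above: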

j <- grep("Sample", names(d))

Or, if neither of the above assumptions holds but we know that the sample columns are the last 3 columns:

j <- seq(to = ncol(d), length = 3)
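
A quick way to see that the three definitions of j agree on the example data is to compare them directly. This is only a sanity-check sketch; the data.frame() call is a hypothetical re-creation of the question's d, and j_numeric, j_name and j_last are illustrative names.

```r
# Hypothetical re-creation of the example data from the question
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  phylum = "Firmicutes",
  class = "Bacillales",
  order = "Bacilli",
  family = "Bacillaceae",
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = 0,
  SampleB = 0,
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

j_numeric <- which(sapply(d, is.numeric))   # by column type (a named index vector)
j_name    <- grep("Sample", names(d))       # by column name
j_last    <- seq(to = ncol(d), length = 3)  # by position: the last 3 columns

# Compare values only; the three vectors differ in names and storage type
same <- all(unname(j_numeric) == j_name) && all(j_name == j_last)
print(same)

# Any of the three can be plugged into the aggregate() call
res <- aggregate(d[j_last], d[-c(1, j_last)], sum)
```

On real data the three can diverge, for example if a taxonomy column happens to be numeric or a non-sample column has Sample in its name, so it is worth checking names(d)[j] before aggregating.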

Update: Fixed and added two alternatives.