R: Extract rows from the data frames in a list and split them into new data frames

Time: 2015-03-09 00:10:48

Tags: r list dataframe

I have a list containing 3 data frames (DvE, DvS, EvS):

str(Table.list2)
List of 3
 $ DvE:'data.frame':    18482 obs. of  4 variables:
  ..$ gene      : Factor w/ 18482 levels "c10000_g1_i3|m.32237",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ FDR       : num [1:18482] 0.502 0.982 0.936 0.411 0.461 ...
  ..$ log2FC    : num [1:18482] 0.415 -0.245 0.728 -0.384 0.474 ...
  ..$ annotation: Factor w/ 4939 levels "","[Genbank](myosin heavy-chain) kinase [Calothrix sp. PCC 6303] ",..: 1 2204 2980 2204 1 2204 4622 2980 1 241 ...
 $ DvS:'data.frame':    18482 obs. of  4 variables:
  ..$ gene      : Factor w/ 18482 levels "c10000_g1_i3|m.32237",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ FDR       : num [1:18482] 1.25e-01 7.18e-01 2.02e-01 2.72e-13 6.02e-01 ...
  ..$ log2FC    : num [1:18482] -0.417 0.583 2.148 1.689 -0.167 ...
  ..$ annotation: Factor w/ 4939 levels "","[Genbank](myosin heavy-chain) kinase [Calothrix sp. PCC 6303] ",..: 1 2204 2980 2204 1 2204 4622 2980 1 241 ...
 $ EvS:'data.frame':    18482 obs. of  4 variables:
  ..$ gene      : Factor w/ 18482 levels "c10000_g1_i3|m.32237",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ FDR       : num [1:18482] 1.78e-03 6.04e-01 4.09e-01 3.42e-19 3.20e-02 ...
  ..$ log2FC    : num [1:18482] -0.832 0.828 1.42 2.073 -0.641 ...
  ..$ annotation: Factor w/ 4939 levels "","[Genbank](myosin heavy-chain) kinase [Calothrix sp. PCC 6303] ",..: 1 2204 2980 2204 1 2204 4622 2980 1 241 ...

All 3 data frames have a similar structure, for example:

> head(Table.list2$DvE)
                  gene       FDR     log2FC                               annotation
1 c10000_g1_i3|m.32237 0.5024600  0.4149066                                         
2 c10000_g1_i4|m.32240 0.9818297 -0.2449509 [Pfam]Calcium-activated chloride channel
3 c10000_g1_i4|m.32242 0.9361868  0.7277203                         [Pfam]LSM domain
4 c10000_g1_i5|m.32244 0.4114795 -0.3835745 [Pfam]Calcium-activated chloride channel
5 c10000_g1_i6|m.32245 0.4605157  0.4739777                                         
6 c10000_g1_i6|m.32246 0.4965353 -0.4607749 [Pfam]Calcium-activated chloride channel

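What I want to do is, for each data frame, take the rows with FDR < 0.05 and log2FC > 0 and put them into a new data frame, then take the rows with FDR < 0.05 and log2FC < 0 and put them into another data frame.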

So, from the list of 3 data frames, I would get 6 new data frames named:

DvE.+ DvE.- DvS.+ DvS.- EvS.+ EvS.-

Example output for DvE.+:

                    gene          FDR    log2FC                                                                   annotation
47  c10010_g1_i4|m.32346 8.609296e-15 1.9188013                  [Genbank]conserved unknown protein [Ectocarpus siliculosus]
48  c10010_g1_i4|m.32348 5.625766e-09 1.8240089           [Genbank]hypothetical protein THAOC_07134 [Thalassiosira oceanica]
155 c10037_g1_i4|m.32582 2.666894e-02 0.6669399                                                     [Pfam]LETM1-like protein
211 c10050_g2_i2|m.32706 8.154555e-03 1.6900611 [Genbank]hypothetical protein SELMODRAFT_84252 [Selaginella moellendorffii] 
243 c10057_g1_i1|m.32812 1.936893e-02 0.8141790                                     [Pfam]Fibrinogen alpha/beta chain family
265 c10061_g4_i2|m.32861 3.614401e-02 1.7059034                                                         [Pfam]Maf1 regulator

Is there a more elegant way, such as a loop, to do all of this rather than writing out similar commands over and over?

Update

I tried a loop that calls assign(paste(i, ".+", sep = ""), value = pos) and assign(paste(i, ".-", sep = ""), value = neg) for each data frame, but I got these warnings:

  

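Warning messages:
1: In assign(paste(i, ".+", sep = ""), value = pos) :
  only the first element is used as variable name
2: In assign(paste(i, ".-", sep = ""), value = neg) :
  only the first element is used as variable name
3: In assign(paste(i, ".+", sep = ""), value = pos) :
  only the first element is used as variable name
4: In assign(paste(i, ".-", sep = ""), value = neg) :
  only the first element is used as variable name
5: In assign(paste(i, ".+", sep = ""), value = pos) :
  only the first element is used as variable name
6: In assign(paste(i, ".-", sep = ""), value = neg) :
  only the first element is used as variable name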
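
A note on these warnings: assign() complains this way when the name it receives is a character vector with more than one element. One plausible cause, assuming the loop iterated over the list elements themselves (for example `for (i in Table.list2)`), is that `i` was a whole data frame rather than its name, so paste() returned several strings. The sketch below loops over names() instead. The list `toy` and its values are invented for illustration and only stand in for Table.list2.

```r
# Toy stand-in for Table.list2 (values invented for illustration)
toy <- list(
  DvE = data.frame(gene   = c("g1", "g2", "g3"),
                   FDR    = c(0.01, 0.20, 0.03),
                   log2FC = c(1.5, 0.4, -2.0)),
  DvS = data.frame(gene   = c("g1", "g2", "g3"),
                   FDR    = c(0.04, 0.01, 0.90),
                   log2FC = c(-0.7, 2.2, 0.1))
)

# Loop over the element names, so `nm` is always a single string
for (nm in names(toy)) {
  df  <- toy[[nm]]
  sig <- df$FDR < 0.05
  assign(paste0(nm, ".+"), df[sig & df$log2FC > 0, ])  # significant, up
  assign(paste0(nm, ".-"), df[sig & df$log2FC < 0, ])  # significant, down
}

# Names containing "+" or "-" are non-syntactic, so they need backticks
`DvE.+`
```

With the real data, the same loop would run over names(Table.list2).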

1 Answer:

Answer 0 (score: 0)

   Not tested:

   library(dplyr)  # provides filter()
   # Table.list2 is the named list of data frames from the question
   alldf <- lapply(Table.list2, function(df) {
     # each list element holds the two filtered data frames for one comparison
     pos <- filter(df, FDR < 0.05 & log2FC > 0)
     neg <- filter(df, FDR < 0.05 & log2FC < 0)
     list(pos = pos, neg = neg)
   })
   # access the results as alldf$DvE$pos, alldf$DvE$neg, and so on
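
To get the six names asked for in the question (DvE.+, DvE.-, ...), one option is to keep everything in base R and flatten the nested result by one level. This is only a sketch: `toy` is made-up data standing in for Table.list2, and `split_sig` and its `fdr_cut` argument are hypothetical names introduced here for illustration.

```r
# Toy stand-in for Table.list2 (values invented for illustration)
toy <- list(
  DvE = data.frame(gene   = c("g1", "g2", "g3"),
                   FDR    = c(0.01, 0.20, 0.03),
                   log2FC = c(1.5, 0.4, -2.0)),
  DvS = data.frame(gene   = c("g1", "g2", "g3"),
                   FDR    = c(0.04, 0.01, 0.90),
                   log2FC = c(-0.7, 2.2, 0.1))
)

# Hypothetical helper: split one data frame into significant
# up-regulated ("+") and down-regulated ("-") rows
split_sig <- function(df, fdr_cut = 0.05) {
  sig <- df$FDR < fdr_cut
  list(`+` = df[sig & df$log2FC > 0, ],
       `-` = df[sig & df$log2FC < 0, ])
}

# One list of two data frames per comparison
nested <- lapply(toy, split_sig)

# Flatten one level; outer and inner names are joined with "."
flat <- unlist(nested, recursive = FALSE)
names(flat)
```

Individual results can then be pulled out with `flat[["DvE.+"]]`. If separate variables are really needed, `list2env(flat, envir = .GlobalEnv)` should create them, although keeping the data frames together in one list is usually easier to work with.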