Merge by column from file_list

Posted: 2016-03-20 09:00:25

Tags: r

I have 96 files in file_list:

file_list <- list.files(pattern = "*.mirna")

They all have the same columns, but different numbers of rows. Example file:

> head(test1)
                       seq          name freq             mir start end mism   add t5  t3       s5       s3    DB
1    TGGAGTGTGATAATGGTGTTT seq_100003_x4    4  hsa-miR-122-5p    15  35 11TC     0  0   g GCTGTGGA TTTGTGTC miRNA
2 TGTAAACATCCCCGACCGGAAGCT seq_100045_x4    4  hsa-miR-30d-5p     6  29 17CT     0  0  CT TTGTTGTA GAAGCTGT miRNA
3   CTAGACTGAAGCTCCTTGAAAA seq_100048_x4    4 hsa-miR-151a-3p    47  65    0 I-AAA  0  gg CCTACTAG GAGGACAG miRNA
4   AGGCGGAGACTTGGGCAATTGC seq_100059_x4    4   hsa-miR-25-5p    14  35    0     0  0   C TGAGAGGC ATTGCTGG miRNA
5    AAACCGTTACCATTACTGAAT seq_100067_x4    4    hsa-miR-451a    17  35    0  I-AT  0 gtt AAGGAAAC AGTTTAGT miRNA
6   TGAGGTAGTAGCTTGTGCTGTT seq_10007_x24   24   hsa-let-7i-5p     6  27 12CT     0  0   0 TGGCTGAG TGTTGGTC miRNA
     precursor ambiguity
1  hsa-mir-122         1
2  hsa-mir-30d         1
3 hsa-mir-151a         1
4   hsa-mir-25         1
5 hsa-mir-451a         1
6   hsa-let-7i         1

Second file:

> head(test2)
                      seq           name freq             mir start end mism  add t5 t3       s5       s3    DB
1   ATTGCACTTGTCCTGGCCTGT seq_1000013_x1    1  hsa-miR-92a-3p    49  69 14TC    0  0  t AAAGTATT CTGTGGAA miRNA
2   AAACCGTTACTATTACTGAGA seq_1000094_x1    1    hsa-miR-451a    17  36 11TC  I-A  0 tt AAGGAAAC AGTTTAGT miRNA
3  TGAGGTAGCAGATTGTATAGTC seq_1000169_x1    1   hsa-let-7f-5p     8  28  9CT  I-C  0  t GGGATGAG AGTTTTAG miRNA
4    TGGGTCTTTGCGGGCGAGAT seq_100019_x12   12 hsa-miR-193a-5p    21  40    0    0  0 ga GGGCTGGG ATGAGGGT miRNA
5  TGAGGTAGTAGATTGTATAGTG seq_100035_x12   12   hsa-let-7f-5p     8  28    0  I-G  0  t GGGATGAG AGTTTTAG miRNA
6 TGAAGTAGTAGGTTGTGTGGTAT seq_1000437_x1    1   hsa-let-7b-5p     6  26  4AG I-AT  0  t GGGGTGAG GGTTTCAG miRNA
      precursor ambiguity
1 hsa-mir-92a-2         1
2  hsa-mir-451a         1
3  hsa-let-7f-2         1
4  hsa-mir-193a         1
5  hsa-let-7f-2         1
6    hsa-let-7b         1

I would like to create a unique ID composed of the mir and seq columns:

hsa-miR-122-5p_TGGAGTGTGATAATGGTGTTT

Then I want to merge all 96 files on this ID, taking the freq column from each file:

ID                                     freq_file1 freq_file2 ...
hsa-miR-122-5p_TGGAGTGTGATAATGGTGTTT            4         12

If an ID is not present in a particular file, its freq should be NA.

1 answer:

Answer 0 (score: 2)

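We can Reduce the list of data.frames with merge. The ID is built from mir and seq joined with "_" to match the requested format. all = TRUE keeps IDs that are missing from some files and fills them with NA:

lst <- lapply(mget(ls(pattern = "test\\d+")), function(x)
    subset(transform(x, ID = paste(mir, seq, sep = "_")), select = c("ID", "freq")))
Reduce(function(...) merge(..., by = "ID", all = TRUE), lst)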
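Below is a minimal, self-contained sketch of the same Reduce/merge idea. It runs on a trimmed copy of rows shown in the question, keeping only the mir, seq and freq columns. The hsa-miR-122-5p row in test2 is hypothetical: it is taken from the desired-output example (freq 12) so that the two objects share one ID. As an added assumption beyond the answer, each freq column is renamed after its source object before merging. This avoids the freq.x/freq.y suffixes (and, with many files, duplicated column names) that merge otherwise produces.

```r
# Trimmed copies of test1/test2 (only mir, seq, freq).
# The hsa-miR-122-5p row in test2 is hypothetical, taken from the
# desired-output example in the question (freq_file2 = 12).
test1 <- data.frame(
  seq  = c("TGGAGTGTGATAATGGTGTTT", "AAACCGTTACCATTACTGAAT"),
  mir  = c("hsa-miR-122-5p", "hsa-miR-451a"),
  freq = c(4, 4)
)
test2 <- data.frame(
  seq  = c("TGGAGTGTGATAATGGTGTTT", "AAACCGTTACTATTACTGAGA"),
  mir  = c("hsa-miR-122-5p", "hsa-miR-451a"),
  freq = c(12, 1)
)

# Collect test1, test2, ... from the current environment into a named list
dfs <- mget(ls(pattern = "^test\\d+$"))

# Build ID = mir_seq and name each freq column after its source object
lst <- Map(function(x, nm) {
  out <- data.frame(ID = paste(x$mir, x$seq, sep = "_"), freq = x$freq)
  names(out)[2] <- paste0("freq_", nm)
  out
}, dfs, names(dfs))

# all = TRUE makes every pairwise merge a full outer join
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), lst)
print(merged)
```

With all = TRUE the result has one row per ID seen in any object. An ID missing from an object gets NA in that object's column; here hsa-miR-451a appears with two different sequences, so each ID is NA in one column.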

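NOTE: The above assumes that the 'test1', 'test2', ... objects have already been created in the global environment by reading the files in 'file_list'. If not, we can use fread to read the files directly into a list instead of creating the extra data.frame objects, i.e.

library(data.table)
lst <- lapply(file_list, function(f) {
    dt <- fread(f, select = c("mir", "seq", "freq"))[, list(ID = paste(mir, seq, sep = "_"), freq)]
    setnames(dt, "freq", paste0("freq_", f))  # one freq column per file
})
Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), lst)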

Or, instead of fread, use read.csv/read.table to build 'lst' and apply the same Reduce/merge step to it as before.
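Below is a sketch of the read.table route, using only base R. It assumes the .mirna files are tab-delimited with a header row, so read.delim is used. The two small files written to a temporary directory, file1.mirna and file2.mirna, are hypothetical stand-ins for the real 96 files.

```r
# Hypothetical demo files; assumption: .mirna files are tab-delimited with a header
demo_dir <- tempfile("mirna_demo")
dir.create(demo_dir)
write.table(data.frame(seq  = c("TGGAGTGTGATAATGGTGTTT", "AAACCGTTACCATTACTGAAT"),
                       mir  = c("hsa-miR-122-5p", "hsa-miR-451a"),
                       freq = c(4, 4)),
            file.path(demo_dir, "file1.mirna"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(seq = "TGGAGTGTGATAATGGTGTTT", mir = "hsa-miR-122-5p", freq = 12),
            file.path(demo_dir, "file2.mirna"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Anchored regex so only names ending in .mirna match
file_list <- list.files(demo_dir, pattern = "\\.mirna$", full.names = TRUE)

lst <- lapply(file_list, function(f) {
  x <- read.delim(f, stringsAsFactors = FALSE)
  out <- data.frame(ID = paste(x$mir, x$seq, sep = "_"), freq = x$freq,
                    stringsAsFactors = FALSE)
  # Name the column after the file, e.g. freq_file1
  names(out)[2] <- paste0("freq_", sub("\\.mirna$", "", basename(f)))
  out
})

# Full outer join across all files; absent IDs become NA
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), lst)
print(merged)
```

Naming each column after its file keeps the 96 freq columns distinguishable after merging, which matches the freq_file1, freq_file2, ... layout asked for in the question.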
