Looping over non-tab-delimited files

Time: 2019-06-05 10:11:57

Tags: r

I have several non-tab-delimited files. I want to combine them and create a single file that holds selected information from all of them.

I have tried the code below, but it does not work properly when I use a for loop.

The original files look like this:

Warning: Output file '02-MappedReads_HISAT2/sam_folder/SAMPLE01_unsorted_sample.sam' was specified without -S.  This will not work in future HISAT 2 versions.  Please use -S instead.
9437 reads; of these:
  9437 (100.00%) were paired; of these:
    310 (3.28%) aligned concordantly 0 times
    8977 (95.13%) aligned concordantly exactly 1 time
    150 (1.59%) aligned concordantly >1 times
    ----
    310 pairs aligned concordantly 0 times; of these:
      13 (4.19%) aligned discordantly 1 time
    ----
    297 pairs aligned 0 times concordantly or discordantly; of these:
      594 mates make up the pairs; of these:
        306 (51.52%) aligned 0 times
        282 (47.47%) aligned exactly 1 time
        6 (1.01%) aligned >1 times
98.38% overall alignment rate

So I read the file with the read.table function:

(report_sample <- read.table(paste0(mapping_Folder, '/', 'SAMPLE01_summary.txt'), header = F, as.is = T, fill = TRUE, sep = ' ', skip = 1, blank.lines.skip = TRUE, text = TRUE))

(final <- data.frame('samples' = samples['1',1], 'Input_Read_Pairs' = report_sample[1,1], 'Mapped_reads' = report_sample[2,3], 'Mapped_reads_%' = report_sample[2,4], 'reads_unmapped' = report_sample[3,5], 'reads_unmapped_%' = report_sample[3,6], 'reads_uniquely_mapped' = report_sample[4,5], 'reads_uniquely_mapped_%' = report_sample[4,6]))

The output looks like this:

samples Input_Read_Pairs Mapped_reads Mapped_reads_. reads_unmapped reads_unmapped_. reads_uniquely_mapped reads_uniquely_mapped_.
1 SAMPLE01             9437         9437      (100.00%)            310          (3.28%)                  8977                (95.13%)

This works fine with a single file. With a for loop it does not work properly:

report_sample <- array(dim = 0)
for (i in samples[,1]) {
    report_sample[i] <- read.table(paste0(mapping_Folder, '/', i,'_summary.txt'), header = F, as.is = T, fill = TRUE, sep = ' ', skip = 1, blank.lines.skip = TRUE, text = TRUE, )
}
final <- data.frame('samples' = samples['1',1], 'Input_Read_Pairs' = report_sample[1,1], 'Mapped_reads' = report_sample[2,3], 'Mapped_reads_%' = report_sample[2,4], 'reads_unmapped' = report_sample[3,5], 'reads_unmapped_%' = report_sample[3,6], 'reads_uniquely_mapped' = report_sample[4,5], 'reads_uniquely_mapped_%' = report_sample[4,6])

After the loop, report_sample contains:

$SAMPLE01
 [1] "9437"          ""              ""              ""              ""              ""              ""              "these:"       
 [9] ""              "time"          ""              ""              "discordantly;" ""              "pairs;"        ""             
[17] "0"             ""              "exactly"       ""              ">1"            "98.38%"       

$SAMPLE02
 [1] "9437"          ""              ""              ""              ""              ""              ""              "these:"       
 [9] ""              "time"          ""              ""              "discordantly;" ""              "pairs;"        ""             
[17] "0"             ""              "exactly"       ""              ">1"            "98.38%"       

$SAMPLE03
 [1] "9437"          ""              ""              ""              ""              ""              ""              "these:"       
 [9] ""              "time"          ""              ""              "discordantly;" ""              "pairs;"        ""             
[17] "0"             ""              "exactly"       ""              ">1"            "98.43%"       

1 answer:

Answer 0 (score: 0)

Your example is not 100% reproducible (what is samples?), so this is an approximation.

# Pull the fields of interest out of one parsed summary (one row per file)
TidyTable <- function(x) {
  final <- data.frame('Input_Read_Pairs' = x[1,1], # add your "samples" column before this
                      'Mapped_reads' = x[2,3], 
                      'Mapped_reads_%' = x[2,4], 
                      'reads_unmapped' = x[3,5], 
                      'reads_unmapped_%' = x[3,6], 
                      'reads_uniquely_mapped' = x[4,5], 
                      'reads_uniquely_mapped_%' = x[4,6])
  return(final)
}

# Use a list and [[i]] so that each slot keeps the whole data frame
report_sample <- list()
for (i in 1:3) { # change this to your "samples"
  report_sample[[i]] <- read.table(paste0(mapping_Folder, '/', "output", i, ".txt"), 
                             header = FALSE, as.is = TRUE, fill = TRUE, sep = ' ', 
                             skip = 1, blank.lines.skip = TRUE)
}

# One row per file, stacked into a single data frame
df <- lapply(report_sample, TidyTable)
do.call("rbind", df)
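
The output printed in the question ($SAMPLE01 as a single character vector) is consistent with single-bracket assignment keeping only the first column of each data frame. The sketch below illustrates that, and then runs the same loop driven by sample names instead of 1:3. Everything in it is an assumption for illustration: the temporary folder, the two sample names, the shortened fake summary files, the read_summary helper, and colClasses = "character" (added so the column types do not depend on file contents).

```r
# Hypothetical setup: two shortened HISAT2-style summaries in a temp folder.
mapping_Folder <- tempfile("mapping_")
dir.create(mapping_Folder)
samples <- c("SAMPLE01", "SAMPLE02")
summary_lines <- c(
  "Warning: placeholder first line, dropped by skip = 1",
  "9437 reads; of these:",
  "  9437 (100.00%) were paired; of these:",
  "    310 (3.28%) aligned concordantly 0 times",
  "    8977 (95.13%) aligned concordantly exactly 1 time",
  "    150 (1.59%) aligned concordantly >1 times"
)
for (s in samples) {
  writeLines(summary_lines, file.path(mapping_Folder, paste0(s, "_summary.txt")))
}

# `[<-` with a data frame on the right keeps only its first column.
bad <- list()
suppressWarnings(bad["a"] <- data.frame(u = 1:2, v = 3:4))
str(bad)

# Read one summary and return a one-row data frame for that sample.
read_summary <- function(s) {
  x <- read.table(file.path(mapping_Folder, paste0(s, "_summary.txt")),
                  header = FALSE, as.is = TRUE, fill = TRUE, sep = " ",
                  skip = 1, blank.lines.skip = TRUE,
                  colClasses = "character")
  data.frame(samples = s,
             Input_Read_Pairs = x[1, 1],
             reads_unmapped = x[3, 5],
             reads_uniquely_mapped = x[4, 5],
             stringsAsFactors = FALSE)
}

# `[[<-` stores each complete data frame in its own list slot.
report_sample <- list()
for (s in samples) {
  report_sample[[s]] <- read_summary(s)
}

final <- do.call("rbind", report_sample)
rownames(final) <- NULL
print(final)
```

The values come back as character strings here; wrapping them in as.integer() (and stripping the parentheses from the percentage fields) would be a natural next step if numeric columns are needed.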