Combining columns from multiple files

Time: 2016-06-19 18:21:57

Tags: r

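I'm trying to combine columns from multiple files, but I get warning messages when some of the files are merged. I'm not sure where the problem occurs. Any ideas?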

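library(data.table)

# list.files() treats `pattern` as a regular expression, not a shell glob
file_list <- list.files(pattern = "\\.mirna$")

# Read the needed columns from each file and build a composite ID key
lst <- lapply(file_list, function(x)
  fread(x, select = c("mir", "seq", "freq", "mism", "add", "t5", "t3"))[,
    list(ID = paste(mir, seq, mism, add, t5, t3), freq = freq)])

# Join each table onto the accumulated result by ID (x[y] keeps the rows of y)
miraligner <- as.data.frame(Reduce(function(x, y) x[y, on = "ID"], lst))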
head(miraligner)
Warning messages:
1: In fread(x, select = c("mir", "seq", "freq", "mism", "add", "t5",  :
  Bumped column 9 to type character on data row 6, field contains 'g'. Coercing 
  previously read values in this column from logical, integer or numeric back to 
  character which may not be lossless; e.g., if '00' and '000' occurred before they 
  will now be just '0', and there may be inconsistencies with treatment of ',,' and 
  ',NA,' too (if they occurred in this column before the bump). If this matters please 
  rerun and set 'colClasses' to 'character' for this column. Please note that column 
  type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so 
  hopefully this message should be very rare. If reporting to datatable-help, please 
  rerun and include the output from verbose=TRUE.
2: In fread(x, select = c("mir", "seq", "freq", "mism", "add", "t5",  :
  Bumped column 9 to type character on data row 16, field contains 't'. Coercing 
  previously read values in this column from logical, integer or numeric back to 
  character which may not be lossless; e.g., if '00' and '000' occurred before they 
  will now be just '0', and there may be inconsistencies with treatment of ',,' and 
  ',NA,' too (if they occurred in this column before the bump). If this matters please 
  rerun and set 'colClasses' to 'character' for this column. Please note that column 
  type detection uses the first 5 rows, the middle 5 rows and the last 5 rows, so 
  hopefully this message should be very rare. If reporting to datatable-help, please 
  rerun and include the output from verbose=TRUE.

My files look like this:

> head(Xfile)
                           seq          name freq             mir start end mism   add t5 t3       s5       s3    DB     precursor ambiguity
1        AACTGGTTGAACAACTGAACC seq_100018_x3    3  hsa-miR-582-3p    54  74    0     0  t  0 ATTGTAAC AACCCAAA miRNA   hsa-mir-582         1
2       TAGCACCATTTGAAATCAGTGT seq_10002_x43   43  hsa-miR-29b-3p    52  73    0     0  0  t TATCTAGC TGTTTTAG miRNA hsa-mir-29b-2         1
3 TGAGTGTGTGTGTGTGAGTGTGTGTTTT seq_100046_x3    3  hsa-miR-574-5p    25  49    0 I-TTT  0 GT CGTGTGAG GTGTGTCG miRNA   hsa-mir-574         1
4        GTCATACACGGCTCTCCTCTC seq_100072_x3    3  hsa-miR-485-3p    46  66    0     0  0  t GCGAGTCA CTCTTTTA miRNA   hsa-mir-485         1
5      CTGGACTTGGAGTCAGAAGGCAC seq_100077_x3    3 hsa-miR-378a-3p    44  64    0  I-AC  a  0 TAGCACTG   AGGCCT miRNA  hsa-mir-378a         1
6      TAACACTGTCTGGTAACGATGGT seq_100080_x3    3 hsa-miR-200a-3p    54  74    0  I-GT  0  t ACTCTAAC ATGTTCAA miRNA  hsa-mir-200a         1

1 answer:

Answer 0 (score: 1)

You don't need to worry about these warnings.

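Your column 9 (t5) contains either 0 or a letter. fread guesses each column's type automatically from a small sample of rows (as the warning says, the first 5, the middle 5 and the last 5).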
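A minimal sketch of this per-column guessing, on a hypothetical two-column table passed inline. It assumes a data.table version recent enough to provide fread()'s `text=` argument. With only two rows every value is sampled, so no bump happens here; in the question's larger files the letters first appeared outside the sample (data rows 6 and 16), which is what produced the warnings.

```r
library(data.table)

# Hypothetical inline table: t5 holds only 0s, t3 mixes 0 and a letter
dt <- fread(text = "t5,t3\n0,0\n0,t\n", header = TRUE)

# Types are guessed per column from the values fread sees:
# the all-zero t5 is read as integer, t3 ("0" and "t") as character
sapply(dt, class)
```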

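In files where the sampled rows of t5 contain only 0, fread guesses that the column is numeric. When it then reaches a 't' or an 'a', it switches the column to character and is kind enough to tell you so. Since t5 only ever holds 0 or a letter, reading the earlier 0s back as character loses nothing.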
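If you prefer to avoid the warning altogether, the message itself suggests setting `colClasses` to `"character"` for the affected column. Below is a minimal sketch on a hypothetical tab-separated file adapted from the sample above, with t5 and t3 set to all 0s so that fread would otherwise read them as integer. It assumes that `add`, `t5` and `t3` can all mix 0 with letters, as they do in the sample.

```r
library(data.table)

# Hypothetical sample file in the question's layout (only the columns used)
tmp <- tempfile(fileext = ".mirna")
writeLines(c(
  "seq\tfreq\tmir\tmism\tadd\tt5\tt3",
  "AACTGGTTGAACAACTGAACC\t3\thsa-miR-582-3p\t0\t0\t0\t0",
  "GTCATACACGGCTCTCCTCTC\t3\thsa-miR-485-3p\t0\t0\t0\t0"
), tmp)

# Force the mixed 0/letter columns to character up front,
# so their type is never guessed and never needs bumping
dt <- fread(tmp,
            select     = c("mir", "seq", "freq", "mism", "add", "t5", "t3"),
            colClasses = c(add = "character", t5 = "character", t3 = "character"))
str(dt)
unlink(tmp)
```

In the question's lapply() call, the same `colClasses` argument can be added next to `select`.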