Read multiple files and combine unique rows in R

Time: 2016-02-25 00:50:19

Tags: r unique sapply

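I want to read a set of text files with unequal numbers of rows from a directory, sort them, take the unique IDs, and combine everything into a single matrix with the file names as column headers.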

For example:

file1.txt

  ID    COUNT
  id1     3
  id5     4

sample2.txt

  ID    COUNT
  id1    5
  id3    6

Expected output:

  ID  file1  sample2  ....  
  id1  3      5
  id5  4      NA  
  id3  NA     6

I have made some progress on reading the files and building a list, but I am stuck on finding the unique rows:

    files <- list.files(path=".", pattern="\\.txt")
    samples <- list()
    for (f in files) {
        file <- read.table(f, header=F, sep="\t")
        ...

How can I use sapply over the list of files to find the unique rows across all of them?

2 answers:

Answer 0: (score: 3)

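library(reshape2)

# Read all the files into a list of data frames
df.list = lapply(files, function(file) {
  # header=TRUE so the columns are named ID and COUNT
  dat = read.table(file, header=TRUE, sep="\t")
  dat$file = file
  return(dat)
})

# Combine into a single data frame
df = do.call(rbind, df.list)

# Reshape from long to wide; name the value column explicitly,
# otherwise dcast guesses the last column (file)
df = dcast(df, ID ~ file, value.var="COUNT")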
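If reshape2 is not available, the same wide table can probably be built in base R by outer-joining the files on ID. The following is only a sketch under assumptions: the helper names `read_counts` and `combine_counts` are made up for illustration, the files are assumed to be tab-separated with an `ID` and a `COUNT` header, and the two temporary files simply recreate the example from the question.

```r
# Hypothetical helper: read one file and rename COUNT after the file
read_counts <- function(file) {
  dat <- read.table(file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  # Use the file name without its .txt extension as the column name
  names(dat)[names(dat) == "COUNT"] <- sub("\\.txt$", "", basename(file))
  dat
}

# Hypothetical helper: full outer join of all files on ID
combine_counts <- function(files) {
  Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE),
         lapply(files, read_counts))
}

# Recreate the question's example in a temporary directory
dir <- tempfile("counts_")
dir.create(dir)
writeLines(c("ID\tCOUNT", "id1\t3", "id5\t4"), file.path(dir, "file1.txt"))
writeLines(c("ID\tCOUNT", "id1\t5", "id3\t6"), file.path(dir, "sample2.txt"))

files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
wide <- combine_counts(files)
print(wide)
```

With `all = TRUE`, IDs missing from a file are kept and filled with NA. `merge` also sorts the result by ID by default, so the row order differs from the order in which IDs first appear.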

Answer 1: (score: 0)

Or, if you are after performance:

library(data.table)
process = function(files){
    files = setNames(files, substr(files, 1L, nchar(files) - 4L))
    dt = rbindlist(lapply(files, fread), idcol = "file")
    dcast(dt, ID ~ file, value.var = "COUNT")
}
files = list.files(path=".", pattern="\\.txt")
process(files)
#    ID file1 sample2
#1: id1     3       5
#2: id3    NA       6
#3: id5     4      NA
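For the sapply route the question literally asks about, one possible sketch is to collect the union of IDs first and then look each ID up in every table with `match`. The in-memory `tables` list below is a stand-in for the files once they have been read, and the sketch assumes each ID occurs at most once per file; both are assumptions, not something stated in the question.

```r
# Toy data standing in for the two example files after reading them
tables <- list(
  file1   = data.frame(ID = c("id1", "id5"), COUNT = c(3L, 4L),
                       stringsAsFactors = FALSE),
  sample2 = data.frame(ID = c("id1", "id3"), COUNT = c(5L, 6L),
                       stringsAsFactors = FALSE)
)

# Union of IDs across all tables, in order of first appearance
ids <- unique(unlist(lapply(tables, function(d) d$ID), use.names = FALSE))

# For every table, pick the COUNT of each ID (NA where the ID is absent);
# sapply binds the equal-length vectors into a matrix, one column per table
counts <- sapply(tables, function(d) d$COUNT[match(ids, d$ID)])
rownames(counts) <- ids
print(counts)
```

This gives a plain matrix with the IDs as row names rather than a data frame, and it keeps the IDs in the order they are first seen instead of sorting them.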