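I want to read a set of text files with unequal numbers of rows from a directory and combine them into a matrix keyed on the sorted unique IDs, with the file names as column headers.
For example: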
file1.txt
ID COUNT
id1 3
id5 4
sample2.txt
ID COUNT
id1 5
id3 6
Expected output:
ID file1 sample2 ....
id1 3 5
id5 4 NA
id3 NA 6
I have made some progress reading the files and building a list, but I am stuck on finding the unique IDs:
files <- list.files(path=".", pattern="\\.txt")
samples <- list()
for (f in files) {
file <- read.table(f,header=F, sep="\t")
...
How can I use sapply over the list of files to find the unique rows across all files?
Answer 0 (score: 3)
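library(reshape2)
# Read all the files into a list of data frames
# (`files` is the vector returned by list.files() in the question)
df.list = lapply(files, function(file) {
  # header=TRUE so the columns are named ID and COUNT
  dat = read.table(file, header=TRUE, sep="\t")
  dat$file = file
  return(dat)
})
# Combine into a single data frame
df = do.call(rbind, df.list)
# Reshape from long to wide: one row per ID, one column per file
df = dcast(df, ID ~ file, value.var="COUNT")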
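If you would rather avoid the reshape2 dependency, the same result can probably be reached in base R with a full outer join on ID. The following is only a sketch under some assumptions: the files are tab-separated with an `ID` and a `COUNT` header, as in the question, and the `demo_dir` directory and the two files written into it are made up here so the example runs on its own.

```r
# Hypothetical demo data: recreate the two example files in a temporary directory
demo_dir <- file.path(tempdir(), "count_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("ID\tCOUNT", "id1\t3", "id5\t4"), file.path(demo_dir, "file1.txt"))
writeLines(c("ID\tCOUNT", "id1\t5", "id3\t6"), file.path(demo_dir, "sample2.txt"))

paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and rename COUNT to the file name without its extension
tables <- lapply(paths, function(p) {
  dat <- read.table(p, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
  names(dat)[names(dat) == "COUNT"] <- tools::file_path_sans_ext(basename(p))
  dat
})

# Full outer join on ID; an ID missing from a file gets NA in that file's column
wide <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), tables)
print(wide)
```

`merge(..., all = TRUE)` keeps every ID that appears in any file and sorts the rows by ID, so no separate step is needed to collect the unique IDs first.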
Answer 1 (score: 0)
Alternatively, if you are after performance:
library(data.table)
process = function(files){
files = setNames(files, substr(files, 1L, nchar(files) - 4L))
dt = rbindlist(lapply(files, fread), idcol = "file")
dcast(dt, ID ~ file, value.var = "COUNT")
}
files = list.files(path=".", pattern="\\.txt")
process(files)
# ID file1 sample2
#1: id1 3 5
#2: id3 NA 6
#3: id5 4 NA