Question

I have a directory containing a large number of files; each file has the same structure:

Nodes: 6606
Edges: 382386
Average degree: 115.76930063578565
Average clustering: 0.11213868344294504
Modularity: 0.6021229084216876
Giant component: 6598

Using the list.files() function, I read the contents of the directory:

files <- list.files(path = "test", pattern = "netstat*", full.names = TRUE)

Then I use the lapply() function to read the files into a list of data frames:

data1 <- lapply(files, read.table, sep = ":", row.names = 1)

Finally, I convert the list to a data frame and rename the rows:

data2 <- t(do.call(data.frame, data1))
rownames(data2) <- 1:nrow(data2)

The final data looks like this:

> head(data2)
  Nodes  Edges Average degree Average clustering Modularity Giant component
1  6606 382386     115.769301         0.11213868  0.6021229            6598
2  5157  20292       7.869692         0.07020251  0.8195294            5125
3  5177  20148       7.783658         0.07640135  0.9030172            5102
4  5689  29559      10.391633         0.08480404  0.7104452            5626
5  5985  32086      10.722139         0.06803845  0.7189815            5938
6  5829  26449       9.074970         0.05963236  0.7061715            5770

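My question: is there a more elegant way to do this? In particular, the last command, where I rename the rows by hand, does not feel like elegant R programming.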

Answer 1

We can use fread to read the files, and then combine the resulting list of data.tables into a single data.table with rbindlist:

library(data.table)
rbindlist(lapply(files, fread))
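The one-row-per-file layout shown in the question can also be produced in base R without renaming rows by hand. The following is only a minimal sketch, assuming every file contains the same `name: value` lines in the same order. The helper name `read_stats`, the temporary demo directory, and the shortened three-line files are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper (not from the original post): read one "name: value"
# file and return its values as a named numeric vector.
read_stats <- function(path) {
  x <- read.table(path, sep = ":", strip.white = TRUE, stringsAsFactors = FALSE)
  setNames(x[[2]], x[[1]])
}

# Demo input: two small temporary files standing in for the real "test" directory.
demo_dir <- tempfile("netstat_demo")
dir.create(demo_dir)
writeLines(c("Nodes: 6606", "Edges: 382386", "Giant component: 6598"),
           file.path(demo_dir, "netstat1.txt"))
writeLines(c("Nodes: 5157", "Edges: 20292", "Giant component: 5125"),
           file.path(demo_dir, "netstat2.txt"))

# "pattern" is a regular expression, so "^netstat" matches names that start with "netstat".
files <- list.files(path = demo_dir, pattern = "^netstat", full.names = TRUE)

# sapply() yields one column per file; transposing turns each file into one row.
stats <- sapply(files, read_stats, USE.NAMES = FALSE)
data2 <- as.data.frame(t(stats))
print(data2)
```

Because the matrix passed to as.data.frame() carries no row names, the rows are numbered automatically, which removes the need for the manual `rownames<-` step.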

Reading multiple files into a data frame