R - Reading multiple data files at once

Time: 2018-04-04 13:57:37

Tags: r data.table lapply

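I am looking into parallel processing in R and wondering whether I can read multiple txt files in parallel rather than sequentially. The reason is that I have a Shiny app whose load time I want to reduce, and loading the files accounts for a large share of that time.

Current state: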

Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)

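I have seen examples that run in parallel, but they all end by combining all the files into one. I want each file I import to be a separate data frame.

Here are some examples: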

How do you read in multiple .txt files into R?

https://www.r-bloggers.com/import-all-text-files-in-a-folder-with-parallel-execution/

Ideal case (although I know this is not valid code):

RunParallel {
    Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
    ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
    Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
    WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
    WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
    Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
}

After the comments below, I timed both approaches:

tic <- Sys.time()
Shipments_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_month.txt', fill = TRUE)
ShipmentsYear_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_ship_year.txt', fill = TRUE)
Open_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_wip.txt', fill = TRUE)
WIP_Short_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_short.txt', fill = TRUE)
WIP_RTQT_Raw <- read.delim('/srv/samba/share/SAP data/_zmro_sno_tasks_year.txt', fill = TRUE)
Invoiced_Raw <- read.delim('/srv/samba/share/SAP data/_zmrosales_inv.txt', fill = TRUE)
toc <- Sys.time()
Sequential <- toc - tic


tic <- Sys.time()
file <- c("/srv/samba/share/SAP data//_zmrosales_ship_month.txt", 
          "/srv/samba/share/SAP data//_zmrosales_ship_year.txt", 
          "/srv/samba/share/SAP data//_zmrosales_inv.txt",
          "/srv/samba/share/SAP data//_zmrosales_wip.txt",
          "/srv/samba/share/SAP data//_zmro_short.txt",
          "/srv/samba/share/SAP data//_zmro_sno_tasks_year.txt")
x2 <- lapply(file, data.table::fread)

# [[ ]] extracts the table itself; [ ] would return a one-element list
Shipments_Raw <- as.data.frame(x2[[1]])
ShipmentsYear_Raw <- as.data.frame(x2[[2]])
Invoiced_Raw <- as.data.frame(x2[[3]])
Open_Raw <- as.data.frame(x2[[4]])
WIP_Short_Raw <- as.data.frame(x2[[5]])
WIP_RTQT_Raw <- as.data.frame(x2[[6]])

toc <- Sys.time()
Lapply <- toc - tic

Sequential
Lapply

Timing results:

> Sequential
Time difference of 6.011156 secs
> Lapply
Time difference of 0.8015034 secs
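
One caveat when reading these numbers: the "Sequential" run uses read.delim while the "Lapply" run uses data.table::fread, and lapply itself still processes the files one after another. The gap therefore most likely reflects the faster reader rather than any parallelism. A minimal sketch for isolating the effect of parallel execution is to keep the reader fixed and compare lapply with parallel::mclapply. The synthetic files, their sizes, and the two-worker setting below are assumptions for illustration only, not the original data.

```r
library(parallel)  # ships with base R

# Synthetic tab-delimited files standing in for the real exports (hypothetical data).
tmp_dir <- tempfile("timing_demo_")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, sprintf("file_%d.txt", 1:4))
for (p in paths) {
  write.table(data.frame(id = seq_len(20000), value = rnorm(20000)),
              p, sep = "\t", row.names = FALSE, quote = FALSE)
}

# Same reader in both runs, so only the iteration strategy differs.
t_lapply <- system.time(
  seq_res <- lapply(paths, read.delim, fill = TRUE)
)

# mclapply forks worker processes. Forking is not available on Windows,
# where mc.cores must be 1 (which behaves like plain lapply).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
t_mclapply <- system.time(
  par_res <- mclapply(paths, read.delim, fill = TRUE, mc.cores = n_cores)
)

# Elapsed wall-clock seconds for each strategy.
print(c(lapply = t_lapply[["elapsed"]], mclapply = t_mclapply[["elapsed"]]))
```

For a handful of small files the cost of starting workers can outweigh the gain, so it is worth measuring on the real files before adopting the parallel variant.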

1 answer:

Answer 0: (score: 1)

Just combine lapply with data.table's super-fast fread:

lapply(files, data.table::fread)
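
Since the question asks for each file to remain a separate data frame, one possible way to extend the line above is to name the elements of the path vector and read into a named list. This is a sketch under assumptions: the temporary files and their contents are hypothetical stand-ins so the code runs anywhere, and in practice `files` would hold the real paths.

```r
library(data.table)

# Small stand-in files (hypothetical names and data); in the question these
# would be the paths under /srv/samba/share/SAP data/.
tmp_dir <- tempfile("fread_demo_")
dir.create(tmp_dir)
files <- c(
  Shipments_Raw = file.path(tmp_dir, "ship_month.txt"),
  Open_Raw      = file.path(tmp_dir, "wip.txt"),
  Invoiced_Raw  = file.path(tmp_dir, "inv.txt")
)
for (f in files) {
  write.table(data.frame(id = 1:3, value = c("a", "b", "c")),
              f, sep = "\t", row.names = FALSE, quote = FALSE)
}

# lapply keeps the names of `files`, so every result stays a separate,
# named element. data.table = FALSE makes fread return plain data frames.
raw <- lapply(files, fread, sep = "\t", data.table = FALSE)

# Use [[ ]] (not [ ]) to pull a single data frame out of the list.
Shipments_Raw <- raw[["Shipments_Raw"]]

# Optionally create one variable per file in the current environment.
invisible(list2env(raw, envir = environment()))
```

Keeping the results in a named list is often easier to manage than separate variables, for example when the same cleaning step has to be applied to every file.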