Upload several of the most recent files from a folder

Time: 2019-01-16 01:38:19

Tags: r

I know how to upload the most recent file in a folder using the mtime column from the file.info function:

# Create the example data frame

Name <- c('AAA_2019_01_15.csv', 'AAA_2019_01_16.csv', 'AAA_2019_01_17.csv', 'BBB_2019_01_15.csv', 'BBB_2019_01_16.csv', 'BBB_2019_01_17.csv', 'CCC_2019_01_15.csv', 'CCC_2019_01_16.csv', 'CCC_2019_01_17.csv')
size <- as.numeric(1:9)
isdir <- rep(FALSE, 9)
mode <- rep(666, 9)
mtime <- as.POSIXct(c("2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28", "2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28", "2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28"))
ctime <- as.POSIXct(c("2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28", "2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28", "2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28"))
atime <- as.POSIXct(c("2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28", "2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28", "2019-01-15 18:07:28", "2019-01-16 18:07:28", "2019-01-17 18:07:28"))
exe <- rep("no", 9)
All_Files <- data.frame(size, isdir, mode, mtime, ctime, atime, exe)
All_Files$mode <- as.octmode(All_Files$mode)
rownames(All_Files) <- Name

# Upload the most recent file from the working directory

# Overwrites the example All_Files with file.info() for the .csv files in the working directory
All_Files <- file.info(list.files(pattern = "\\.csv$", full.names = TRUE))
Most_Recent_File <- rownames(All_Files)[which.max(All_Files$mtime)]
Most_Recent_File <- read.table(Most_Recent_File, skip = 1, stringsAsFactors = FALSE, sep = ",", na.strings = "NAN")
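As a quick check of the one-file approach without touching the disk, the sketch below applies the same which.max() selection to an in-memory stand-in for the example data frame (file_names and info are illustrative names, not from the question). Because three files share the latest mtime, which.max() simply returns the first of them, which is why this approach on its own cannot give one file per string.

```r
# In-memory stand-in for the example file.info() data frame above:
# row names are the file names, and only the mtime column is needed.
file_names <- c("AAA_2019_01_15.csv", "AAA_2019_01_16.csv", "AAA_2019_01_17.csv",
                "BBB_2019_01_15.csv", "BBB_2019_01_16.csv", "BBB_2019_01_17.csv",
                "CCC_2019_01_15.csv", "CCC_2019_01_16.csv", "CCC_2019_01_17.csv")
info <- data.frame(
  mtime = as.POSIXct(rep(c("2019-01-15 18:07:28", "2019-01-16 18:07:28",
                           "2019-01-17 18:07:28"), times = 3)),
  row.names = file_names
)

# Same selection as the one-file approach: the row with the largest mtime.
# which.max() returns the first maximum, so the three-way tie on
# 2019-01-17 resolves to the AAA file.
most_recent <- rownames(info)[which.max(info$mtime)]
most_recent
```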

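I would like to upload the most recent file containing the string "AAA", the most recent file containing "BBB", and the most recent file containing "CCC", using the mtime column from the file.info function.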

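Is there a way to do this without running the upload step separately for each string? For example, can I create a character vector c("AAA", "BBB", "CCC") and use it to upload the most recent file of each type? In practice I have more than 3 files to upload, so an efficient approach would be appreciated. Thanks!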

2 answers:

Answer 0 (score: 0)

There may be a more elegant way to handle this, but here is one approach using the tidyverse and regex patterns.

library(tidyverse)

files <- c('AAA_2019_01_15', 'AAA_2019_01_16', 'AAA_2019_01_17', 
                 'BBB_2019_01_15', 'BBB_2019_01_16', 'BBB_2019_01_17', 
                 'CCC_2019_01_15', 'CCC_2019_01_16', 'CCC_2019_01_17')

dates <- str_extract_all(files, pattern = "[0-9]{4}_[0-9]{2}_[0-9]{2}", simplify = TRUE) %>% 
  lubridate::ymd()

file_type <- str_extract_all(files, pattern = "AAA|BBB|CCC", simplify = TRUE)


tibble(file_type, dates) %>% 
  mutate(file_names = files) %>% 
  group_by(file_type) %>% 
  arrange(desc(dates)) %>% 
  filter(row_number() == 1)
#> # A tibble: 3 x 3
#> # Groups:   file_type [3]
#>   file_type[,1] dates      file_names    
#>   <chr>         <date>     <chr>         
#> 1 AAA           2019-01-17 AAA_2019_01_17
#> 2 BBB           2019-01-17 BBB_2019_01_17
#> 3 CCC           2019-01-17 CCC_2019_01_17
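If the tidyverse and lubridate are not available, roughly the same selection can be sketched in base R. This assumes, as the answer does, that every name follows the pattern PREFIX_YYYY_MM_DD and that the date in the file name (rather than mtime) decides which file is newest; the variable names are illustrative.

```r
files <- c("AAA_2019_01_15", "AAA_2019_01_16", "AAA_2019_01_17",
           "BBB_2019_01_15", "BBB_2019_01_16", "BBB_2019_01_17",
           "CCC_2019_01_15", "CCC_2019_01_16", "CCC_2019_01_17")

# Prefix = everything before the first underscore;
# date = the trailing YYYY_MM_DD part of the name.
file_type <- sub("_.*$", "", files)
dates <- as.Date(sub("^.*_([0-9]{4}_[0-9]{2}_[0-9]{2})$", "\\1", files),
                 format = "%Y_%m_%d")

# Within each prefix, keep the file with the latest date.
latest <- vapply(split(seq_along(files), file_type), function(i) {
  files[i][which.max(dates[i])]
}, character(1))
latest
```

The result is a character vector named by prefix, which can be passed to lapply() with read.table() in the same way as in the answer below.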

Created on 2019-01-15 by the reprex package (v0.2.1)

Answer 1 (score: 0)

I figured out how to do it:

First, create a character vector of the strings in the names of the files for which you want the most recent version:

Unique_Character_Strings_of_Files_to_Upload <- c("AAA", "BBB", "CCC")

Then, generate a data frame of file information:

All_Files <- file.info(list.files(pattern = "\\.csv$", full.names = TRUE))

Then, generate a list whose first component contains all files with "AAA" in their names, the second all files with "BBB", and the third all files with "CCC":

List_of_Files <- lapply(Unique_Character_Strings_of_Files_to_Upload, function(x) All_Files[grep(x, rownames(All_Files)), ])
names(List_of_Files) <- Unique_Character_Strings_of_Files_to_Upload
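To see what the grep() step collects, here is a small in-memory example (the file names, including the hypothetical XAAA_2019_01_15.csv, are made up for illustration). grep() matches the string anywhere in the name, so a name like XAAA_2019_01_15.csv also lands in the "AAA" group; startsWith() is one stricter option if the strings are always prefixes.

```r
file_names <- c("AAA_2019_01_15.csv", "AAA_2019_01_16.csv",
                "BBB_2019_01_15.csv", "CCC_2019_01_15.csv",
                "XAAA_2019_01_15.csv")  # hypothetical name to show the difference
prefixes <- c("AAA", "BBB", "CCC")

# grep() returns the positions of matching names; fixed = TRUE treats each
# string literally instead of as a regular expression.
groups <- lapply(prefixes, function(x) file_names[grep(x, file_names, fixed = TRUE)])
names(groups) <- prefixes

# startsWith() only matches at the beginning of the name.
groups_strict <- lapply(prefixes, function(x) file_names[startsWith(file_names, x)])
names(groups_strict) <- prefixes

groups$AAA
groups_strict$AAA
```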

Then, take the most recent file from each component of the list:

Most_Recent_Station_Files <- lapply(List_of_Files, function(x) {return(x[which.max(x$mtime), ])})

Then, keep only the file names from this new list:

List_of_Names_of_Files_to_Upload <- lapply(Most_Recent_Station_Files, function (x) rownames(x))

Then, convert this new list to a character vector:

Names_of_Files_to_Upload <- unlist(List_of_Names_of_Files_to_Upload, use.names = FALSE)
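One caveat with this step: if a string matches no file, which.max() returns integer(0), so the rownames() step yields character(0) for that string and unlist() silently drops it. The final names() assignment then fails because the lengths no longer agree. A minimal sketch of one way to keep names aligned, assuming (hypothetically) that no "CCC" file exists:

```r
# Per-string file names as produced by the rownames() step; the empty
# "CCC" entry stands for a string that matched no file.
per_prefix <- list(AAA = "AAA_2019_01_17.csv",
                   BBB = "BBB_2019_01_17.csv",
                   CCC = character(0))

flat <- unlist(per_prefix, use.names = FALSE)
length(flat)  # 2: the empty "CCC" entry disappears

# Dropping empty entries first and keeping the names pairs each file
# with its string.
non_empty <- per_prefix[lengths(per_prefix) > 0]
names_to_upload <- unlist(non_empty)
names_to_upload
```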

Finally, upload the desired files:

List_of_Files <- lapply(Names_of_Files_to_Upload, function(x) {read.table(x, skip = 1, stringsAsFactors = FALSE, sep = ",", na.strings = "NAN")})
names(List_of_Files) <- Unique_Character_Strings_of_Files_to_Upload
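Putting the steps together, the sketch below wraps them in one function and exercises it on temporary files. read_latest_by_prefix() is a hypothetical helper, not part of the answer or of any package; it assumes the files are .csv files in a single directory, selects by file.info()$mtime, skips strings that match nothing, and passes extra arguments (skip, sep, na.strings, and so on) straight to read.table().

```r
# Hypothetical helper: for each string, read the most recently modified
# file (by file.info()$mtime) whose name contains that string.
read_latest_by_prefix <- function(prefixes, dir = ".", pattern = "\\.csv$", ...) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  info <- file.info(paths)

  # Path of the newest matching file per string, or NA if none matches.
  latest <- vapply(prefixes, function(p) {
    hits <- info[grepl(p, basename(rownames(info)), fixed = TRUE), , drop = FALSE]
    if (nrow(hits) == 0) NA_character_ else rownames(hits)[which.max(hits$mtime)]
  }, character(1))

  # Drop strings with no match so names stay paired with their files.
  latest <- latest[!is.na(latest)]

  # lapply() keeps the names of 'latest', so the list is named by string.
  lapply(latest, read.table, ...)
}

# Demo on temporary files; each file's data line records its own name.
tmp <- tempfile("latest_demo")
dir.create(tmp)
demo_files <- c("AAA_2019_01_15.csv", "AAA_2019_01_16.csv", "BBB_2019_01_15.csv")
for (f in demo_files) {
  writeLines(c("line to skip", paste0(f, ",1")), file.path(tmp, f))
}

# Set modification times explicitly instead of relying on write order.
mtimes <- as.POSIXct(c("2019-01-15 18:07:28", "2019-01-16 18:07:28",
                       "2019-01-15 18:07:28"))
for (i in seq_along(demo_files)) {
  Sys.setFileTime(file.path(tmp, demo_files[i]), mtimes[i])
}

result <- read_latest_by_prefix(c("AAA", "BBB", "CCC"), dir = tmp,
                                skip = 1, sep = ",", stringsAsFactors = FALSE,
                                na.strings = "NAN")
names(result)
```

Returning a named list keeps each data frame paired with its string even when some strings (here "CCC") have no matching file.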