Need to read some (or all) columns as character when combining multiple .csv files with tidyr functions

Time: 2018-11-30 18:36:47

Tags: r csv tidyr purrr rbind

I am reading in many large .csv files that all have the same column names and row-binding them with the following code (as shown at https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R):

require(readr)  # for read_csv()
require(purrr)  # for map(), reduce()

# find all file names ending in .csv (pattern is a regular expression, not a glob)
files <- dir(pattern = "\\.csv$")
files

data <- files %>%
  map(read_csv) %>%  # read in all the files individually, using
                     # the function read_csv() from the readr package
  reduce(rbind)      # reduce with rbind into one dataframe
data

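However, one column in my data needs to be read as character, because its entries are strings of numbers separated by ","; otherwise read_csv converts that column to a number with the commas removed.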

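How can I

1.) specify that only one column is read in as character (preferably by name)?

2.) simply read all columns in as character?

The second option is not ideal, because I would then have to convert many columns back to numeric.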

I tried using:

col_types = cols(.default = "c")

as described in https://github.com/tidyverse/readr/issues/148 and https://github.com/tidyverse/readr/issues/292.

My attempt looked like this:

data <- files %>%
   map(read_csv( col_types = cols(.default = "c" ))) %>%
   reduce(rbind)   
data

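However, this does not work, because read_csv() requires its 'file' argument (i.e. the .csv file path), and none is supplied in this call. It throws this error: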

Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,  : 
  argument "file" is missing, with no default

1 answer:

Answer 0: (score: 0)

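Each .csv file has the same nine (or however many) column names. Only two columns (in this example "scan_start" and "scan_end") are read as numbers; all the others are read as character: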

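library(readr)  # read_csv(), cols()
library(purrr)  # map_df(), %>%

# regular expression matching file names that end in .csv
files <- dir(pattern = "\\.csv$")

# default every column to character ("c"); the two scan columns are numbers ("n")
metadata <- files %>%
  map_df(~ read_csv(.x, col_types = cols(.default = "c",
                                         scan_end = "n", scan_start = "n")))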
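
For the two variants asked about in the question, here is a minimal self-contained sketch. The demo files and the column names id, value, and codes are made up for illustration, and bind_rows() from dplyr is swapped in for reduce(rbind) as an assumption about what is convenient, not something the question requires. The key point is that read_csv() is wrapped in an anonymous function, so map() can hand it each file path while col_types is passed along.

```r
library(readr)  # read_csv(), cols(), col_character()
library(purrr)  # map()
library(dplyr)  # bind_rows(), %>%

# Hypothetical demo data: two small files in a temporary directory.
# The column names `id`, `value`, and `codes` are invented for this sketch.
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
writeLines(c("id,value,codes", "1,0.5,\"10,20,30\"", "2,1.5,\"40,50\""),
           file.path(demo_dir, "a.csv"))
writeLines(c("id,value,codes", "3,2.5,\"60,70\""),
           file.path(demo_dir, "b.csv"))

# Full paths of all files ending in .csv
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Question 1: force only `codes` to character, by name;
# the types of all other columns are still guessed by read_csv().
by_name <- files %>%
  map(~ read_csv(.x, col_types = cols(codes = col_character()))) %>%
  bind_rows()

# Question 2: read every column as character.
all_chr <- files %>%
  map(~ read_csv(.x, col_types = cols(.default = col_character()))) %>%
  bind_rows()
```

In the first variant only the named column is pinned, so nothing has to be converted back to numeric afterwards, which is the drawback of the second variant noted in the question.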