I am reading in many large .csv files with identical column names and row-binding them with the following code (as shown at https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R):
require(readr) # for read_csv()
require(purrr) # for map(), reduce()
# find all file names ending in .csv
files <- dir(pattern = "*.csv")
files
data <- files %>%
  map(read_csv) %>%  # read in all the files individually, using
                     # the function read_csv() from the readr package
  reduce(rbind)      # reduce with rbind into one dataframe
data
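However, my data has one column that needs to be read in as character, because its entries are strings of numbers separated by ",". Otherwise read_csv converts that column to a number without the commas.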
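How can I
1.) specify that only one column is read in as character (preferably by name)?
or
2.) simply read in all columns as character?
The second option is not ideal, because I would then have to convert many columns back to numeric.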
I tried using:
col_types = cols(.default = "c")
as described in https://github.com/tidyverse/readr/issues/148 and https://github.com/tidyverse/readr/issues/292.
My attempt looked like this:
data <- files %>%
  map(read_csv( col_types = cols(.default = "c" ))) %>%
  reduce(rbind)
data
However, this does not work, because read_csv() requires an input 'x' (i.e., the path to the .csv file). It throws this error:
Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, :
argument "file" is missing, with no default
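Answer 0: (score: 0)
The nine (or however many) column names are the same in every .csv file. Only two columns ("scan_start" and "scan_end" in this example) are read as numeric; all the others are read as character: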
# pattern is a regular expression, so match a literal ".csv" at the end of the name
files <- dir(pattern = "\\.csv$")
metadata <- files %>%
  map_df(~ read_csv(., col_types = cols(.default = "c",
                                        scan_end = "n", scan_start = "n")))
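The attempt in the question fails because read_csv(col_types = ...) is evaluated immediately, with no file, before map() ever sees it. map() needs a function it can call once per file. For the first option in the question (only one column as character, selected by name), a minimal sketch could look like the following. The temporary files and the column names "id", "codes" and "value" are made up for illustration and are not from the original data. It relies on cols() leaving every column that is not named to be guessed as usual.

```r
library(readr)  # read_csv(), cols(), col_character()
library(purrr)  # map(), reduce(), and the re-exported %>% pipe

# Hypothetical demo data: two small files in a temporary directory.
# The column names ("id", "codes", "value") are assumptions for illustration.
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
writeLines(c('id,codes,value', '1,"1,234",0.5', '2,"5,678",1.5'),
           file.path(demo_dir, "a.csv"))
writeLines(c('id,codes,value', '3,"9,012",2.5'),
           file.path(demo_dir, "b.csv"))

# Regular expression (not a glob): file names ending in ".csv"
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Wrap read_csv() in a function so that map() supplies the file path,
# while col_types forces only the "codes" column to character.
# All columns not named in cols() are still guessed by readr.
data <- files %>%
  map(function(f) read_csv(f, col_types = cols(codes = col_character()))) %>%
  reduce(rbind)

data
```

With this approach only the named column needs to be listed, so the remaining columns do not have to be converted back to numeric afterwards.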