Problem with special Scandinavian characters when reading a large txt file

Date: 2017-09-26 12:47:47

Tags: r encoding special-characters readr

I am reading a 1 GB txt file that contains the Swedish letters åäö, using the readr and LaF packages:

library(readr)

my.data <- read_fwf('my.file',
                    fwf_widths(c(2,2,2,8,2,4,40,1,1,10,10,4,2,11,32,1,4)),
                    progress = interactive())

I get wrong characters instead of öäå.

Any suggestions?

1 Answer:

Answer 0: (score: 0)

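Scandinavian characters sometimes need a specific encoding, namely ISO-8859-1 (Latin 1). Here are some useful commands for working with encodings (the results differ between computers and operating systems):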

# Check your current default encoding:
getOption("encoding")
# Modify the default encoding
options(encoding = "ISO-8859-1")

# Optional: check your system locale (time, monetary system, etc.)
Sys.getlocale()
# Set a locale; which locale names are available depends on the OS
Sys.setlocale("LC_ALL", locale = "swe")
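
To illustrate why the encoding matters, here is a minimal base R sketch (not part of the original answer) that prints the bytes behind åäö in Latin-1 and in UTF-8. The variable names are made up for illustration.

```r
# The Swedish letters åäö written with Unicode escapes, so the script
# itself is plain ASCII and does not depend on the editor's encoding.
x <- "\u00e5\u00e4\u00f6"

# Bytes of the same text in the two encodings
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))
utf8_bytes   <- charToRaw(enc2utf8(x))

print(latin1_bytes)  # e5 e4 f6 (one byte per letter)
print(utf8_bytes)    # c3 a5 c3 a4 c3 b6 (two bytes per letter)
```

A reader that assumes UTF-8 cannot interpret the single bytes e5, e4, f6 as valid characters, which is one way these letters end up garbled.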

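From the readr documentation (the encoding argument of locale()): "Default encoding. This only affects how the file is read - readr always converts the output to UTF-8." With read_fwf, it looks like you can specify the locale/encoding: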

read_fwf(file, col_positions, col_types = NULL, locale = default_locale(),
na = c("", "NA"), comment = "", skip = 0, n_max = Inf,
guess_max = min(n_max, 1000), progress = show_progress())

Replace default_locale() with locale(encoding = "latin1"). Your example is not reproducible, so it is hard to test whether this works, but it should look like this:

my.data <- read_fwf('my.file',
                    fwf_widths(c(2,2,2,8,2,4,40,1,1,10,10,4,2,11,32,1,4)),
                    locale = locale(encoding = "latin1"),
                    progress = interactive())
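
Since the original file is not available, the following is only a small self-contained sketch for trying the approach out. It writes a tiny fixed-width file in Latin-1 to a temporary location and reads it back with locale(encoding = "latin1"). The sample records, the column names (name, code) and the widths (6 and 2) are invented for illustration and have nothing to do with the real 1 GB file.

```r
library(readr)

# Hypothetical sample data: two fixed-width records,
# a 6-character name followed by a 2-character code.
lines_utf8 <- c("G\u00f6ran 01", "\u00c5sa   02")

# Convert to Latin-1 and write the bytes unchanged to a temporary file.
tmp <- tempfile(fileext = ".txt")
latin1_lines <- iconv(lines_utf8, from = "UTF-8", to = "latin1")
writeLines(latin1_lines, tmp, useBytes = TRUE)

# Read the file back, telling readr that the input is Latin-1.
dat <- read_fwf(tmp,
                fwf_widths(c(6, 2), c("name", "code")),
                col_types = "cc",
                locale = locale(encoding = "latin1"),
                progress = FALSE)

print(dat)
unlink(tmp)
```

If the letters in the name column print correctly here but not with the real file, the real file is probably stored in some other encoding than Latin-1.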