How do I read a csv file with unquoted newline (\n) characters into R?

Time: 2015-08-08 14:05:13

Tags: r csv data.table

I am reading a csv file into R using fread from data.table. The read stops with this error:

fstDF <- fread("dat.csv")
Read 34.5% of 2000004 rowsError in fread("dat.csv") : 
  Expected sep (',') but new line or EOF ends field 25 on line 800747 when reading data: John,,,ID,362526197318501X,M,19730218, ,,F,,CHN,44,4403,,,,,,,13828890538,,, ,M

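I inspected the data and found that the error is caused by a newline character inside a field, which splits one record across two lines, as in the Steve row of the sample data below: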

Name,CardNo,Descriot,CtfTp,CtfId,Gender,Birthday,Address,Zip,Dirty,District1,District2,District3,District4,District5,District
6,FirstNm,LastNm,Duty,Mobile,Tel,Fax,EMail,Nation,Taste,Education,Company,CTel,CAddress,CZip,Family,Version,id              
Mike,,,OTH,010-116321,M,19000101,,100080, ,,CHN,0,0,,,,,,10116,010-82808028,010-82828028-208,chenmeng@dist.
Steve,,,GID,0282,M,19000101,,051430, ,,CHN
,0,0,,,,,,13831193762,0311-88030066,0311-88030088,info@shineway.com,,,,,   
Nicholas,,,OTH,010-125321,F,19000101,,100097,,,CHN,0,0,,,,,,10125,010-88400202,010-88400260,,,,,,,,,,,4
Abrham,,,OTH,010-130321,F,19000101,,100029,,,CHN,0,0,,,,,,10130,010-51292052/3-802,010-51292052/3-811,
Bill,,,OTH,010-142321,F,19000101,,100007,,,CHN,0,0,,,,,,10142,010-67687044,010-67687044,baiguoshouyue@sina       
Zabrina,,,OTH,010-186321,F,19000101,,100101,,,CHN,0,0,,,,,,13942697025,010-64869596/0411-668895950,0411-6688519
Julia,,,OTH,021-044321,M,19000101,,201206,,,CHN,0,0,,,,,,21044,021-28995000*208,021-50315077,jane.dai@parker.com  
Dave,,,OTH,021-127321,M,19000101,,200008,,,CHN,0,0,,,,,,21127,021-55150244,021-55150344,,,,,,,,,,,9     
Cecilia,,,OTH,021-151321,F,19000101,,201108,,,CHN,0,0,,,,,,21151,021-61451188,021-61452602,reception.china@eurotherm.co

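This data was exported from Microsoft SQL Server. I have no access to the database, and I don't know what went wrong during the export. What I do know is that a stray newline character is what breaks the read.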
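One way to confirm which lines are affected is to count the comma-separated fields on each line and compare against the header. The sketch below uses awk on a small made-up three-column sample (the file name `sample.csv` and its contents are hypothetical, not the real export). It assumes the header line itself is intact and that no field legitimately contains a comma.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample: the "Steve" record is split across two lines.
printf 'Name,Gender,Nation\nMike,M,CHN\nSteve,M\n,CHN\nJulia,F,CHN\n' > sample.csv

# Take the expected field count from the header, then report every later
# line whose count differs, as "line_number:field_count".
awk -F',' 'NR == 1 { expected = NF; next } NF != expected { print NR ":" NF }' sample.csv > bad_lines.txt

cat bad_lines.txt
```

Both halves of a split record are reported, so the line numbers usually come in adjacent pairs.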

Here is a similar question on stackoverflow (with no clear solution): Importing csv file to R new line issue

Question:

How can I read csv data that contains stray newline characters?

1 answer:

Answer 0 (score: 1)

Step 1: Remove the ^M (carriage return) characters in the middle of lines:

perl -pe 's/\r(?!\n)//g'

Reference: How to remove carriage returns in the middle of a line

Step 2: Replace \n, with , (see @jimmij's answer at the reference below):

perl -p00e 's/\n,/,/g'

Reference: https://unix.stackexchange.com/questions/222049/how-to-detect-and-remove-newline-character-within-a-column-in-a-csv-file/222052#222052

Step 3: Read the cleaned file with fread as usual.