Checking for embedded newlines in a CSV file

Time: 2016-11-16 06:21:34

Tags: shell awk grep gawk sql-loader

My script currently uses the following line to remove newlines that appear inside quoted fields of a CSV file:

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' MY_FILE.csv > MY_FILE.csv.tmp

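I would like to add a pre-check, so that the script runs the command above only when the file actually contains embedded newlines. How can I add such a pre-check?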

My CSV file looks like this and contains about 1 million records:

20160711,"M","N1","F","S","A","good data with.....some special character and space (new line)
space ..
....","M","072","00126"

20160711,"M","N1","F","S","A","R","M","072","00126"
20160711,"M","N1","F","S","A","R","M","072","00126"

The embedded newlines can appear anywhere in the file.

2 Answers:

Answer 0: (score: 1)

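@sabya Perhaps count the double quotes on each line? If a line has an odd number of them, a quoted field is split by a newline somewhere: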

gawk '{ if (and(1, gsub(/"/, "\""))) { HasReturn = 1; exit } } END { exit HasReturn + 0 }' MY_FILE.csv
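The quote-counting idea can be wired into the script as a guard around the original fix. What follows is only a minimal sketch: the function names `has_embedded_newline` and `fix_csv_if_needed` are hypothetical, the check uses plain `awk` with a modulo test instead of gawk's `and()`, and the output path `FILE.tmp` simply mirrors the command in the question. It assumes quotes inside the data are balanced, so an odd count on a line means a quoted field continues onto the next line.

```shell
# Hypothetical helper: succeeds (status 0) when FILE contains a line
# with an odd number of double quotes, i.e. a quoted field that is
# split across lines.
has_embedded_newline() {
    awk '
        # gsub returns the number of replacements, i.e. the quote count on this line
        gsub(/"/, "&") % 2 == 1 { found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Hypothetical wrapper: run the gawk fix from the question only when
# the pre-check reports an embedded newline. Writes to FILE.tmp.
fix_csv_if_needed() {
    if has_embedded_newline "$1"; then
        gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' "$1" > "$1.tmp"
    fi
}
```

Calling `fix_csv_if_needed MY_FILE.csv` then leaves a clean file untouched and produces `MY_FILE.csv.tmp` only when a repair was needed. The check stops at the first offending line, so on a file that needs fixing it usually reads only part of the 1 million records.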

Answer 1: (score: 0)

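I respectfully suggest loading the data as given rather than altering it. You can keep the data intact by constructing the control file so that the newlines between double quotes are preserved.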

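Build the control file like this, using the "str" clause on the infile option line to set the end-of-record character. It tells sqlldr that hex 0D (carriage return, or ^M) is the record separator, so newlines inside double quotes are not treated as the end of a record: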

LOAD DATA
infile "test.dat" "str x'0D'" 
TRUNCATE
INTO TABLE test
replace
fields terminated by ","  
optionally enclosed by '"'
(
cola char,
colb char,
colc char
)

More information in this post: https://stackoverflow.com/a/37216660/2543416
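The control file above only helps if each real record in the data file actually ends with a carriage return. One rough way to check that assumption before loading is to compare the number of lines ending in CR with the total line count. The helper name `count_cr_lines` is hypothetical, and this is a sketch rather than part of the answer's method:

```shell
# Hypothetical helper: print "<lines ending in CR> <total lines>" for FILE.
count_cr_lines() {
    awk '/\r$/ { cr++ } END { print cr + 0, NR }' "$1"
}
```

If the first number is 0, the file has no carriage returns and `"str x'0D'"` would not split it into records. If it is smaller than the second number, the difference is the number of lines that end in a bare newline, which is what an embedded newline inside a quoted field would look like.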