I have a CSV file that looks like this:
height, comment, name
152, he was late, for example, on Tuesday, Fred
162, , Sam
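I can't parse this file because the comment field (and no other field) contains a variable number of unquoted commas. I'd like to use awk
(which is new to me) to fix the file so that the commas inside the second field become semicolons: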
height, comment, name
152, he was late; for example; on Tuesday, Fred
162, , Sam
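(Wrapping the whole field in quotes won't solve my problem, because my CSV parser doesn't understand quotes.)
So far I've been thinking of using NF to count the unquoted commas and then replacing them with gsub and an unpleasant regex, but I feel I should be able to use awk to write something more readable, and I'm not sure NF behaves the way I expect.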
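Answer 0 (score: 2)
This is basically a brute-force solution, but it is fairly easy to understand. Invoke it with
$ awk -F "," -f test.awk test.dat
where test.awk is the awk file shown below.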
$ cat test.awk
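{
    # Field 1 is the height. With -F "," the later fields keep their
    # leading space, so no extra space is printed after the commas.
    printf "%s,", $1
    # Fields 2 .. NF-1 all belong to the comment: rejoin them with ";"
    # (the separator goes between pieces, so there is no trailing ";").
    for (i = 2; i < NF; i++) {
        printf "%s%s", (i > 2 ? ";" : ""), $i
    }
    # The last field is always the name.
    printf ",%s\n", $NF
}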
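On the question of how NF behaves: with -F "," awk splits the record on every comma, so NF is the number of commas plus one, and for this three-column file NF - 3 is the number of stray commas in the comment. A quick way to check this, as a sketch only (count_extra_commas is a made-up helper name, not something from the question):

```shell
# Hypothetical helper: print the number of surplus commas on each line.
count_extra_commas() {
    # -F "," splits on every comma, so NF = (number of commas) + 1.
    # A well-formed line has 3 fields, hence NF - 3 surplus commas.
    awk -F "," '{ print NF - 3 }' "$@"
}

# Feed the sample rows from the question on stdin.
printf '%s\n' \
    'height, comment, name' \
    '152, he was late, for example, on Tuesday, Fred' \
    '162, , Sam' | count_extra_commas
```

For the sample data this prints 0, 2 and 0, one number per line.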
Answer 1 (score: 2)
$ cat file
height, comment, name
152, he was late, for example, on Tuesday, Fred
162, , Sam
$ awk -v OFS=, '{
height = comment = name = $0
sub(/,.*$/,"",height)
sub(/^.*,/,"",name)
gsub(/^[^,]+,|,[^,]+$/,"",comment)
gsub(/,/,";",comment)
print height, comment, name
}' file
height, comment, name
152, he was late; for example; on Tuesday, Fred
162, , Sam
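If the regexes feel hard to read, the same idea (keep everything up to the first comma and from the last comma onward, and rewrite only what lies between) can probably be expressed with plain string positions. The following is a sketch under that assumption; fix_commas and the variable names are invented for illustration, and lines with fewer than two commas are passed through unchanged:

```shell
# Hypothetical helper: turn the commas between the first and last comma into ";".
fix_commas() {
    awk '{
        first_comma = index($0, ",")          # position of the first comma (0 if none)
        last_comma = length($0)
        # Scan backward to find the position of the last comma.
        while (last_comma > 0 && substr($0, last_comma, 1) != ",")
            last_comma--
        # Fewer than two commas: nothing lies between them, print the line as-is.
        if (first_comma == 0 || first_comma == last_comma) { print; next }
        # Everything strictly between the two commas is the comment field.
        comment = substr($0, first_comma + 1, last_comma - first_comma - 1)
        gsub(/,/, ";", comment)
        # Reassemble: "height," + comment + ", name".
        print substr($0, 1, first_comma) comment substr($0, last_comma)
    }' "$@"
}
```

It can be called as fix_commas file, or used at the end of a pipeline, since awk reads standard input when no file name is given.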