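I am converting a double-quoted CSV file to a pipe-delimited txt file on Unix. I use the following sed command to replace "," with | and then remove the leading and trailing double quotes.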
sed -e 's/","/|/g' -e 's/"//g' filenm.csv > filenm.txt
But the file also contains runs of consecutive commas with no double quotes around them, and those are not replaced.
Col1|col2|col3|col4|col5|col6|col7|col8
Val1|val2|val3,,,,val7|val8
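Now I want to convert all of these consecutive commas into consecutive pipes, since they indicate blank or null fields.
Other fields contain commas inside the field value, and those must not be changed.
I tried the following, but it does not work.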
sed -e 's/,{1,\}/|{1,\}/g' filenm.csv > filenm.txt
Sample csv file, as opened in Notepad:
"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
I hope this helps to reproduce and resolve the problem.
Thanks in advance.
Answer 0: (score: 4)
This might work for you (GNU sed):
sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/' file
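Iteratively replace each comma that lies inside double quotes with a newline, then translate the remaining commas to | and the newlines back to commas.
Note that the double quotes themselves are left in place, so the s/"//g step from the question is still needed to strip them.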
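A quick way to try this on one row from the question is sketched below. It assumes GNU sed; the to_pipes function name and the trailing s/"//g (which strips the remaining double quotes, as in the question's own command) are additions for illustration and not part of the answer above.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (-E, \n in the replacement and in y///).
# to_pipes is a hypothetical helper name, not part of the original answer.
to_pipes() {
  # 1. Loop: turn each comma inside a quoted field into a newline.
  # 2. y///: remaining (separator) commas become |, newlines become commas.
  # 3. Strip the double quotes, as the question's own command does.
  sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/;s/"//g'
}

printf '%s\n' '"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"' | to_pipes
# prints: 123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
```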
Answer 1: (score: 1)
You can use perl:
perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' filenm.csv > filenm.txt
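Details:
"([^"]*)"|, - a regex pattern that matches a ", then captures zero or more characters other than " into group 1, then matches a "; in any other context it matches just a ,
defined($1) ? $1 : "|" - the RHS (replacement): replaces the match with the group 1 value (if group 1 matched) or with | (if a , was matched)
ge - g stands for global (replace all occurrences), and e makes Perl evaluate the RHS as a Perl expression.
See the online test: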
#!/bin/bash
s='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","0","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"'
perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' <<< "$s"
Output:
ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|0|No.38,3rd st, RRR NNN, TRT||||9999999999
Answer 2: (score: 1)
Using awk:
awk -F \" 'BEGIN { OFS="\"" } { for(i=1;i<=NF;i++) { if ($i ~ /^[,]{2,}$/) { $i="," } } gsub("\",\"","\"|\"",$0) }1' sample.csv
Explanation:
awk -F \" '               # Set the field delimiter to a double quote
BEGIN { OFS="\"" }        # Set OFS up front so a rebuilt record keeps its quotes
{
for(i=1;i<=NF;i++) {      # Loop through each field
if ($i ~ /^[,]{2,}$/) {
$i=","                    # If the field consists of 2 or more commas, collapse it to one comma
}
}
gsub("\",\"","\"|\"",$0)  # Substitute "," with "|"
}1' sample.csv
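Because each run of commas is collapsed to a single comma, a row with several empty fields ends up with fewer columns than the header. The sketch below illustrates this on one sample row from the question. The collapse function name is hypothetical, OFS is set in a BEGIN block, and the run test is written as /^,,+$/ (the same match as {2,}) on the assumption that some awk builds lack interval expressions.

```shell
#!/bin/sh
# Sketch only: collapse is a hypothetical helper wrapping the approach above.
collapse() {
  awk -F'"' 'BEGIN { OFS = "\"" }
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^,,+$/) $i = ","   # run of 2+ commas -> one comma
    gsub(/","/, "\"|\"")           # separator "," -> "|"
  }
  1'
}

# Input row has 9 columns, 4 of them empty.
printf '%s\n' '"456","DEF","12/20/2020",,,,,"test-country","9999999999"' | collapse
# prints: "456"|"DEF"|"12/20/2020"|"test-country"|"9999999999"
```

If the column count has to be preserved, every comma in such a run needs to become its own | instead of the run being collapsed.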
Answer 3: (score: 1)
I would use GNU AWK for this task in the following way. Let the content of file.txt be
"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
then
awk 'BEGIN{FS="\"";OFS=""}{for(i=1;i<=NF;i+=2){$i=gensub(/,/,"|","g",$i)};print $0}' file.txt
Output
ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
456|DEF|12/20/2020|||||test-country|9999999999
465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999
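I assume the first and last columns are never empty. I use " as the field separator; then, in every odd-numbered field (these contain only , characters), I change every , to |. Finally, I print the whole line as modified.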
(Tested in GNU Awk 5.0.1)
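gensub is specific to GNU Awk. As a hedged alternative, the same idea can be sketched with gsub, which POSIX awk provides. The csv_to_pipes function name is hypothetical, and $1 = $1 is an addition so that the record is rebuilt with the empty OFS even when no comma was replaced.

```shell
#!/bin/sh
# Sketch only: csv_to_pipes is a hypothetical helper name.
# Reads the named files, or standard input when none are given.
csv_to_pipes() {
  awk 'BEGIN { FS = "\""; OFS = "" }
  {
    # Odd-numbered fields lie outside the quotes and hold only separators.
    for (i = 1; i <= NF; i += 2)
      gsub(/,/, "|", $i)
    $1 = $1   # force $0 to be rebuilt with OFS, which drops the quotes
    print
  }' "$@"
}

printf '%s\n' '"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"' | csv_to_pipes
# prints: 465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999
```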