Unix: find and replace consecutive commas with consecutive pipes

Time: 2021-01-08 13:33:19

Tags: regex linux unix awk sed

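I am converting a double-quoted CSV file to a pipe-delimited txt file on Unix. I use the following sed command to replace "," with | and then remove the leading and trailing double quotes.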

sed -e 's/","/|/g' -e 's/"//g' filenm.csv > filenm.txt

But the file also contains runs of consecutive commas with no double quotes around them, and those are not replaced.

Col1|col2|col3|col4|col5|col6|col7|col8
Val1|val2|val3,,,,val7|val8
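The leftover commas can be reproduced with two rows of the sample file from this question. A minimal sketch; the wrapper name first_attempt is made up for illustration:

```shell
# Hypothetical wrapper around the command from the question.
first_attempt() {
  sed -e 's/","/|/g' -e 's/"//g' "$@"
}

# A run of bare commas never contains the three characters ",",
# so the first expression skips it and the second only strips the quotes.
printf '%s\n' '"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"' | first_attempt
# 123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT,,,,9999999999
```

Once the quotes are gone, the separator commas can no longer be told apart from the commas inside the address, which is why the conversion has to happen while the quotes are still present.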

Now I want to convert all of these consecutive commas into consecutive pipes, because they indicate null or empty fields.

Other fields also contain commas inside the field value, and those must not be changed.

I tried the command below, but it does not work.

sed -e 's/,{1,\}/|{1,\}/g' filenm.csv > filenm.txt
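Whatever the exact quoting, an approach of this shape has two general problems: the replacement side of sed's s command is plain text rather than a pattern, so a repetition count written there is not interpreted, and a quantified match is replaced once per run rather than once per comma. The sketch below illustrates both points; the function names collapse_run and one_per_comma are made up for illustration.

```shell
# A quantified pattern matches the whole run, which is then replaced once.
collapse_run() { sed 's/,\{1,\}/|/g'; }

# Replacing one comma at a time keeps the number of separators.
one_per_comma() { sed 's/,/|/g'; }

printf '%s\n' 'val3,,,,val7' | collapse_run     # val3|val7
printf '%s\n' 'val3,,,,val7' | one_per_comma    # val3||||val7

# The per-comma version also rewrites commas that belong to a value:
printf '%s\n' 'No.38,3rd st,,,,9999999999' | one_per_comma
# No.38|3rd st||||9999999999
```

So a plain global substitution keeps the empty fields but cannot protect the commas inside quoted values.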

Sample csv file, as opened in Notepad:

"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"

I hope this helps to reproduce and solve the problem.

Thanks in advance.

4 answers:

Answer 0 (score: 4)

This might work for you (GNU sed):

sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/' file

Iteratively replace each , that lies between double quotes with a newline, then translate every , to | and every newline back to ,.
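As written, the command leaves the double quotes in the output. A minimal sketch, assuming GNU sed and assuming the quotes should be stripped as in the question, appends one more substitution; the wrapper name csv_to_pipe_sed is hypothetical.

```shell
# Hypothetical wrapper: the sed program from this answer plus s/"//g.
# Relies on GNU sed (label followed by ';', \n in s and y).
csv_to_pipe_sed() {
  sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/;s/"//g' "$@"
}

printf '%s\n' '"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"' | csv_to_pipe_sed
# 123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
```

The loop hides one in-value comma per pass behind a newline, so by the time y runs, every remaining comma is a separator.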

Answer 1 (score: 1)

You can use perl:

perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' filenm.csv > filenm.txt

Details

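  • "([^"]*)"|, - a regex pattern that matches a ", then captures zero or more characters other than " into group 1, then matches the closing "; in every other context it matches just a ,
  • defined($1) ? $1 : "|" - the RHS (replacement): it replaces the match with the group 1 value if group 1 matched, or with | if a , was matched
  • ge - g stands for global (replace all occurrences), and e makes Perl treat the RHS as a Perl expression.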

See the online test:

#!/bin/bash
s='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","0","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"'
perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' <<< "$s"

Output:

ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|0|No.38,3rd st, RRR NNN, TRT||||9999999999

Answer 2 (score: 1)

Using awk:

awk -F \" '{ for(i=1;i<=NF;i++) { if ($i ~ /^[,]{2,}$/) { $i="," } } OFS="\"";gsub("\",\"","\"|\"",$0)}1' sample.csv

Explanation:

awk -F \" '{  # Set the field delimiter to double quote
             for(i=1;i<=NF;i++) { 
               if ($i ~ /^[,]{2,}$/) { 
                  $i="," # This field consists only of 2 or more commas: collapse it to a single comma
               } 
             } 
             OFS="\"";
             gsub("\",\"","\"|\"",$0) # Replace each "," with "|"
           }1' sample.csv

Answer 3 (score: 1)

I would use GNU AWK for this in the following way. Let the content of file.txt be

"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"

then

awk 'BEGIN{FS="\"";OFS=""}{for(i=1;i<=NF;i+=2){$i=gensub(/,/,"|","g",$i)};print $0}' file.txt

output

ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
456|DEF|12/20/2020|||||test-country|9999999999
465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999

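I assume that the first and last columns are never empty. I use " as the field separator; then, in every odd-numbered field (these contain only commas), I change every , to |. Finally, I print the whole line as modified.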
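gensub is a GNU Awk extension. A sketch of the same idea for other POSIX awk implementations, under the same assumption that the first and last columns are never empty, could use gsub instead; the wrapper name csv_to_pipe_awk is made up for illustration.

```shell
# Hypothetical wrapper: same approach as above, with POSIX gsub instead of gensub.
csv_to_pipe_awk() {
  awk 'BEGIN { FS = "\""; OFS = "" }
       {
         # Odd-numbered fields sit outside the quotes and hold only separators.
         for (i = 1; i <= NF; i += 2) gsub(/,/, "|", $i)
         $1 = $1   # force the record to be rebuilt with OFS even if gsub changed nothing
         print
       }' "$@"
}

printf '%s\n' '"456","DEF","12/20/2020",,,,,"test-country","9999999999"' | csv_to_pipe_awk
# 456|DEF|12/20/2020|||||test-country|9999999999
```

The explicit $1 = $1 is there because gsub only modifies a field when it finds a match, so a line without any separator would otherwise be printed with its quotes intact.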

(tested in GNU Awk 5.0.1)
