Dealing with commas inside double quotes + awk

Time: 2016-05-12 20:37:21

Tags: bash awk comma double-quotes

This is my file:

$ cat -v test2
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - $0 -"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G","$29 CARRYOVER PLAN"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - $0 - #2"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - $1 - #4"

This command adds a column at the end:

$ awk -F, -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8= q $8/(1024*1024)q}1' test2 | cat -v
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - $0 -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G","$29 CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - $0 - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - $1 - #4","0"

My problem is with this line:

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"

It gets changed to:

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"

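The "0.139818" is put in the wrong place, unlike on the other lines, and it overwrites one of the values. The problem seems to be the commas inside the double quotes in this column: "OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006"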
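
A quick way to check this diagnosis is to count the fields awk sees. The sketch below uses two rows copied from test2 and held in a shell variable named rows (a name chosen here for illustration only). With -F, every comma counts as a separator, so the quoted Device Model value is split into several fields.

```shell
# Two rows copied from test2: one plain, one with commas inside a quoted field.
rows=$(cat <<'EOF'
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"
EOF
)

# Number of fields per row when splitting on every comma.
counts=$(printf '%s\n' "$rows" | awk -F, '{print NF}')
echo "$counts"    # prints 7, then 11

# On the second row, field 8 is a fragment of the Device Model column,
# which is exactly the piece that $8=... overwrites.
field8=$(printf '%s\n' "$rows" | awk -F, 'NR==2{print $8}')
echo "$field8"    # prints OPPO R6001
```

The second row has 11 fields instead of 7, so assigning to $8 replaces OPPO R6001 rather than appending a new column.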

What is the best way to achieve this, or is it even possible? This is how I want the line to look, like the other lines:

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)","0.139818"

Maybe I need to tidy up the data, especially this line, before it reaches awk.

EDIT 1: Solved with the answer below.

Changing the delimiter to ; and adding the new column at the end:

$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8= q n/(1024*1024)q}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Device Model";"Product Description";"Data_Volume_MB"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"Samsung SM-G900I";"$39 Plan";"0.131383"
"2015-10-06";"592";"620";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY  STD - TRIAL - #16";"0"
"2015-10-06";"007";"290";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY PLUS - $0 -";"0"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan";"46.5744"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";"$29.95 Carryover Plan (1GB)";"0.139818"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"Samsung SM-G360G";"$29 CARRYOVER PLAN";"108.486"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"Apple iPhone S (A1530)";"PREPAY STD - $0 - #2";"18.9218"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"HUAWEI HUAWEI G526-L11";"PREPAY STD - $1 - #4";"0"

Changing the delimiter to | and adding the new column at the end:

$ sed 's/","/"|"/g' < test2 | awk -F'|' -v OFS='|' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8= q n/(1024*1024)q}1'
"Rec Open Date"|"MSISDN"|"IMEI"|"Data Volume (Bytes)"|"Device Manufacturer"|"Device Model"|"Product Description"|"Data_Volume_MB"
"2015-10-06"|"427"|"060"|"137765"|"Samsung Korea"|"Samsung SM-G900I"|"$39 Plan"|"0.131383"
"2015-10-06"|"592"|"620"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY  STD - TRIAL - #16"|"0"
"2015-10-06"|"007"|"290"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY PLUS - $0 -"|"0"
"2015-10-06"|"592"|"050"|"48836832"|"Apple Inc"|"Apple iPhone 5S (A1530)"|"Talk and Text Connect Flexi Plan"|"46.5744"
"2016-04-27"|"498"|"220"|"146610"|"Guangdong Oppo Mobile Telecommunications Corp Ltd"|"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006"|"$29.95 Carryover Plan (1GB)"|"0.139818"
"2015-10-06"|"409"|"720"|"113755347"|"Samsung Korea"|"Samsung SM-G360G"|"$29 CARRYOVER PLAN"|"108.486"
"2015-10-06"|"742"|"620"|"19840943"|"Apple Inc"|"Apple iPhone S (A1530)"|"PREPAY STD - $0 - #2"|"18.9218"
"2015-10-06"|"387"|"180"|"0"|"HUAWEI Technologies Co Ltd"|"HUAWEI HUAWEI G526-L11"|"PREPAY STD - $1 - #4"|"0"

Changing the delimiter to ; and inserting the new column before the second-to-last column:

$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$(NF-1)=q"Data_Volume_MB"q FS $(NF-1)} NR>1{n=$4; gsub(/"/,"",n); $(NF-1)= q n/(1024*1024)q FS $(NF-1)}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Data_Volume_MB";"Device Model";"Product Description"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"0.131383";"Samsung SM-G900I";"$39 Plan"
"2015-10-06";"592";"620";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY  STD - TRIAL - #16"
"2015-10-06";"007";"290";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY PLUS - $0 -"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"46.5744";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"0.139818";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";"$29.95 Carryover Plan (1GB)"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"108.486";"Samsung SM-G360G";"$29 CARRYOVER PLAN"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"18.9218";"Apple iPhone S (A1530)";"PREPAY STD - $0 - #2"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"0";"HUAWEI HUAWEI G526-L11";"PREPAY STD - $1 - #4"

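The last command above inserts a column by prepending the new value plus a separator to $(NF-1). A minimal sketch of that trick on a made-up three-column record follows; the record and the placeholder value "NEW" are hypothetical and only stand in for the real data and the computed megabytes.

```shell
# Hypothetical three-column record; "NEW" stands in for the computed value.
line='"a";"b";"c"'

# Assigning `new FS old` to $(NF-1) makes awk rebuild $0 with OFS, so the new
# value lands just before the old second-to-last column.
result=$(printf '%s\n' "$line" | awk -F';' -v OFS=';' '{ $(NF-1) = "\"NEW\"" FS $(NF-1) } 1')
echo "$result"    # prints "a";"NEW";"b";"c"
```

This works as written only because FS and OFS are the same single character. Also note that awk does not re-split the record after the assignment, so NF still reports the old column count within that rule.
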
1 Answer:

Answer 0 (score: 1)

I suggest you first change the field separator, like this (here I change it from , to |):

sed 's/","/"|"/g' < test2 > newfile

Then run your awk code on newfile.

You can put it all on one line (here I am not using your awk code, just a simple awk command of my own):

sed 's/","/"|"/g' < test2 | awk 'BEGIN{FS="|"} {print  $1}'

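The same idea can probably be expressed without the sed step by giving awk the three-character sequence "," as the field separator. This is only a sketch, not part of the original commands: it assumes every field is quoted and that no field contains the literal sequence "," itself. The variable names rows, out, and mb are chosen here for illustration.

```shell
# Header plus two rows copied from test2.
rows=$(cat <<'EOF'
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16"
EOF
)

# FS is the literal string ","  so commas inside a quoted value do not split it.
# $0 is never modified; the new column is simply appended to the original line.
out=$(printf '%s\n' "$rows" | awk -F'","' -v q='"' '
  NR == 1 { print $0 "," q "Data_Volume_MB" q; next }   # header row
  { mb = $4 / (1024 * 1024); print $0 "," q mb q }      # $4 carries no quotes here
')
echo "$out"
```

Because the record is printed unchanged with the new value appended, the output keeps the original comma delimiter and no OFS setting is needed.
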
In response to the OP's comment, be sure to run your command like this (note that I have changed -F, to -F"|"):

    sed 's/","/"|"/g' < test2 | awk -F"|" -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8= q $8/(1024*1024)q}1'

With your data, this is my result:

"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - $0 -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)","0.139818"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G","$29 CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - $0 - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - $1 - #4","0"