Parsing fields that contain commas with AWK

Posted: 2015-06-16 15:54:36

Tags: regex bash shell awk

Edited - TL;DR: using awk to parse fields that contain commas.

#original config file - confile1
$ cat confile1
list=(
app1,"HOSTNAME - port - application name - alert1",99.0,99.0
app2,"HOSTNAME - port - application name - alert1",99.0,99.0
app3,"HOSTNAME - port - service name - alert2",99.0,99.0
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
)
#original script - test1
$ cat test1
#!/bin/bash

IFS="$(printf '\n\t')"

function parse
{
for item in ${list[*]}
do
  group=$(echo $item | awk -F, '{print $1}')
  monitor=$(echo $item | awk -F, '{print $2}')
  grp_sla=$(echo $item | awk -F, '{print $3}')
  mon_sla=$(echo $item | awk -F, '{print $4}')
  echo $group
  echo $monitor
  echo $grp_sla
  echo $mon_sla
done
}

. confile1
parse

Note that the last line of confile1 gets mangled, because its second field contains a comma:

$ ./test1
app1
HOSTNAME - port - application name - alert1
99.0
99.0
app2
HOSTNAME - port - application name - alert1
99.0
99.0
app3
HOSTNAME - port - service name - alert2
99.0
99.0
web1
URL - HOSTNAMES(01
02) - http://someurl.com/ - alert1
99.0
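A detail visible in the output above: the double quotes around field 2 never reach awk. Sourcing confile1 with `.` makes bash treat those quotes as ordinary shell quoting inside `list=( ... )`, so bash strips them and each array element ends up as plain comma-separated text. The sketch below illustrates this; it is not part of the original question and assumes a scratch file from `mktemp` holding only the `web1` entry.

```shell
#!/bin/bash
# Write a one-entry config in the same shape as confile1 to a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
list=(
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
)
EOF

# Sourcing it lets bash perform quote removal on the array element.
. "$conf"
rm -f "$conf"

# The element no longer contains any double quotes, so a quote-aware
# splitter has nothing left to tell the embedded comma apart from the others.
printf '%s\n' "${list[0]}"
```

Because the quotes are already gone, no quote-aware splitting applied to `${list[*]}` can recover the original field boundaries. Reading the file with awk directly, as the answers below do, keeps the quotes intact.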

3 Answers:

Answer 0 (score: 2)

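I'm reluctant to get into your whole question (sorry, but IMHO it's too long, with too much irrelevant information), but it looks like you're trying to extract the individual fields from the "confile1" at the top of your question, so this may be all the hint you need: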

$ cat tst.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
NF>1 {
    print "\nRecord", ++nr":", $0
    for (i=1; i<=NF; i++) {
        print "   Field", i":", $i
    }
}

$ awk -f tst.awk confile1

Record 1: app1,"HOSTNAME - port - application name - alert1",99.0,99.0
   Field 1: app1
   Field 2: "HOSTNAME - port - application name - alert1"
   Field 3: 99.0
   Field 4: 99.0

Record 2: app2,"HOSTNAME - port - application name - alert1",99.0,99.0
   Field 1: app2
   Field 2: "HOSTNAME - port - application name - alert1"
   Field 3: 99.0
   Field 4: 99.0

Record 3: app3,"HOSTNAME - port - service name - alert2",99.0,99.0
   Field 1: app3
   Field 2: "HOSTNAME - port - service name - alert2"
   Field 3: 99.0
   Field 4: 99.0

Record 4: web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0
   Field 1: web1
   Field 2: "URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1"
   Field 3: 99.0
   Field 4: 99.0

The above uses GNU awk for FPAT (see http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content).
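`FPAT` is a gawk extension. Other awks (mawk, BusyBox awk, BSD awk) treat it as an ordinary variable and fall back to default whitespace splitting. As a hedged alternative that is not from the original answer, the sketch below scans each line character by character and ends a field only at a comma that is outside double quotes. The function name `split_quoted` is hypothetical, and it assumes double quotes are balanced and that no field spans more than one line.

```shell
#!/bin/sh
# Hypothetical helper: print each comma-separated field of the data lines in
# a file, treating commas inside double quotes as literal text.
# Works with any POSIX awk (no gawk FPAT needed).
split_quoted() {
    awk '
    # Skip the list=( and ) wrapper lines, which contain no commas
    # (tst.awk uses NF>1 for the same purpose).
    !/,/ { next }
    {
        nf = 0; field = ""; inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"") inq = !inq        # toggle the inside-quotes state
            if (c == "," && !inq) {          # an unquoted comma ends the field
                fld[++nf] = field
                field = ""
            } else {
                field = field c              # quotes are kept, as with FPAT
            }
        }
        fld[++nf] = field                    # store the final field
        printf "Record %d: %s\n", ++nr, $0
        for (i = 1; i <= nf; i++) printf "   Field %d: %s\n", i, fld[i]
    }' "$1"
}
```

Running `split_quoted confile1` should print a layout similar to tst.awk's, just without the blank line before each record.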

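Especially since you're teaching yourself, I strongly recommend getting Effective Awk Programming, 4th Edition, by Arnold Robbins, and Shell Scripting Recipes by Chris Johnson, because it's very easy to go down the wrong path in UNIX given all the possible ways of solving any one problem.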

Answer 1 (score: 0)

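Another solution for your particular setup is to use NF to decide how to reassemble the fields: a line whose second field contains a comma splits into 5 pieces instead of 4, so $2 and $3 are glued back together with a comma. Here I set OFS to make the field boundaries more obvious.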

$ awk -F, 'BEGIN{OFS=" <-> "} NF==4{print $1, $2, $3, $4 } NF==5{print $1, $2","$3, $4, $5}' data.csv

app1 <-> "HOSTNAME - port - application name - alert1" <-> 99.0 <-> 99.0
app2 <-> "HOSTNAME - port - application name - alert1" <-> 99.0 <-> 99.0
app3 <-> "HOSTNAME - port - service name - alert2" <-> 99.0 <-> 99.0
web1 <-> "URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1" <-> 99.0 <-> 99.0
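The `NF==4`/`NF==5` rules only cover zero or one comma inside field 2. Assuming, as in this config, that only the second field can contain commas, a hedged generalization (a variation on the answer above, not part of it) takes `$1`, `$(NF-1)` and `$NF` directly and joins everything in between back together with commas. The function name `rejoin_fields` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper generalizing the NF==4 / NF==5 rules above.
# Assumes only field 2 may contain commas: fields 1, 3 and 4 never do.
rejoin_fields() {
    awk -F, 'BEGIN { OFS = " <-> " }
    NF >= 4 {
        # $1, $(NF-1) and $NF are the group, group SLA and monitor SLA;
        # everything in between belongs to field 2, so glue it back with commas.
        monitor = $2
        for (i = 3; i <= NF - 2; i++) monitor = monitor "," $i
        print $1, monitor, $(NF-1), $NF
    }' "$1"
}
```

`rejoin_fields data.csv` should produce the same four lines as above, and it also copes with a field such as `HOSTNAMES(01,02,03)` that holds more than one comma.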

Answer 2 (score: 0)

Ed Morton provided exactly the information I needed. I've tested it in my main script and it parses everything correctly! Here's the working test code:

$ awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $0}' confile4
app1,"HOSTNAME - port - application name - alert1",99.0,99.0
app2,"HOSTNAME - port - application name - alert1",99.0,99.0
app3,"HOSTNAME - port - service name - alert2",99.0,99.0
web1,"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1",99.0,99.0


$ cat test10
#!/bin/bash

IFS="$(printf '\n\t')"

function parse
{
for item in $(awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $0}' confile4)
do
  group=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $1}')
  monitor=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $2}')
  grp_sla=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $3}')
  mon_sla=$(echo $item | awk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" } {print $4}')
  echo $group
  echo $monitor
  echo $grp_sla
  echo $mon_sla
done
}

parse

$ ./test10
app1
"HOSTNAME - port - application name - alert1"
99.0
99.0
app2
"HOSTNAME - port - application name - alert1"
99.0
99.0
app3
"HOSTNAME - port - service name - alert2"
99.0
99.0
web1
"URL - HOSTNAMES(01,02) - http://someurl.com/ - alert1"
99.0
99.0
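test10 runs awk once to list the lines and then four more times for every line. As a hedged alternative that is not from the original answers, bash's own `=~` operator can capture all four fields in a single match without starting awk at all. The function name `parse_lines` is hypothetical, and the regex assumes every data line has exactly four fields, with only field 2 wrapped in double quotes.

```shell
#!/bin/bash
# Hypothetical helper: parse confile4-style lines without spawning awk.
# Assumes every data line has exactly four fields and that field 2 is the
# only one wrapped in double quotes (it may contain commas).
parse_lines() {
    local re='^([^,]*),("[^"]*"),([^,]*),([^,]*)$'
    local line group monitor grp_sla mon_sla
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            group=${BASH_REMATCH[1]}
            monitor=${BASH_REMATCH[2]}
            grp_sla=${BASH_REMATCH[3]}
            mon_sla=${BASH_REMATCH[4]}
            printf '%s\n' "$group" "$monitor" "$grp_sla" "$mon_sla"
        fi    # lines that do not match (e.g. "list=(" or ")") are skipped
    done < "$1"
}
```

Calling `parse_lines confile4` should print the same 16 lines as `./test10`. Keeping the regex in a variable, rather than writing it inline after `=~`, avoids having to escape the quotes and parentheses.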