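Bash script to read a file and sum values

Date: 2016-01-27 23:25:50

Tags: bash

I have the following content in a file. I want to pick out Executor Deserialize Time from each line and add up all the values to get a final total. How can I do that?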

{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":29,"Index":29,"Attempt":0,"Launch Time":1453927221831,"Executor ID":"1","Host":"172.17.0.226","Locality":"ANY","Speculative":false,"Getting Result Time":0,"Finish Time":1453927230401,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":9,"Executor Run Time":8550,"Result Size":2258,"JVM GC Time":18,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":0,"Shuffle Write Time":4425,"Shuffle Records Written":0},"Input Metrics":{"Data Read Method":"Hadoop","Bytes Read":134283264,"Records Read":100890}}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":30,"Index":30,"Attempt":0,"Launch Time":1453927222232,"Executor ID":"1","Host":"172.17.0.226","Locality":"ANY","Speculative":false,"Getting Result Time":0,"Finish Time":1453927230493,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":7,"Executor Run Time":8244,"Result Size":2258,"JVM GC Time":16,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":0,"Shuffle Write Time":4190,"Shuffle Records Written":0},"Input Metrics":{"Data Read Method":"Hadoop","Bytes Read":134283264,"Records Read":100886}}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"Success"},"Task Info":{"Task ID":31,"Index":31,"Attempt":0,"Launch Time":1453927222796,"Executor ID":"1","Host":"172.17.0.226","Locality":"ANY","Speculative":false,"Getting Result Time":0,"Finish Time":1453927230638,"Failed":false,"Accumulables":[]},"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":5,"Executor Run Time":7826,"Result Size":2258,"JVM GC Time":18,"Result Serialization Time":0,"Memory Bytes Spilled":0,"Disk Bytes Spilled":0,"Shuffle Write Metrics":{"Shuffle Bytes Written":0,"Shuffle Write Time":3958,"Shuffle Records Written":0},"Input Metrics":{"Data Read Method":"Hadoop","Bytes Read":134283264,"Records Read":101004}}}

2 answers:

Answer 0 (score: 0)

grep -P -o "Executor Deserialize Time.:[0-9]+" file.txt |
    cut -d: -f2 | awk '{ sum+=$1} END {print sum}'

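grep extracts only the part of each line that contains the field you want.
cut splits that on the colon and keeps just the number.
awk adds up all the values.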
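The three steps above can be sketched as a small function. This is a variation, not the answer's exact command: it assumes `grep -o -E` in place of `-P` (which is specific to GNU grep), and the name `sum_deserialize_time` and the abbreviated sample lines are made up for illustration.

```shell
# Sketch only: sum_deserialize_time is a hypothetical helper name.
sum_deserialize_time() {
    # grep -o prints just the matched text, e.g. "Executor Deserialize Time":9
    # cut keeps the part after the colon (the number)
    # awk adds the numbers; sum + 0 prints 0 if nothing matched
    grep -o -E '"Executor Deserialize Time":[0-9]+' "$1" |
        cut -d: -f2 |
        awk '{ sum += $1 } END { print sum + 0 }'
}

# Abbreviated sample input: a few fields from the three lines in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":9,"Executor Run Time":8550}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":7,"Executor Run Time":8244}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":5,"Executor Run Time":7826}}
EOF

total=$(sum_deserialize_time "$sample")
echo "$total"
rm -f "$sample"
```

For the three sample lines the values are 9, 7 and 5, so the total printed is 21.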

Answer 1 (score: 0)

awk -v RS=, '/^"Executor Deserialize Time":/ {split($0,a,":"); tot+=a[2]} END{print tot}' file
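  • Set RS (the record separator) to ,
  • Match the records that start with the wanted field name.
  • Split the current record on :
  • Add the second piece of the split to our total.
  • In the END block (after all input has been read), print the total.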
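To see why the `^` anchor works, it can help to look at what awk treats as a record once RS is a comma. The sketch below feeds awk one abbreviated line (an assumption: only a few fields from the question's first line are kept) and prints the record number and text of the matching record.

```shell
# One abbreviated input line; the real lines in the question carry many more fields.
line='{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":9,"Executor Run Time":8550}}'

# With RS=, every comma-separated chunk is its own record, so the wanted
# key sits at the very start of one record and ^ can anchor on it.
matched=$(printf '%s\n' "$line" |
    awk -v RS=, '/^"Executor Deserialize Time":/ { print NR ": " $0 }')
echo "$matched"
```

One consequence of this choice of RS is that the newline between input lines is no longer a separator, so the last chunk of one line and the first chunk of the next end up in the same record. That does not matter here because the wanted key is never first or last on a line.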

Or the same idea, but setting FS (the field separator) instead:

awk -F , '{for (i=1;i<=NF;i++) {if ($i ~ /^"Executor Deserialize Time":/) {split($i,a,":"); tot+=a[2]}}} END{print tot}' file
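  • Set FS to ,
  • Loop over every field from 1 to NF.
  • Match the wanted field.
  • Split the current field on :
  • Add the second piece of the split to our total.
  • In the END block, print the total.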
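The field loop above generalizes to any numeric key if the key name is passed in with `-v`. The following is a sketch under that assumption, not part of the original answer; `sum_field` is a hypothetical name, and the sample lines are abbreviated from the question.

```shell
# Sketch only: sum_field FILE KEY adds up the values of KEY across all lines.
sum_field() {
    awk -F, -v name="$2" '
    BEGIN { key = "\"" name "\"" }   # the key as it appears in the file, with quotes
    {
        for (i = 1; i <= NF; i++) {
            split($i, a, ":")
            # a[2] may carry trailing braces (e.g. 8550}}); awk uses the leading number
            if (a[1] == key) tot += a[2]
        }
    }
    END { print tot + 0 }' "$1"
}

# Abbreviated sample input: a few fields from the three lines in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":9,"Executor Run Time":8550}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":7,"Executor Run Time":8244}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":5,"Executor Run Time":7826}}
EOF

total_deser=$(sum_field "$sample" "Executor Deserialize Time")
total_run=$(sum_field "$sample" "Executor Run Time")
echo "$total_deser $total_run"
rm -f "$sample"
```

Comparing `a[1]` to the quoted key exactly, rather than with a regular expression, avoids accidental matches on keys that merely start with the same text.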

If you only want the values for a given Stage ID, you can use this:

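awk -v stage=0 -F , '{
    # ds: set once the wanted Stage ID is seen; val: deserialize time on this line
    ds=0; val=0
    for (i=1;i<=NF;i++) {
        # Split the "name":value pair into a[1] (name) and a[2] (value)
        split($i,a,":")

        if (a[1] == "\"Executor Deserialize Time\"") {
            val=a[2]
        }

        if ((a[1] == "\"Stage ID\"") && (a[2] == stage)) {
            ds++
        }

        # Both seen: add to the total and move on to the next line
        if (ds && val) {
            tot+=val
            next
        }
    }
}
END{print tot}' file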

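This tracks whether both required values have been seen on each line, and only adds to the total when they have. The stage to match comes from the stage variable, so you can control it from outside the awk script (the -v stage=0 argument).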
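Because the stage is passed with `-v`, it can come straight from a shell variable or a function argument. The sketch below wraps a slightly restructured version of the script in a function; `sum_for_stage` is a hypothetical name, and unlike the script above it checks the two conditions after the field loop rather than inside it. The sample lines are abbreviated from the question, where every task belongs to stage 0.

```shell
# Sketch only: sum_for_stage FILE STAGE sums Executor Deserialize Time for one Stage ID.
sum_for_stage() {
    awk -F, -v stage="$2" '
    {
        val = ""; hit = 0
        for (i = 1; i <= NF; i++) {
            split($i, a, ":")
            if (a[1] == "\"Executor Deserialize Time\"") val = a[2]
            if (a[1] == "\"Stage ID\"" && a[2] == stage) hit = 1
        }
        # Add only when this line has the wanted stage and carries a value
        if (hit && val != "") tot += val
    }
    END { print tot + 0 }' "$1"
}

# Abbreviated sample input: a few fields from the three lines in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":9,"Executor Run Time":8550}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":7,"Executor Run Time":8244}}
{"Event":"SparkListenerTaskEnd","Stage ID":0,"Task Metrics":{"Host Name":"172.17.0.226","Executor Deserialize Time":5,"Executor Run Time":7826}}
EOF

stage0=$(sum_for_stage "$sample" 0)
stage1=$(sum_for_stage "$sample" 1)
echo "stage 0: $stage0, stage 1: $stage1"
rm -f "$sample"
```

With this input, stage 0 totals 21 and stage 1, which has no matching lines, totals 0.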