Extract multiple lines only when all patterns match in the same order

Time: 2019-06-08 16:11:02

Tags: linux grep pcregrep

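I am running into a difficulty similar to the one described here.

My Linux log file (sample log file) contains the entries below. I want to grep the lines "Total Action Failed :" and "Total Action Processed:" only when those two lines follow a line containing the string "> Processing file: R".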

INF----BusinessLog:08/06/19 20:44:33 > Processing file:  R1111111.R222222222.TEST0107, and creates the reports.
Line2
Line3
Line4
INF----BusinessLog:08/06/19 20:44:33 > Data
    =========
    Overview:
        Total Action          : 100
        Total Action Failed   : 0
        Total Action Processed: 100

INF----BusinessLog:08/06/19 20:44:35 > Processing file:  R333333333.R222222222.TEST0107, and creates the reports.
Line2
Line3
Line4
INF----BusinessLog:08/06/19 20:44:35 > Data
    =========
    Overview:
        Total Action          : 50
        Total Action Failed   : 0
        Total Action Processed: 50

I tried the pcregrep solution given in the linked question:

/opt/pdag/bin/pcregrep -M '> Processing file:  R.*(\n|.)*Total Action Failed   :.*(\n|.)*Total Action Processed:' "$log_path/LogFile.log"

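I am facing the following two problems:

(1) The command above also returns every line that lies between the pattern lines, which I do not want.

(2) If the log file contains a (> Processing file: Z) entry in place of a (> Processing file: R) entry, as in the sample below, the pcregrep command above does not give accurate results.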

INF----BusinessLog:08/06/19 20:44:33 > Processing file:  R1111111.R222222222.TEST0107, and creates the reports.
Line2
Line3
Line4
INF----BusinessLog:08/06/19 20:44:33 > Data
    =========
    Overview:
        Total Action          : 100
        Total Action Failed   : 0
        Total Action Processed: 100

INF----BusinessLog:08/06/19 20:44:35 > Processing file:  Z333333333.R222222222.TEST0107, and creates the reports.
Line2
Line3
Line4
INF----BusinessLog:08/06/19 20:44:35 > Data
    =========
    Overview:
        Total Action          : 50
        Total Action Failed   : 0
        Total Action Processed: 50
INF----BusinessLog:08/06/19 20:44:45 > Processing file:  R555555555.R222222222.TEST0107, and creates the reports.
Line2
Line3
Line4
INF----BusinessLog:08/06/19 20:44:54 > Data
    =========
    Overview:
        Total Action          : 300
        Total Action Failed   : 45
        Total Action Processed: 300

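Can someone help me find a solution to this problem?

I need only the following three lines per block, and only when all the patterns match in the same order. Also, the number of lines between the first pattern "> Processing file: R" and the second pattern "Total Action Failed :" varies; it is not always 3 lines.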

INF----BusinessLog:08/06/19 20:44:33 > Processing file:  R1111111.R222222222.TEST0107, and creates the reports.
        Total Action Failed   : 0
        Total Action Processed: 50
INF----BusinessLog:08/06/19 20:44:45 > Processing file:  R555555555.R222222222.TEST0107
            Total Action Failed   : 45
            Total Action Processed: 300

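1 answer:

Answer 0: (score: 1)

I think you started out trying to build a regular expression that meets your requirements, when all you really want to do is print the first line and the last two lines of each block that starts with a line containing "> Processing file: R". Given that, with any awk in any shell on every UNIX box: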

$ awk -v OFS='\n' '
    /> Processing file:[[:space:]]*R/ { if (h) print h, y, z; h=$0 }
    NF { y=z; z=$0 }
    END { print h, y, z }
' file
INF----BusinessLog:08/06/19 20:44:33 > Processing file:  R1111111.R222222222.TEST0107, and creates the reports.
        Total Action Failed   : 0
        Total Action Processed: 50
INF----BusinessLog:08/06/19 20:44:45 > Processing file:  R555555555.R222222222.TEST0107, and creates the reports.
        Total Action Failed   : 45
        Total Action Processed: 300

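If that is not what you want, please update your question to clarify your requirements and provide an example for which the script above does not work, so that we can post a general, portable awk solution.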
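One thing worth noting about the script above: it pairs each R header with the last two non-blank lines seen before the next R header. When a Z block sits in between, the totals printed under the first R header are therefore the Z block's (Processed: 50), which is also what the expected output in the question shows. If the totals are instead meant to come from the R block itself, a small state machine that matches the three patterns by label, in order, might be closer. The following is only a sketch under that assumption; the function name extract_r_totals and the variables sample and result are hypothetical, and the sample data is abbreviated from the log in the question.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print the header,
# "Failed" and "Processed" lines only for blocks whose file name starts with R.
extract_r_totals() {
    awk '
        # Any block header: remember it only if the file name starts with R.
        /> Processing file:/ {
            h = ($0 ~ /> Processing file:[[:space:]]*R/) ? $0 : ""
            f = ""
            next
        }
        # Keep the Failed line only while inside an R block.
        h != "" && /Total Action Failed[[:space:]]*:/ { f = $0; next }
        # All three patterns seen in order: print them, then reset.
        h != "" && f != "" && /Total Action Processed:/ {
            print h; print f; print $0
            h = f = ""
        }
    ' "$@"
}

# Abbreviated sample shaped like the log in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
INF----BusinessLog:08/06/19 20:44:33 > Processing file:  R1111111.R222222222.TEST0107, and creates the reports.
Line2
INF----BusinessLog:08/06/19 20:44:33 > Data
        Total Action          : 100
        Total Action Failed   : 0
        Total Action Processed: 100

INF----BusinessLog:08/06/19 20:44:35 > Processing file:  Z333333333.R222222222.TEST0107, and creates the reports.
Line2
        Total Action          : 50
        Total Action Failed   : 0
        Total Action Processed: 50
INF----BusinessLog:08/06/19 20:44:45 > Processing file:  R555555555.R222222222.TEST0107, and creates the reports.
Line2
Line3
        Total Action          : 300
        Total Action Failed   : 45
        Total Action Processed: 300
EOF

result=$(extract_r_totals "$sample")
rm -f "$sample"
printf '%s\n' "$result"
```

With this variant the Z block is skipped entirely, so the first R header is followed by its own totals (Processed: 100) rather than the Z block's. A block that lacks either total line prints nothing, because the header is only emitted once all three patterns have been seen.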