Remove lines between delimiters, except lines matching a regular expression

Time: 2013-12-04 18:47:05

Tags: regex perl

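I want to delete all text between two strings, except for lines that begin with certain strings. In the following example, I want to delete the text between BEGIN and END, except for lines that begin with BREAK1 or BREAK2: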

keep keep keep
BEGIN
remove remove remove
remove remove remove
BREAK1 keep keep keep
remove remove remove
BREAK2 keep keep keep
remove remove remove
END
keep keep keep

Does anyone know how to do this with a regular expression?

5 answers:

Answer 0: (score: 8)

perl -ne 'print if !(/^BEGIN/ .. /^END/) or /^BREAK/' file

Output

keep keep keep
BREAK1 keep keep keep
BREAK2 keep keep keep
keep keep keep

In scalar context, .. is Perl's flip-flop operator: /^BEGIN/ .. /^END/ evaluates to true for every line from BEGIN through END, inclusive.
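
For readers who prefer a script to a one-liner, here is a minimal sketch of the same flip-flop logic. The helper name filter_flip_flop and the embedded sample array are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines the one-liner above would print.
sub filter_flip_flop {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # Assigning to a scalar forces scalar context, so .. acts as the
        # flip-flop: true from a /^BEGIN/ line through the next /^END/ line.
        my $in_block = /^BEGIN/ .. /^END/;
        push @kept, $_ if !$in_block or /^BREAK/;
    }
    return @kept;
}

# Sample input taken from the question.
my @sample = map { "$_\n" } (
    'keep keep keep',
    'BEGIN',
    'remove remove remove',
    'remove remove remove',
    'BREAK1 keep keep keep',
    'remove remove remove',
    'BREAK2 keep keep keep',
    'remove remove remove',
    'END',
    'keep keep keep',
);

print filter_flip_flop(@sample);
```

One caveat: the flip-flop keeps its state between calls to the sub, so input that has a BEGIN without a matching END would leave it switched on for the next call.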

Answer 1: (score: 1)

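Well, you could read or split the input into @lines, then loop over each line while tracking your state (inside or outside a BEGIN..END block). If you are outside, keep the line and pass it through. If you are inside, discard the line when $line =~ m/^BREAK\d+\s*(.*)$/ returns false; otherwise $1 contains the text to keep. I'll leave it as an exercise for the student to determine whether you are inside a BEGIN block.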
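
The loop described above might be sketched as follows. This is only one possible reading: the helper name filter_with_state is hypothetical, and dropping the BEGIN and END lines themselves is an assumption based on the output the question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the state-tracking loop.
sub filter_with_state {
    chomp(my @lines = @_);
    my $inside = 0;    # true while between BEGIN and END
    my @kept;
    for my $line (@lines) {
        if (!$inside) {
            if ($line =~ /^BEGIN/) {
                $inside = 1;    # assumption: the BEGIN line itself is dropped
                next;
            }
            push @kept, $line;
        }
        elsif ($line =~ /^END/) {
            $inside = 0;        # assumption: the END line itself is dropped
        }
        elsif ($line =~ m/^BREAK\d+\s*(.*)$/) {
            push @kept, $1;     # only the text after the BREAKn marker
        }
    }
    return @kept;
}

# Sample input taken from the question.
my @sample = (
    'keep keep keep',
    'BEGIN',
    'remove remove remove',
    'remove remove remove',
    'BREAK1 keep keep keep',
    'remove remove remove',
    'BREAK2 keep keep keep',
    'remove remove remove',
    'END',
    'keep keep keep',
);

print "$_\n" for filter_with_state(@sample);
```

Because it pushes $1, this variant strips the BREAKn prefix from the kept lines; pushing $line instead would keep them whole, as in the output of answer 0.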

Answer 2: (score: 1)

You can use this pattern:

s/(?:^BEGIN\R|\G(?<!\A)(?:(?:BREAK1|BREAK2).*\R|END(?=\R|$)))\K|\G(?<!\A).*\R//gm

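The idea is to first match everything that must be preserved, then drop it from the match result with \K. The \G anchor ensures that the different parts of the match are contiguous. However, the current pattern does not check that the "END" marker is present. If it is missing, the replacement continues to the end of the string (the same behavior as with HTML tags). To avoid this, you can add a lookahead at the end: (?=(?s).*?\REND(?:\R|$))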

Pattern details:

(?:                       # non capturing group for all that must be preserved
    ^BEGIN\R              # the word "BEGIN" at the start of a line, followed
                          # by a newline
  |                       # OR
    \G                    # contiguous to the previous match or at the start of
                          # the string
    (?<!\A)               # lookbehind: not preceded by the start of the string
    (?:                   # non capturing group: all that must be contiguous
        (?:BREAK1|BREAK2) # one of these two words
        .*\R              # all until the newline (included)
      |                   # OR
        END               # 
        (?=\R|$)          # lookahead to check if END is followed by a newline
                          # or the end of the string. Since it is a zero-width 
                          # assertion and doesn't match anything, it is used to
                          # contiguous matches.
    )                     # close the 2nd non capturing group
)                         # close the 1st non capturing group
\K                        # drop everything matched so far from the match result
|                         # OR
\G(?<!\A).*\R             # all that is contiguous to the previous match until
                          # the newline (included)

Answer 3: (score: 0)

OK, this is a Perl question, but I couldn't resist posting a sed(1) solution:

sed '/^BEGIN/,/^END/ { /^BREAK[12]/!d }'

Answer 4: (score: -1)

On a Linux machine, you can run the egrep command:

egrep -v ^BREAK