PHP regex: getting a string inside parentheses

Asked: 2014-06-14 10:30:22

Tags: php regex

I have some parser output in which I want to search for a substring. The output:

(ROOT  
(S
(NP (DT the) (NN author))
(VP (VBZ has)
  (VP (VBN failed)
    (S
      (VP (TO to)
        (VP
          (VP (VB catch)
            (PRT (RP up))
            (PP (IN with)
              (NP
                (NP (JJ recent) (NNS discoveries))
                (PP (IN about)
                  (NP (NNP Thespis))))))
          (PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))
          (NP (`` `) (JJ lost) ('' ') (JJ joint) (NN work)))))))
(. .)))

I want to match this part:

(PRN (, ,)
        (S
          (NP (NNP Gilbert)
            (CC and)
            (NNP Sullivan))
          (VP (VBZ 's)
            (ADJP (JJ first))))
        (, ,))

In this part, the PRN tag is fixed. How should I write the regular expression?

2 answers:

Answer 0 (score: 2)

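Assuming you want to match a parenthesized group that begins with PRN and may contain nested parenthesized groups (and that this whole block may itself be nested inside an enclosing parenthesized group), the following tested recursive regex solution should do the trick: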

Recursive PCRE regex solution:

<?php // test.php 20140614_0800
// The regex:
$re = '/
    # Match nested parenthesized group beginning with PRN.
    \(PRN          # Literal opening sequence.
    (              # $1: Recursive subroutine!
      (?:          # Zero or more contents alternatives.
        [^()]++    # Either one or more non-parentheses,
      | \((?1)\)   # Or a nested parenthesized group.
      )*           # End zero or more contents alternatives.
    )              # End $1: Recursive subroutine!
    \)             # Literal closing sequence.
    /x';

// The string:
$s = '(ROOT  
(S
(NP (DT the) (NN author))
(VP (VBZ has)
  (VP (VBN failed)
    (S
      (VP (TO to)
        (VP
          (VP (VB catch)
            (PRT (RP up))
            (PP (IN with)
              (NP
                (NP (JJ recent) (NNS discoveries))
                (PP (IN about)
                  (NP (NNP Thespis))))))
          (PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ \'s)
                (ADJP (JJ first))))
            (, ,))
          (NP (`` `) (JJ lost) (\'\' \') (JJ joint) (NN work)))))))
(. .)))';

// The code:
if (preg_match($re, $s, $matches)) {
    printf("Match found:\n%s", $matches[0]);
} else {
    echo('No match');
}
?>

Here is the output:

Match found:
(PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))

Note that this solution requires every group to have properly balanced, matching opening and closing parentheses.
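To illustrate the balance requirement and show how the same recursive pattern might be reused for other tags, here is a minimal sketch. The helper name `find_tagged_groups`, its parameters, and the sample strings are hypothetical and not part of the original answer. It assumes the tag is always followed by whitespace, and it swaps in `preg_match_all` so that every non-overlapping group is returned rather than only the first.

```php
<?php
// Hypothetical helper (not from the original answer): returns every balanced
// parenthesized group whose first token is $tag.
function find_tagged_groups(string $tree, string $tag): array
{
    $re = '/
        \(' . preg_quote($tag, '/') . '(?=\s)  # Opening parenthesis, tag, then whitespace.
        (                  # $1: recursive subroutine.
          (?:
            [^()]++        # Either one or more non-parentheses,
          | \((?1)\)       # or a nested parenthesized group.
          )*
        )
        \)                 # Matching closing parenthesis.
        /x';
    preg_match_all($re, $tree, $matches);
    return $matches[0];    // Empty array when nothing matches.
}

// Assumed sample input with two separate PRN groups.
$balanced = '(S (PRN (, ,) (NP (NNP Gilbert))) (VP (PRN (X y))))';
print_r(find_tagged_groups($balanced, 'PRN'));

// A group that is never closed produces no match at all.
$unbalanced = '(PRN (NP (NN x)';
print_r(find_tagged_groups($unbalanced, 'PRN'));
```

Because matches do not overlap, a PRN group nested inside another PRN group would be reported only as part of the outer one.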

Answer 1 (score: 1)

Are you just asking for an expression that matches the output above? If so, this works:

$output = "(ROOT
(S
(NP (DT the) (NN author))
(VP (VBZ has)
  (VP (VBN failed)
    (S
      (VP (TO to)
        (VP
          (VP (VB catch)
            (PRT (RP up))
            (PP (IN with)
              (NP
                (NP (JJ recent) (NNS discoveries))
                (PP (IN about)
                  (NP (NNP Thespis))))))
          (PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))
          (NP (`` `) (JJ lost) ('' ') (JJ joint) (NN work)))))))
(. .)))";

preg_match('/\(PRN.*,\)\)/s', $output, $match);
print_r($match[0]);

Output:

php reg.php
(PRN (, ,)
            (S
              (NP (NNP Gilbert)
                (CC and)
                (NNP Sullivan))
              (VP (VBZ 's)
                (ADJP (JJ first))))
            (, ,))
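
The pattern in this answer depends on the shape of this particular output: the greedy `.*` runs to the last `,))` in the input. The sketch below uses a made-up string (an assumption, not taken from the question) with two sibling PRN groups to show how the match would then span both of them, and compares it with a lazy `.*?` variant. The lazy variant is also only an illustration; it still assumes the group ends with `(, ,))`.

```php
<?php
// Hypothetical input (not from the question): two sibling PRN groups.
$tree = '(S (PRN (, ,) (NP (NN a)) (, ,)) (VP (VB b)) (PRN (, ,) (NP (NN c)) (, ,)))';

// Pattern from this answer: greedy .* extends to the LAST ",))" in the input.
preg_match('/\(PRN.*,\)\)/s', $tree, $greedy);

// Lazy variant: stops at the FIRST ",))" after "(PRN".
preg_match('/\(PRN.*?,\)\)/s', $tree, $lazy);

echo $greedy[0], "\n"; // Spans from the first PRN group through the second.
echo $lazy[0], "\n";   // Only the first PRN group.
```

Neither variant counts parentheses, so for arbitrary nesting a recursive pattern or a parser is the more robust choice.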