Regex pattern to remove parentheses (and any parentheses inside them)

Date: 2017-10-18 15:53:16

Tags: php regex preg-replace

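The input is the first paragraph of a Wikipedia page. I want to remove the parentheses and anything between them.

However, sometimes (often, in fact) the HTML inside the parentheses itself contains one or more parentheses, usually in the href="" of a link.

Take the following: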

<p>
    The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> (from Greek σαρξ <i>sarx</i>, flesh, and πτερυξ <i>pteryx</i>, fin) – sometimes considered synonymous with <b>Crossopterygii</b> ("fringe-finned fish", from Greek κροσσός <i>krossos</i>, fringe) – constitute a <a href="/wiki/Clade" title="Clade">clade</a> (traditionally a <a href="/wiki/Class_(biology)" title="Class (biology)">class</a> or subclass) of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>

I want the end result to be:

<p>
    The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> – sometimes considered synonymous with <b>Crossopterygii</b> – constitute a <a href="/wiki/Clade" title="Clade">clade</a> of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>

But when I use the preg_replace pattern below, it doesn't work: it gets confused by the parentheses inside the parentheses.

public function removeParentheses( $content ) {

    $pattern = '@\(.*?\)@';
    $content = preg_replace( $pattern, '', $content );
    $content = str_replace( ' .', '.', $content );
    $content = str_replace( '  ', ' ', $content );
    return $content;
}

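Secondly, how can I leave the parentheses inside a link's href="" and title="" attributes alone? These matter when the link is not itself inside text parentheses.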
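To make the failure concrete, here is a minimal sketch of what the lazy pattern does with nested parentheses. The sample string is invented for illustration and is not taken from the Wikipedia page:

```php
<?php
// Illustrative input only: one nested pair of parentheses.
$sample = 'a (x (y) z) b';

// Same lazy pattern as in removeParentheses() above.
$result = preg_replace('@\(.*?\)@', '', $sample);

// The match ends at the first ")", so "(x (y)" is removed
// and the tail " z)" is left behind.
echo $result, PHP_EOL;
```

The same thing happens with href="/wiki/Class_(biology)": the match ends at the ")" inside the attribute, which cuts the tag in half.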

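1 Answer:

Answer 0: (score: 2)

You can replace all links with placeholders, then remove all the parentheses, and finally swap the placeholders back for their original values.

This is done with preg_replace_callback(), passing an occurrence counter and a replacements array to keep track of the links, then using your own removeParentheses() to strip the parentheses, and finally using str_replace() with array_keys() and array_values() to get your links back: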

<?php
$string = '<p>
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> (from Greek σαρξ <i>sarx</i>, flesh, and πτερυξ <i>pteryx</i>, fin) – sometimes considered synonymous with <b>Crossopterygii</b> ("fringe-finned fish", from Greek κροσσός <i>krossos</i>, fringe) – constitute a <a href="/wiki/Clade" title="Clade">clade</a> (traditionally a <a href="/wiki/Class_(biology)" title="Class (biology)">class</a> or subclass) of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>';
$occurrences = 0;
$replacements = [];
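$replacedString = preg_replace_callback("/<a .*?>.*?<\/a>/i", function($el) use (&$occurrences, &$replacements) {
    // The leading ||| is just to avoid unwanted matches; the trailing ||| keeps
    // "|||1" from matching inside "|||10" when the links are restored.
    $placeholder = "|||".$occurrences++."|||";
    $replacements[$placeholder] = $el[0];
    return $placeholder;
}, $string);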
function removeParentheses( $content ) {
    $pattern = '@\(.*?\)@';
    $content = preg_replace( $pattern, '', $content );
    $content = str_replace( ' .', '.', $content );
    $content = str_replace( '  ', ' ', $content );
    return $content;
}
$replacedString = removeParentheses($replacedString);
$replacedString = str_replace(array_keys($replacements), array_values($replacements), $replacedString); // get your links back
echo $replacedString;


Result:

<p>
The <b>Sarcopterygii</b> or <b>lobe-finned fish</b> – sometimes considered synonymous with <b>Crossopterygii</b> – constitute a <a href="/wiki/Clade" title="Clade">clade</a> of the <a href="/wiki/Osteichthyes" title="Osteichthyes">bony fish</a>, though a strict <a href="/wiki/Cladistic" class="mw-redirect" title="Cladistic">cladistic</a> view includes the terrestrial <a href="/wiki/Vertebrate" title="Vertebrate">vertebrates</a>.
</p>

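However, in my opinion, this is a bit fragile. As others have told you in the comments, you shouldn't parse HTML with regular expressions. A lot can change, and you can get unexpected results. Still, this might get you going in the right direction.

Edit: regarding the parentheses inside parentheses, you can use a recursive pattern: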

function removeParentheses( $content ) {
    $pattern = '@\(([^()]|(?R))*\)@';
    $content = preg_replace( $pattern, '', $content );
    $content = str_replace( ' .', '.', $content );
    $content = str_replace( '  ', ' ', $content );
    return $content;
}
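
The placeholder step and the recursive pattern can also be combined. On its own, the recursive pattern would still strip a balanced "(biology)" from the href of a link that sits outside any text parentheses. The following is a minimal sketch, assuming links are never nested and that strings like |||0||| do not occur in the input. removeParenthesesKeepLinks() is a hypothetical name, not part of the answer above:

```php
<?php
// Sketch only: placeholder step + recursive pattern in one function.
// removeParenthesesKeepLinks() is a hypothetical helper name.
function removeParenthesesKeepLinks(string $html): string
{
    $links = [];

    // 1. Swap every link for a placeholder so its href/title are protected.
    $html = preg_replace_callback('/<a .*?>.*?<\/a>/is', function ($m) use (&$links) {
        $key = '|||' . count($links) . '|||';
        $links[$key] = $m[0];
        return $key;
    }, $html);

    // 2. Remove balanced (possibly nested) parentheses.
    $html = preg_replace('@\(([^()]|(?R))*\)@', '', $html);

    // 3. Put back the links that were not inside removed parentheses.
    return str_replace(array_keys($links), array_values($links), $html);
}

// Illustrative input: one link outside parentheses, one nested group.
$in  = 'a <a href="/wiki/Class_(biology)">class</a> (x (y) z) b';
$out = removeParenthesesKeepLinks($in);
echo $out, PHP_EOL;
```

The leftover double space where the group was removed can be collapsed with the same str_replace() calls used in removeParentheses().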

See this great answer by Bart Kiers for more on recursive patterns.
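
Given the fragility noted above, a regex-free alternative is a single pass over the string that tracks whether the current character is inside a tag. This is only a sketch under stated assumptions: "<" and ">" appear solely as tag delimiters (no ">" inside attribute values), and any tag opened inside parentheses is also closed inside them. stripTextParentheses() is a hypothetical helper, not something from the original answer:

```php
<?php
// Sketch only: drop parenthesised text, ignoring parentheses inside tags.
// Byte-wise scanning is safe for UTF-8 here because "(", ")", "<" and ">"
// are ASCII and never occur inside a multi-byte sequence.
function stripTextParentheses(string $html): string
{
    $out   = '';
    $depth = 0;      // nesting depth of text-level parentheses
    $inTag = false;  // true while between "<" and ">"
    $len   = strlen($html);

    for ($i = 0; $i < $len; $i++) {
        $ch = $html[$i];

        if ($inTag) {
            // Parentheses inside a tag (href, title) never change the depth.
            if ($ch === '>') {
                $inTag = false;
            }
            if ($depth === 0) {
                $out .= $ch;
            }
            continue;
        }

        if ($ch === '<') {
            $inTag = true;
            if ($depth === 0) {
                $out .= $ch;
            }
            continue;
        }

        if ($ch === '(') {
            $depth++;
            continue;
        }

        if ($ch === ')' && $depth > 0) {
            $depth--;
            continue;
        }

        // Ordinary character: keep it only outside parentheses.
        if ($depth === 0) {
            $out .= $ch;
        }
    }

    return $out;
}

// Illustrative input: a protected link, then a group containing a link and nesting.
$in  = 'a <a href="/wiki/Class_(biology)">class</a> (x <a href="/y_(z)">y</a> (w)) b';
$out = stripTextParentheses($in);
echo $out, PHP_EOL;
```

Links inside a removed group disappear along with it, while a link outside keeps its href intact. Whitespace cleanup is left to the str_replace() calls shown earlier.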