Question

I want to capture any HTML closing tag that is followed by a newline character and replace it with just the HTML tag.

For example, I want to turn this:

<ul>\n
    <li>element</li>\n
</ul>\n\n
<br/>\n\n
Some text\n

into this:

<ul>
    <li>element</li>
</ul>\n
<br/>\n
Some text\n

The problem is that I cannot capture the \n characters with a regular expression:

preg_match_all('/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\n/s', $in, $matches);

As soon as I put \n anywhere in my pattern, the matches array comes back empty.

Interestingly, if I try to match only the \n characters, it finds all of them:

preg_match_all('/\n/s', $in, $matches);

Answer 1

Try:

preg_match_all('/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\\n/s', $in, $matches);

You have to escape the "\" character.
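How much escaping is needed depends on the quoting. In a single-quoted PHP string, '\n' and '\\n' are the same two characters, so the extra backslash only matters once it survives into the regex itself. The sketch below assumes that $in contains the literal two-character text \n rather than real line breaks (the question does not confirm this); the sample string is made up for illustration.

```php
<?php
// In single-quoted PHP strings only \\ and \' are escape sequences,
// so both literals below hold the same two characters: backslash, n.
var_dump('\n' === '\\n'); // bool(true)

// Hypothetical input holding the literal text "\n", not real newlines.
$in = '</ul>\n\n<br/>\n\nSome text\n';

// '\\\\n' reaches PCRE as \\n: a literal backslash followed by "n".
$pattern = '/(<\/[a-zA-Z]*>|<[a-zA-Z]*\/>)\\\\n/';
$count = preg_match_all($pattern, $in, $matches);

var_dump($count);      // int(2)
print_r($matches[1]);  // the two captured tags: </ul> and <br/>
```

If $in holds real newline characters instead, the original pattern with a plain \n is already the right form, and the cause of the empty result has to be looked for elsewhere (for example in \r\n line endings).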

Answer 2

You can use the following:

(<[^>]+>)$\R{2}
# capture anything between a pair of < and > at the end of the line
# followed by two newline characters

You need to use multiline mode; see a demo on regex101.com. In PHP, this would be:

$regex = '~(<[^>]+>)$\R{2}~m';
$string = preg_replace($regex, "$1", $your_string_here);
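To see what this does to the sample from the question, here is a small sketch. The variable names are made up, and the second pattern is an assumed variation that is not part of this answer: it removes a single line break after any tag, which is closer to the output the question asks for.

```php
<?php
// Sample from the question, with real newline characters (double-quoted string).
$input = "<ul>\n    <li>element</li>\n</ul>\n\n<br/>\n\nSome text\n";

// Pattern from this answer: a tag at the end of a line followed by
// exactly two line breaks; both breaks are removed.
$twoBreaks = preg_replace('~(<[^>]+>)$\R{2}~m', '$1', $input);

// Assumed variation (not from the answer): drop one line break after any tag.
$oneBreak = preg_replace('~(<[^>]+>)\R~', '$1', $input);

var_dump($twoBreaks); // "<ul>\n    <li>element</li>\n</ul><br/>Some text\n"
var_dump($oneBreak);  // "<ul>    <li>element</li></ul>\n<br/>\nSome text\n"
```

The \R escape matches \n, \r\n and other line-break sequences, which is useful when the input may come with Windows line endings.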

In general, the DOMDocument parser offers the option to preserve or discard whitespace, so you might be better off using that.
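A minimal sketch of the DOMDocument route, assuming the fragment is well-formed enough to be parsed as XML; real-world HTML would more likely go through loadHTML(), whose whitespace handling may differ. The variable names are illustrative only.

```php
<?php
// Assumption: the fragment is well-formed XML with a single root element.
$fragment = "<ul>\n    <li>element</li>\n</ul>";

$dom = new DOMDocument();
// Must be set before loading, so the parser drops whitespace-only text nodes.
$dom->preserveWhiteSpace = false;
$dom->loadXML($fragment);

// Serialize only the root element, without the XML declaration.
$compact = $dom->saveXML($dom->documentElement);
echo $compact, "\n";
```

This removes whitespace that sits between tags but leaves text content alone, so it avoids having to describe tags with a regular expression at all.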