Question

I have the following paragraphs:

This is paragraph #1
New-York, London, Paris, Berlin
Some other text
End of paragraph

This is paragraph #2
London, Paris
End of paragraph

This is paragraph #3
New-York, Paris, Berlin
Some other text
End of paragraph

This is paragraph #4
End of paragraph

This is paragraph #5
Paris, Berlin
Some other text
End of paragraph

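How can I use a regex to match the paragraphs that contain, for example, New-York (#1 and #3) or London (#1, #2)? Or even both New-York and Berlin (#1, #3)?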

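I found an answer on S.O.:

How match a paragraph using regex

It lets me match paragraphs (all the text between two blank lines).

But I can't figure out (my regex skills are... limited) how to match the paragraphs that contain a specific pattern, and only those paragraphs.

Thanks in advance for your help.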

Note: the idea is to use the answer in the Editorial iOS app to fold the paragraphs that do not contain the pattern.

Answer 1

If you plan to use the pattern in the Editorial iOS app, you may not have access to Python code.

In that case, all I can suggest is a pattern like

(?m)^(?=.*(?:\r?\n(?!\r?\n).*)*?\bNew-York\b)(?=.*(?:\r?\n(?!\r?\n).*)*?\bBerlin\b).*(?:\r?\n(?!\r?\n).*)*

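See the regex demo. Basically, we only match at the start of a line (^ with the (?m) modifier), and we check whether New-York and Berlin are present as whole words (thanks to the \b word boundaries) anywhere on the lines before the first double line break; if they are, those lines are matched.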

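Details

(?m)^ - the start of a line
(?=.*(?:\r?\n(?!\r?\n).*)*?\bNew-York\b) - a positive lookahead that makes sure the whole word New-York appears somewhere after 0+ chars other than line break chars (.*), optionally followed by 0+ consecutive sequences of a CRLF/LF line break that is not followed by another CRLF/LF line break, and then the rest of that line
(?=.*(?:\r?\n(?!\r?\n).*)*?\bBerlin\b) - the same check for the whole word Berlin: it must appear somewhere after 0+ chars other than line break chars (.*), optionally followed by 0+ consecutive sequences of a CRLF/LF line break that is not followed by another CRLF/LF line break, and then the rest of that line
.* - matches the rest of the current line
(?:\r?\n(?!\r?\n).*)* - matches 0 or more consecutive occurrences of:
- \r?\n(?!\r?\n) - a line break (CRLF or LF) not followed by another CRLF or LF
- .* - the rest of the line.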
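
If Python is available for checking, the pattern described above can be tried against the sample text from the question. This is only a sketch: the helper names paragraph_pattern and first_lines are made up for illustration, and the pattern is assembled from the same pieces as the one above, with one lookahead per required word.

```python
import re

text = """This is paragraph #1
New-York, London, Paris, Berlin
Some other text
End of paragraph

This is paragraph #2
London, Paris
End of paragraph

This is paragraph #3
New-York, Paris, Berlin
Some other text
End of paragraph

This is paragraph #4
End of paragraph

This is paragraph #5
Paris, Berlin
Some other text
End of paragraph
"""

def paragraph_pattern(words):
    """Build the lookahead-based pattern for paragraphs containing ALL words."""
    # Following lines, as long as the line break is not followed by another one
    within = r"(?:\r?\n(?!\r?\n).*)*"
    # One positive lookahead per required whole word
    lookaheads = "".join(
        rf"(?=.*{within}?\b{re.escape(w)}\b)" for w in words
    )
    return re.compile(rf"(?m)^{lookaheads}.*{within}")

def first_lines(words):
    """Return the first line of every matched paragraph."""
    # A match can begin on the blank separator line, so strip before reading
    return [
        m.group().strip().splitlines()[0]
        for m in paragraph_pattern(words).finditer(text)
    ]

print(first_lines(["New-York", "Berlin"]))
# ['This is paragraph #1', 'This is paragraph #3']
```

Note that a match may start on the empty line that precedes a paragraph, which is why the sketch strips each match. For an "either word" search, a single lookahead with an alternation such as (?:New-York|London) should work along the same lines.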

Answer 2

Using the newer regex module, which supports splitting on empty matches:

import regex as re

string = """
This is paragraph #1
New-York, London, Paris, Berlin
Some other text
End of paragraph

This is paragraph #2
London, Paris
End of paragraph

This is paragraph #3
New-York, Paris, Berlin
Some other text
End of paragraph

This is paragraph #4
End of paragraph

This is paragraph #5
Paris, Berlin
Some other text
End of paragraph
"""

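# In multiline mode, ^$ is a zero-width match on every empty line
rx = re.compile(r'^$', flags = re.MULTILINE | re.VERSION1)

needle = 'New-York'

# Split on the empty lines and keep only the parts that contain the needle
interesting = [part
    for part in rx.split(string)
    if needle in part]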

print(interesting)
# ['\nThis is paragraph #1\nNew-York, London, Paris, Berlin\nSome other text\nEnd of paragraph\n', '\nThis is paragraph #3\nNew-York, Paris, Berlin\nSome other text\nEnd of paragraph\n']
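
As a hedged variation on the same idea: since Python 3.7 the standard library re.split can also split on a pattern that matches the empty string, so the third-party module may not be strictly necessary. The sketch below assumes Python 3.7 or later; split_paragraphs and the shortened sample string are made up for illustration.

```python
import re

def split_paragraphs(text):
    """Split on empty lines; the separators are zero-width, so newlines stay in the parts."""
    return re.split(r"^$", text, flags=re.MULTILINE)

# Shortened, hypothetical sample in the same shape as the text above
sample = "\nfirst\nNew-York\n\nsecond\nLondon\n\nthird\nNew-York, Berlin\n"

# Keep the parts containing the needle, trimming the surrounding newlines
hits = [part.strip() for part in split_paragraphs(sample) if "New-York" in part]
print(hits)
# ['first\nNew-York', 'third\nNew-York, Berlin']
```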

Answer 3

I don't think you need a regex at all for your specific case:

[i for i,p in enumerate(mystr.split('\n\n')) if 'New-York' in p or 'London' in p]

which in your case results in:

[0, 1, 2]

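Obviously, an and condition is just as easy, and so is negating the if. Use enumerate only if you need the paragraph indices; if you want the paragraphs themselves, you don't need it. Either way, there is no need to force a regex.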
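
The or, and, and negated variants mentioned above could be sketched as follows. The function name matching_indices and its require_all parameter are hypothetical, and mystr is rebuilt here from the question's sample so the snippet runs on its own. Note that this is a plain substring test, not a whole-word match.

```python
paragraphs = [
    "This is paragraph #1\nNew-York, London, Paris, Berlin\nSome other text\nEnd of paragraph",
    "This is paragraph #2\nLondon, Paris\nEnd of paragraph",
    "This is paragraph #3\nNew-York, Paris, Berlin\nSome other text\nEnd of paragraph",
    "This is paragraph #4\nEnd of paragraph",
    "This is paragraph #5\nParis, Berlin\nSome other text\nEnd of paragraph",
]
mystr = "\n\n".join(paragraphs)

def matching_indices(text, words, require_all=False):
    """Indices of paragraphs containing any (default) or all of the words."""
    combine = all if require_all else any
    return [
        i
        for i, p in enumerate(text.split("\n\n"))
        if combine(w in p for w in words)
    ]

print(matching_indices(mystr, ["New-York", "London"]))                    # [0, 1, 2]
print(matching_indices(mystr, ["New-York", "Berlin"], require_all=True))  # [0, 2]

# Negation: paragraphs that contain neither word (candidates for folding)
hidden = [p for p in mystr.split("\n\n")
          if not any(w in p for w in ["New-York", "London"])]
print(len(hidden))  # 2
```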

How to match paragraphs containing a specific pattern with regex?
