Question

My string looks like this:

[abc]
line_one xxxxxxxxxxxxxx
line_two xxxxxxxxxxxxxx
[pqr]
line_four xxxxxxxxxxxxxx
line_five xxxxxxxxxxxxxx
[xyz]
line_six  xxxxxxxxxxxxxx
line_seven  xxxxxxxxxxxxxx

I am trying to get these lines section by section. I tried the regex below, but had no luck.

result = re.compile(r'(\[.+\])')
details = result.findall(string)

That only gives me the section names, so then I tried:

result = re.compile(r'(\[.+\]((\n)(.+))+)')

Any suggestions?

Answer 1

(\[[^\]]*\][^\[]+)(?:\s|$)

Try this. See the demo. This will give you the lines section by section.

http://regex101.com/r/mP1wO4/1

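import re

# Plain raw string: the ur'' prefix is Python 2 only and is a syntax error in Python 3.
p = re.compile(r'(\[[^\]]*\][^\[]+)(?:\s|$)')
test_str = "[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx\n[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx\n[xyz]\nline_six xxxxxxxxxxxxxx\nline_seven xxxxxxxxxxxxxx"

# Returns one string per section: the [header] line plus the lines under it.
p.findall(test_str)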
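If you then want the lines grouped under each section name, one possible follow-up is to split each matched block into its header and body. This is only a sketch: the names `text`, `pattern`, and `sections` are mine, not from the question, and it assumes that no body line contains a `[` character (the pattern above stops a section at the first `[` it sees).

```python
import re

# Sample input modelled on the string in the question.
text = (
    "[abc]\n"
    "line_one xxxxxxxxxxxxxx\n"
    "line_two xxxxxxxxxxxxxx\n"
    "[pqr]\n"
    "line_four xxxxxxxxxxxxxx\n"
    "line_five xxxxxxxxxxxxxx\n"
    "[xyz]\n"
    "line_six xxxxxxxxxxxxxx\n"
    "line_seven xxxxxxxxxxxxxx"
)

# Same pattern as above, written as a Python 3 raw string.
pattern = re.compile(r'(\[[^\]]*\][^\[]+)(?:\s|$)')

# Map each section name (without brackets) to the list of lines under it.
sections = {}
for block in pattern.findall(text):
    header, *body = block.splitlines()
    sections[header.strip("[]")] = body

for name, lines in sections.items():
    print(name, lines)
```

A dict keeps the sections in the order they were found (Python 3.7+), so iterating over it walks the input top to bottom.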

Answer 2

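Use the re.findall function. You need to include \n inside the positive lookahead so that the match does not swallow the newline character that sits just before the next [] block.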

>>> m = re.findall(r'(?s)(?:^|\n)(\[[^\]]*\].*?)(?=\n\[[^\]]*\]|$)', s)
>>> m
['[abc]\nline_one xxxxxxxxxxxxxx\nline_two xxxxxxxxxxxxxx', '[pqr]\nline_four xxxxxxxxxxxxxx\nline_five xxxxxxxxxxxxxx', '[xyz]\nline_six  xxxxxxxxxxxxxx\nline_seven  xxxxxxxxxxxxxx']
>>> for i in m:
    print(i)


[abc]
line_one xxxxxxxxxxxxxx
line_two xxxxxxxxxxxxxx
[pqr]
line_four xxxxxxxxxxxxxx
line_five xxxxxxxxxxxxxx
[xyz]
line_six  xxxxxxxxxxxxxx
line_seven  xxxxxxxxxxxxxx
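To see why the \n belongs inside the lookahead, it may help to compare the pattern with a variant that leaves it out. This is a sketch for illustration only: the variable names are mine, and the second pattern is a deliberately altered copy, not something proposed in this thread.

```python
import re

# Sample input modelled on the string in the question.
s = (
    "[abc]\n"
    "line_one xxxxxxxxxxxxxx\n"
    "line_two xxxxxxxxxxxxxx\n"
    "[pqr]\n"
    "line_four xxxxxxxxxxxxxx\n"
    "line_five xxxxxxxxxxxxxx\n"
    "[xyz]\n"
    "line_six  xxxxxxxxxxxxxx\n"
    "line_seven  xxxxxxxxxxxxxx"
)

# Pattern from this answer: the lookahead stops *before* the newline.
with_newline = re.findall(
    r'(?s)(?:^|\n)(\[[^\]]*\].*?)(?=\n\[[^\]]*\]|$)', s)

# Altered variant (for comparison only): no \n in the lookahead, so the
# lazy .*? runs up to the next '[' and captures the newline as well.
without_newline = re.findall(
    r'(?s)(?:^|\n)(\[[^\]]*\].*?)(?=\[[^\]]*\]|$)', s)

print(len(with_newline), len(without_newline))
print(repr(without_newline[0][-1]))
```

With the altered variant the first match ends in a newline, and because that newline is already consumed, the `(?:^|\n)` prefix can no longer match in front of `[pqr]`, so that section is skipped. Keeping the \n in the lookahead leaves it available to start the next match.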

Answer 3

Split:

re.split(r'\n*(?=\[)', s)

or

re.split(r'(?m)\n*^(?=\[)', s)
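One caveat, as far as I can tell: since Python 3.7, re.split also splits on empty matches, so a pattern that can match zero characters (such as `\n*` followed only by a lookahead) may leave empty strings in the result. A minimal sketch of a variant that avoids this by requiring at least one newline and dropping any empty pieces; the `\n+` change and the name `parts` are my own assumptions, not part of the answer above:

```python
import re

# Sample input modelled on the string in the question.
s = (
    "[abc]\n"
    "line_one xxxxxxxxxxxxxx\n"
    "line_two xxxxxxxxxxxxxx\n"
    "[pqr]\n"
    "line_four xxxxxxxxxxxxxx\n"
    "line_five xxxxxxxxxxxxxx\n"
    "[xyz]\n"
    "line_six  xxxxxxxxxxxxxx\n"
    "line_seven  xxxxxxxxxxxxxx"
)

# Split on one or more newlines that are directly followed by '['.
# The filter drops an empty first piece if the input starts with newlines.
parts = [part for part in re.split(r'\n+(?=\[)', s) if part]

for part in parts:
    print(part)
    print('---')
```

Like the original patterns, this treats any `[` at the start of a line as a section header, so it assumes body lines never begin with `[`.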

Regex to extract sections
