Question

I have a string like this:

'\n479 Appendix I\n1114\nAppendix I 481\n'

and I would like to use a regular expression to find and return

['479 Appendix I', 'Appendix I 481']

I first tried this expression:

import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'

pattern = r'''
(?: \d+ \s)? Appendix \s+ \w+ (?: \s \d+)?
'''

regex = re.compile(pattern, re.VERBOSE)

regex.findall(s)

but this returns

['479 Appendix I\n1114', 'Appendix I 481']

because \s also matches \n. Following one of the answers to the post "Python regex match space only", I tried the following:

pattern = r'''
(?: \d+ [^ \S\t\n])? Appendix \s+ \w+ (?: [^ \S\t\n] \d+)?
'''

regex = re.compile(pattern, re.VERBOSE)

regex.findall(s)

But this does not return the desired result either. It gives:

['Appendix I', 'Appendix I']

Which expression would work in this case?

Answer 1

import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'

for g in re.findall(r'^.*[^\d\n].*$', s, flags=re.M):
    print(g)

Prints:

479 Appendix I
Appendix I 481

This regex matches every line that contains at least one character that is neither a digit nor a newline.
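To make that behaviour concrete, here is a small sketch on a made-up input. The `sample` string and the name `line_with_text` are illustrative and not from the question. Note that a line holding only a space also matches, since a space is neither a digit nor a newline.

```python
import re

# Same pattern as above, compiled once; re.M makes ^ and $ work per line.
line_with_text = re.compile(r'^.*[^\d\n].*$', flags=re.M)

# Hypothetical input: a digits-only line, a text line,
# a line with a single space, and a mixed line.
sample = '12\nabc\n \n3 x\n'

matches = line_with_text.findall(sample)
print(matches)  # ['abc', ' ', '3 x']
```

So the pattern filters out digits-only and empty lines, but it does not check that a line actually mentions "Appendix".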

Answer 2

This regex is a bit more robust than the one in the other answer, because it is explicitly anchored on "Appendix":

pattern = r'(?:\d*[\t ]+)?Appendix\s+\w+(?:[\t ]+\d*)?'  # raw string avoids invalid-escape warnings
re.findall(pattern, s)
# ['479 Appendix I', 'Appendix I 481']
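As for why the second attempt in the question fails: as far as I can tell, with re.VERBOSE whitespace is ignored everywhere in the pattern except inside a character class. The space in [^ \S\t\n] is therefore kept, and the class ends up excluding the literal space as well as tab and newline. A minimal sketch of a verbose variant, assuming [^\S\n] (any whitespace except a newline) is an acceptable definition of "space" here; the variable names are illustrative:

```python
import re

s = '\n479 Appendix I\n1114\nAppendix I 481\n'

# [^\S\n] means "whitespace, but not a newline". No literal space is
# written inside the brackets, because VERBOSE keeps spaces in classes.
pattern = r'''
(?: \d+ [^\S\n]+ )?     # optional leading number plus horizontal whitespace
Appendix [^\S\n]+ \w+   # the word Appendix followed by its label
(?: [^\S\n]+ \d+ )?     # optional horizontal whitespace plus trailing number
'''

regex = re.compile(pattern, re.VERBOSE)
result = regex.findall(s)
print(result)  # ['479 Appendix I', 'Appendix I 481']

# The class from the question, for comparison: of these four characters
# it only accepts the carriage return.
leftover = re.findall(r'[^ \S\t\n]', ' \t\n\r', re.VERBOSE)
print(leftover)  # ['\r']
```

Using \d+ rather than \d* in the optional groups also keeps a stray trailing space from being captured when no number follows.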