Question

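I'm capturing text from a file with a regular expression, but the string contains a stray digit. I haven't been able to capture it, and when I try to capture the next line, I get back only the string and not the following line. I can capture it when there is no trailing stray digit.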

I've tried many regex combinations without success.

Text (the company name ends with a stray digit):

sentences
company_name: company, ltd6

numbers 99 and letters 99 (I want to match anything here and nothing after)
numbers 99 and letters 99 (I don't want to match anything here or after)

Code that captures successfully when there is no trailing digit:

company_name = re.findall(r"company_name:\s(.*)\D.+", text)

Attempt to capture the following line:

company_name = re.findall(r"company_name:\s(.*)(?=.\D.+)", text)

I expect this to capture the next line, but it doesn't.

Answer 1

This returns only the next line and ignores the lines after it:

next_line = re.sub(r".*company_name:[^\n]+\n*([^\n]+).*", r'\1', text, flags=re.S)

That is: numbers 99 and letters 99 (I want to match anything here and nothing after)
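For context, the attempts in the question stay on the company_name line because `.` does not match a newline unless `re.S` is set. The sketch below runs the `re.sub` approach on the sample text and, as an assumed alternative, wraps the same idea in `re.search`. The helper name `next_line_after` is hypothetical and not part of the original answer.

```python
import re

# Sample text from the question.
text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)\n"
)

# The answer's approach: replace the whole text with the captured next line.
via_sub = re.sub(r".*company_name:[^\n]+\n*([^\n]+).*", r"\1", text, flags=re.S)


def next_line_after(label, text):
    """Hypothetical helper: return the first line that is not completely
    empty after the line containing `label`, or None if there is none."""
    pattern = re.escape(label) + r"[^\n]*\n+([^\n]+)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


via_search = next_line_after("company_name:", text)

print(via_sub)
print(via_search)
```

One reason to consider the `re.search` variant: when the pattern does not match, `re.sub` returns the input unchanged, which can be hard to tell apart from a real result, while the helper returns `None`.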

Answer 2

Based on your original expression, my guess is that this expression,

.*company_name:\s*(.*\D)\s*(\w.*)

might work. It has two capturing groups, (.*\D) and (\w.*), which capture the desired output.

Demo 1

Or maybe this one:

.*company_name:\s*(.*)\s*(\w.*)

Demo 2
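As a rough comparison of the two expressions, the sketch below applies both to the sample text from the question. The variable names are illustrative only. Note that `\D` also matches a newline, so with this input the first group of the first expression ends with `"\n"`.

```python
import re

# Sample text from the question.
text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)\n"
)

with_non_digit = re.search(r".*company_name:\s*(.*\D)\s*(\w.*)", text)
plain = re.search(r".*company_name:\s*(.*)\s*(\w.*)", text)

# Group 1: the company name. In the first expression \D consumes the
# newline that ends the line, so the capture carries a trailing "\n".
print(repr(with_non_digit.group(1)))
print(repr(plain.group(1)))

# Group 2: the next line only, because "." stops at the newline.
print(with_non_digit.group(2))
print(plain.group(2))
```

If the trailing newline in the first group is unwanted, calling `.strip()` on it or using the second expression avoids it.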

Test

import re

# Group 1 captures the company name; group 2 captures the next line.
regex = r".*company_name:\s*(.*\D)\s*(\w.*)"

test_str = ("sentences\n"
    "company_name: company, ltd6\n\n"
    "numbers 99 and letters 99 (I want to match anything here)")

# re.MULTILINE only changes ^ and $, which this pattern does not use.
matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Capturing groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))

Python regex: match the line with numbers after a string that has a digit at the end
