Python regex: match the line after a string, when the string has a number at its end

Posted: 2019-06-28 00:54:35

Tags: python regex python-3.x

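I'm using a regular expression to capture text from a file, but the string contains an erroneous number. I'm not capturing the number itself, but when I try to capture the next line, the match returns only the string and not the next line. When there is no trailing erroneous number, I can capture it.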

I have tried many combinations of regular expressions without success.

Text:


Sample input, where the company_name value ends with a stray number:

sentences
company_name: company, ltd6

numbers 99 and letters 99 (I want to match anything here and nothing after)
numbers 99 and letters 99 (I don't want to match anything here or after)

Code that captures the value successfully when there is no trailing number:

company_name = re.findall(r"company_name:\s(.*)\D.+", text)

My attempt at capturing the following line:

company_name = re.findall(r"company_name:\s(.*)(?=.\D.+)", text)

I expected this to capture the next line, but it doesn't.
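A quick check, assuming the sample text above is the whole input, suggests why neither attempt reaches the next line: without re.DOTALL, `.` never matches `\n`, so the match cannot leave the company_name line, and the greedy `(.*)` just gives back characters until `\D.+` (or the lookahead) can be satisfied on that same line.

```python
import re

# Sample input from the question
text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)"
)

# Both patterns stay on the company_name line: "." cannot match "\n",
# so (.*) backtracks until the rest of the pattern fits before the line break.
first = re.findall(r"company_name:\s(.*)\D.+", text)
second = re.findall(r"company_name:\s(.*)(?=.\D.+)", text)
print(first)   # ['company, lt']
print(second)  # ['company, l']

# "." only matches a newline when re.DOTALL (re.S) is set
print(re.fullmatch(".", "\n"))                    # None
print(re.fullmatch(".", "\n", re.S) is not None)  # True
```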

2 answers:

Answer 0 (score: 0)

This gets only the next line and ignores the lines after it:

next_line = re.sub(r".*company_name:[^\n]+\n*([^\n]+).*", r'\1', text, flags=re.S)

That is, it returns: numbers 99 and letters 99 (I want to match anything here and nothing after)
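A sketch of Answer 0's pattern in context, assuming the same sample text. One behavior worth knowing: when re.sub finds no match it returns the input unchanged, so a missing company_name: line silently yields the whole text. The helper next_line_after_company below is hypothetical (not from the answer); it runs the same pattern through re.search so that case comes back as None.

```python
import re

text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)"
)

# Answer 0's pattern: with re.S the surrounding .* span lines, so the whole
# string matches and is replaced by group 1 (the line after company_name:).
PATTERN = r".*company_name:[^\n]+\n*([^\n]+).*"
next_line = re.sub(PATTERN, r"\1", text, flags=re.S)
print(next_line)

# When nothing matches, re.sub returns its input unchanged.
print(re.sub(PATTERN, r"\1", "no label here", flags=re.S))  # no label here


def next_line_after_company(s):
    """Hypothetical helper: same pattern via re.search, None if no match."""
    m = re.search(PATTERN, s, flags=re.S)
    return m.group(1) if m else None


print(next_line_after_company(text))
print(next_line_after_company("no label here"))  # None
```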

Answer 1 (score: 0)

Based on your original expression, I'm guessing that this expression,

.*company_name:\s*(.*\D)\s*(\w.*)

might work. It has two capturing groups, (.*\D) and (\w.*), which capture the output we want.

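Running this first pattern on the question's sample text (assumed to be the whole input) shows what each group holds. Note that `\D` also matches a newline, so group 1 ends up carrying the stray 6 plus the line break; group 2 is the line the question asks for.

```python
import re

text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)"
)

m = re.search(r".*company_name:\s*(.*\D)\s*(\w.*)", text)
# \D matches any non-digit, including "\n", so group 1 ends with the line break
print(repr(m.group(1)))  # 'company, ltd6\n'
# \s* skips the blank line; (\w.*) stops at the next "\n", so only one line
print(m.group(2))
```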

Or maybe this one:

.*company_name:\s*(.*)\s*(\w.*)

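This second pattern drops the `\D`, so the greedy `(.*)` stops at the end of the company line and `\s*` absorbs the blank line. If the trailing digits really are noise, a variant (an assumption, not part of either answer) can also keep them out of group 1: `[^\n]` confines each group to one line, and the lazy group leaves any digits before the line break to `\d*`.

```python
import re

text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)"
)

# Answer 1's second pattern: (.*) cannot cross "\n", so group 1 is one line
m = re.search(r".*company_name:\s*(.*)\s*(\w.*)", text)
print(m.group(1))  # company, ltd6
print(m.group(2))

# Variant (assumption: trailing digits on the company line are noise)
m2 = re.search(r"company_name:[ \t]*([^\n]*?)\d*\n+([^\n]+)", text)
print(m2.group(1))  # company, ltd
print(m2.group(2))
```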

Test

import re

# Group 1 captures the company_name value (plus the newline matched by \D),
# group 2 the line after it
regex = r".*company_name:\s*(.*\D)\s*(\w.*)"

test_str = ("sentences\n"
    "company_name: company, ltd6\n\n"
    "numbers 99 and letters 99 (I want to match anything here)")

# re.MULTILINE only changes ^ and $, which this pattern does not use
matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")

    # Group 0 is the whole match; capturing groups are numbered from 1
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")
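The test passes re.MULTILINE, but that flag only changes how `^` and `$` behave, and the pattern uses neither. A sketch of a pattern that does rely on it, assuming company_name: always starts its own line:

```python
import re

text = (
    "sentences\n"
    "company_name: company, ltd6\n"
    "\n"
    "numbers 99 and letters 99 (I want to match anything here and nothing after)\n"
    "numbers 99 and letters 99 (I don't want to match anything here or after)"
)

# With re.MULTILINE, ^ matches at the start of every line and $ before every
# line break, so the label must begin a line and group 1 is one whole line.
m = re.search(r"^company_name:.*\n+^(.+)$", text, re.MULTILINE)
print(m.group(1))

# A label that does not start its line is not matched
print(re.search(r"^company_name:.*\n+^(.+)$", "x company_name: a\nb", re.MULTILINE))  # None
```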