Regular expressions

Date: 2018-01-22 14:26:10

Tags: python regex

The script I am working on currently runs three regex searches over a file. Consider the following as input:

2018-01-22 04.02.03: Wurk: 98745061 (12345678)
 Replies (pos: 2) are missing/not sent on assignment: Asdf (55461)

2018-01-22 04.02.03: Wurk: 98885612 (87654321)
 Gorp: 98885612 is not registered for arrival!
 Brork: 98885612 is not registered for arrival!

2018-01-22 04.02.08: Wurk: 88855521 (885052)
 Blam: 12365479 is not registered for arrival!
 Fork: 56564123 is not registered for arrival!

2018-01-22 04.02.08: Wurk: A0885521 (885052)
 Blam: 12365479 is not registered for arrival!
 Fork: 56564123 is not registered for arrival!

Each regex finds lines in the file by the line's date and by the first digit after "Wurk:", and collects the eight digits/characters that follow "Wurk:".

import glob
import re
import time

# Take the first log file that matches the pattern.
logpath = glob.glob('path\\to\\log*.log')[0]
daysdate = time.strftime("%Y-%m-%d")  # today's date, e.g. 2018-01-22
nine = []
eight = []
seven = []
no_match = []  # nothing fills this list yet; that is the open problem
# The with block closes the file once the loop is done.
with open(logpath, "r") as readfile:
    for line in readfile:
        # Each search keys on today's date and the first character after "Wurk: ".
        for match in re.finditer(daysdate + r'.*Wurk: (9.{7})', line):
            nine.append(match.group(1))
        for match in re.finditer(daysdate + r'.*Wurk: (8.{7})', line):
            eight.append(match.group(1))
        for match in re.finditer(daysdate + r'.*Wurk: (7.{7})', line):
            seven.append(match.group(1))
print("\nNine:\n%s\n" % ",\n".join(map(str, nine)) +
   "\nEight:\n%s\n" % ",\n".join(map(str, eight)) +
   "\nSeven:\n%s\n" % ",\n".join(map(str, seven)) +
   "\nNo matches found:\n%s\n" % ",\n".join(map(str, no_match)))

The output this currently produces is:

Nine:
98745061,
98885612

Eight:
88855521

Seven:

No matches found:

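The problem at hand is to work out a regex that matches the eight digits/characters after "Wurk:" that none of the previous regexes matched. The new output should therefore be: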

Nine:
98745061,
98885612

Eight:
88855521

Seven:

No matches found:
A0885521

TL;DR

How do I write a regex that matches whatever the previous regexes did not match?

1 Answer:

Answer 0 (score: 2)

Regex is not meant for grouping data; it is meant for finding data. Use a regex to extract the values, then use code to group them:

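import re

# `text` is assumed to hold the whole log file as a single string.
seven, eight, nine, no_match = [], [], [], []

# First character of the id -> list that collects it.
wurk_map = {'7': seven,
            '8': eight,
            '9': nine}

# Grab the eight characters that follow every "Wurk: ".
wurks = re.findall(r'(?<=Wurk: ).{8}', text)
for wurk in wurks:
    # Ids starting with anything other than 7, 8 or 9 fall through to no_match.
    wurk_map.get(wurk[0], no_match).append(wurk)

print(seven)     # []
print(eight)     # ['88855521']
print(nine)      # ['98745061', '98885612']
print(no_match)  # ['A0885521']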
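
The snippet above collects every "Wurk:" id regardless of the date, whereas the question's searches only look at lines carrying today's date. A minimal sketch of how the two could be combined follows. The helper name `group_wurks`, the `SAMPLE_LOG` string, the fixed date, and the extra 2018-01-21 entry are assumptions made for illustration and do not come from the original post; anchoring the date at the start of the line is also a choice made here, not something the question's regexes do.

```python
import re

# Trimmed copy of the question's input. The 2018-01-21 entry is hypothetical,
# added only to show that lines with another date are skipped.
SAMPLE_LOG = """\
2018-01-22 04.02.03: Wurk: 98745061 (12345678)
 Replies (pos: 2) are missing/not sent on assignment: Asdf (55461)

2018-01-22 04.02.03: Wurk: 98885612 (87654321)
 Gorp: 98885612 is not registered for arrival!

2018-01-22 04.02.08: Wurk: 88855521 (885052)
 Blam: 12365479 is not registered for arrival!

2018-01-21 23.59.59: Wurk: 77700001 (885052)
 Fork: 56564123 is not registered for arrival!

2018-01-22 04.02.08: Wurk: A0885521 (885052)
 Fork: 56564123 is not registered for arrival!
"""


def group_wurks(text, daysdate):
    """Group the 8-character Wurk ids on lines dated `daysdate` by first character."""
    groups = {'7': [], '8': [], '9': [], 'no_match': []}
    # re.escape keeps the date literal; re.MULTILINE lets ^ match at every line start.
    pattern = re.compile(
        r'^' + re.escape(daysdate) + r'.*Wurk: (.{8})', re.MULTILINE)
    for wurk in pattern.findall(text):
        # Anything that does not start with 7, 8 or 9 goes to the catch-all list.
        key = wurk[0] if wurk[0] in '789' else 'no_match'
        groups[key].append(wurk)
    return groups


# A fixed date stands in for time.strftime("%Y-%m-%d") so the run is reproducible.
groups = group_wurks(SAMPLE_LOG, '2018-01-22')
for name, ids in groups.items():
    print(name, ids)
```

If a fourth regex search in the style of the question is preferred instead, a negated character class such as `daysdate + r'.*Wurk: ([^789].{7})'` should pick up the same leftovers, at the cost of scanning each line one more time.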