Question

I need to parse a set of strings in a log file:

    timestamp - user not found : user1
    timestamp - exception in xyz.security.plugin: global error : low memory

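I want to capture the text between " - " and the last ":".

I am currently using r' - (.*?)\n' to capture the string up to EOL. Keep in mind that the string may contain more than 2 colons; I need to capture up to the last colon before EOL. Also, if the string contains no ":" at all, EOL should be used as the end sequence.

Thanks.

Edit: better examples: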

    2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl  - User not found: sspm
    2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction  - JDBC commit failed
    2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver  - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception

Answer 1

    import re

    with open('logfile.log') as log:
        for line in log:
            # Greedy (.*) runs up to the last colon on the line.
            match = re.search(r'-(.*):', line)
            if not match:
                # No colon: capture everything up to the end of the line.
                match = re.search(r'-(.*)', line)
            if match:
                print(match.group(1))
            else:
                print('No match in line', line.strip())
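One caveat with the loop above: a bare `-` also matches the hyphens inside a date such as `2011-07-29`, so on the lines from the question's edit the capture would start inside the timestamp. Below is a minimal sketch that anchors on the ` - ` separator instead. It assumes the message is always introduced by the first hyphen surrounded by spaces, as in the samples; `extract_message` is a hypothetical helper name, not part of the original answer.

```python
import re

# Assumption: the message starts after the first " - " on the line.
WITH_COLON = re.compile(r' - (.*):')  # greedy (.*) runs to the last colon
NO_COLON = re.compile(r' - (.*)')     # fallback: runs to the end of the line


def extract_message(line):
    """Return the text after ' - ', cut at the last colon if there is one."""
    line = line.rstrip('\n')
    match = WITH_COLON.search(line) or NO_COLON.search(line)
    return match.group(1).strip() if match else None


samples = [
    '2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl  - User not found: sspm',
    '2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction  - JDBC commit failed',
    '2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver  - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception',
]

for sample in samples:
    print(extract_message(sample))
```

On the three sample lines this should yield the message without the timestamp prefix and without the text after the last colon.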

Answer 2

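Try this:

    "(?<=-).*(?=:[^:]*$)"

It matches the text between - and the last : on the current line. If there is no colon, it does not match at all, so you can do this: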

    import re

    mystring = "timestamp - user not found : user1"  # sample line from the question
    r = re.compile(r"(?<=-).*(?=:[^:]*$)")
    result = r.search(mystring)
    if result:
        match = result.group(0)
    else:
        match = "\n"

That does what you literally said ("if there is no colon, match EOL"). If you meant "if there is no colon, match until EOL", then a single regex will do:

    r = re.compile(r"(?<=-)(?:[^:]*$|.*(?=:[^:]*$))")
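As a rough check of the single-regex version, here is a small sketch that runs it over the simple sample lines from the question. The third line is a hypothetical colon-free line added for illustration. Note the assumption that the only hyphen before the message is the separator: the lookbehind accepts any hyphen, so on lines whose timestamp itself contains hyphens the match would begin inside the timestamp, and a `(?<= - )` lookbehind is one possible adjustment.

```python
import re

# Pattern from the answer above, written as a raw string.
pattern = re.compile(r"(?<=-)(?:[^:]*$|.*(?=:[^:]*$))")

# The first two lines come from the question; the last one is a
# hypothetical line without any colon.
lines = [
    "timestamp - user not found : user1",
    "timestamp - exception in xyz.security.plugin: global error : low memory",
    "timestamp - JDBC commit failed",
]

captured = []
for line in lines:
    result = pattern.search(line)
    # strip() drops the spaces left around the matched text
    captured.append(result.group(0).strip() if result else None)

for text in captured:
    print(text)
```

The first alternative handles the colon-free case by matching to the end of the line; the second stops just before the last colon.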

Answer 3

r'^.+ -(.+):.*$' did the trick for me.

This works because (.+) is greedy. See the Python documentation for re, in particular *, + and ?.
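To illustrate the greedy behavior, here is a small sketch on one of the sample lines from the question. Because the leading `.+` is greedy as well, the pattern settles on the last ` -` that still has a colon after it. Making that prefix lazy (`.+?`, a variation that is not part of the original answer) anchors on the first ` -` instead.

```python
import re

line = ('2011-07-29 08:32:00,353 [TP-Processor1] ERROR '
        'errors.GrailsExceptionResolver  - Exception occurred when '
        'processing request: [POST] /webapp/user/index - parameters: '
        'runtime exception')

# Pattern from the answer: both .+ parts are greedy.
greedy = re.search(r'^.+ -(.+):.*$', line)

# Variation (an assumption, not from the answer): lazy prefix, so the
# first " -" on the line is used as the starting point.
lazy = re.search(r'^.+? -(.+):.*$', line)

greedy_text = greedy.group(1).strip()
lazy_text = lazy.group(1).strip()

print(greedy_text)
print(lazy_text)
```

On a line with a single ` -`, such as `timestamp - user not found : user1`, both patterns capture the same text.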

python regex - last occurrence before EOL

3 Answers: