Python: analyzing a log file with regular expressions

Time: 2013-10-15 09:05:25

Tags: python regex logging logfile

I have to analyze the mail-sending log file (to get the SMTP reply for a given message ID). It looks like this:

Nov 12 17:26:57 zeus postfix/smtpd[23992]: E859950021DB1: client=pegasus.os[172.20.19.62]
Nov 12 17:26:57 zeus postfix/cleanup[23995]: E859950021DB1: message-id=a92de331-9242-4d2a-8f0e-9418eb7c0123
Nov 12 17:26:58 zeus postfix/qmgr[22359]: E859950021DB1: from=<system@directoperation.de>, size=114324, nrcpt=1 (queue active)
Nov 12 17:26:58 zeus postfix/smtp[24007]: certificate verification failed for mx.elutopia.it[62.149.128.160]:25: untrusted issuer /C=US/O=RTFM, Inc./OU=Widgets Division/CN=Test CA20010517
Nov 12 17:26:58 zeus postfix/smtp[24007]: E859950021DB1: to=<mike@elutopia.it>, relay=mx.elutopia.it[62.149.128.160]:25, delay=0.89, delays=0.09/0/0.3/0.5, dsn=2.0.0, status=sent (250 2.0.0 d3Sx1m03q0ps1bK013Sxg4 mail accepted for delivery)
Nov 12 17:26:58 zeus postfix/qmgr[22359]: E859950021DB1: removed
Nov 12 17:27:00 zeus postfix/smtpd[23980]: connect from pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/smtpd[23980]: setting up TLS connection from pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/smtpd[23980]: Anonymous TLS connection established from pegasus.os[172.20.19.62]: TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)
Nov 12 17:27:00 zeus postfix/smtpd[23992]: disconnect from pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/smtpd[23980]: 2C04150101DB2: client=pegasus.os[172.20.19.62]
Nov 12 17:27:00 zeus postfix/cleanup[23994]: 2C04150101DB2: message-id=21e2f9d3-154a-3683-85d3-a7c52d429386
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: from=<system@directoperation.de>, size=53237, nrcpt=1 (queue active)
Nov 12 17:27:00 zeus postfix/smtp[24006]: ABE7C50001D62: to=<info@elvictoria.it>, relay=relay3.telnew.it[195.36.1.102]:25, delay=4.9, delays=0.1/0/4/0.76, dsn=2.0.0, status=sent (250 2.0.0 r9EFQt0J009467 Message accepted for delivery)
Nov 12 17:27:00 zeus postfix/qmgr[22359]: ABE7C50001D62: removed
Nov 12 17:27:00 zeus postfix/smtp[23998]: 2C04150101DB2: to=<peter@elgravo.ch>, relay=liberomx2.elgravo.ch[212.52.84.93]:25, delay=0.72, delays=0.07/0/0.3/0.35, dsn=2.0.0, status=sent (250 ok:  Message 2040264602 accepted)
Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: removed

Currently I get a message-id (a UUID) from the database (e.g. a92de331-9242-4d2a-8f0e-9418eb7c0123) and then run my code over the log file:

import re
log_id = re.search(r']: (.+?): message-id=' + re.escape(message_id), text).group(1)
sent_status = re.search(r']: ' + re.escape(log_id) + r'.*dsn=(.....)', text).group(1)

With the message-id I find the log_id (the postfix queue ID), and with the log_id I can find the SMTP reply.
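The same two lookups can be wrapped in a function that returns None instead of raising AttributeError when one of the lines is not in the log yet. This is a sketch assuming Python 3; the function name find_dsn is hypothetical, and the status pattern `\d\.\d+\.\d+` is an assumption that replaces the fixed five-character `(.....)`.

```python
import re
from typing import Optional


def find_dsn(text: str, message_id: str) -> Optional[str]:
    """Two-step lookup: message-id -> postfix queue ID -> dsn code."""
    # Step 1: the cleanup line links the queue ID (log_id) to the message-id
    m = re.search(r'\]: (\w+): message-id=' + re.escape(message_id), text)
    if m is None:
        return None  # message-id not in the log (yet)
    log_id = m.group(1)
    # Step 2: the smtp delivery line for the same queue ID carries the dsn
    m = re.search(r'\]: ' + re.escape(log_id) + r': to=.*?dsn=(\d\.\d+\.\d+)', text)
    return m.group(1) if m else None


sample = """\
Nov 12 17:26:57 zeus postfix/cleanup[23995]: E859950021DB1: message-id=a92de331-9242-4d2a-8f0e-9418eb7c0123
Nov 12 17:26:58 zeus postfix/smtp[24007]: E859950021DB1: to=<mike@elutopia.it>, relay=mx.elutopia.it[62.149.128.160]:25, delay=0.89, delays=0.09/0/0.3/0.5, dsn=2.0.0, status=sent (250 2.0.0 d3Sx1m03q0ps1bK013Sxg4 mail accepted for delivery)
"""
print(find_dsn(sample, 'a92de331-9242-4d2a-8f0e-9418eb7c0123'))  # 2.0.0
```

Each call still rescans the whole text twice, which is why it gets slow as the log grows.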

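That works, but it would be better if the script went through the log file once, picked out each message-id together with its reply code, and then updated the database. I'm not sure how to do that, though. The script has to run about every 2 minutes and check the log file for new entries. How can I make sure it remembers where it left off and doesn't pick up the same message ID twice? Thanks in advance.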

1 answer:

Answer 0 (score: 0)

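Use a dictionary to store the message IDs, and a separate file to store the byte offset in the log file where you stopped reading last time.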
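A minimal sketch of just the offset bookkeeping, assuming the log is appended to in place and that a file shorter than the saved offset means it was rotated or truncated (a simple heuristic, not a guarantee: a new file that has already grown past the old offset is not detected). read_new_lines is a hypothetical helper name. It only advances the offset past complete lines, so a line postfix is still writing when the script runs is read on the next run instead of being cut in half.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(log_path: str, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended since `offset` and the new offset."""
    # Assumption: a file smaller than the saved offset was rotated/truncated
    if os.path.getsize(log_path) < offset:
        offset = 0
    lines = []
    with open(log_path, 'rb') as f:   # binary mode: offsets are real byte counts
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b'\n'):
                break  # half-written last line: leave it for the next run
            lines.append(raw.decode('utf-8', errors='replace'))
            offset += len(raw)
    return lines, offset


# Usage with a throwaway file standing in for the real mail log
path = os.path.join(tempfile.mkdtemp(), 'mail.log')
with open(path, 'wb') as f:
    f.write(b'line one\nline two\npartial')
lines, pos = read_new_lines(path, 0)
print(lines, pos)  # ['line one\n', 'line two\n'] 18
```

Counting `len(raw)` in binary mode keeps the stored number a plain byte offset; in text mode `tell()` returns an opaque cookie that is only meaningful to `seek()` on the same kind of stream.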

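import re

# postfix logs the message-id and the delivery status on different lines;
# both carry the queue ID (e.g. E859950021DB1), which is used to join them
MSGID_RE = re.compile(r'\]: (\w+): message-id=(\S+)')
DSN_RE = re.compile(r'\]: (\w+): to=.*?dsn=(\d\.\d+\.\d+)')

msgIDs = {}     # message-id -> dsn code
queueIDs = {}   # queue ID -> message-id
# get where you left off in the logfile during the last read:
try:
    with open('logfile_placemarker.txt', 'r') as f:
        lastRead = int(f.read())
except (IOError, ValueError):
    print("Can't find/read place marker file!  Starting at 0")
    lastRead = 0

# binary mode, so that seek()/tell() work with real byte offsets
with open('logfile.log', 'rb') as f:
    f.seek(lastRead)
    for raw in f:
        line = raw.decode('utf-8', errors='replace')
        m = MSGID_RE.search(line)
        if m:
            queueIDs[m.group(1)] = m.group(2)
            continue
        m = DSN_RE.search(line)
        if m and m.group(1) in queueIDs:
            msgID, responseCode = queueIDs[m.group(1)], m.group(2)
            if msgID in msgIDs:
                print("uh oh, found the same msg id twice!!")
            msgIDs[msgID] = responseCode
    lastRead = f.tell()

# Do whatever you need to do with the msgIDs you found
# (updateDB is your own database function, not defined here):
updateDB(msgIDs)
# Store lastRead (where you left off in the logfile) in a file so it persists to the next run
with open('logfile_placemarker.txt', 'w') as f:
    f.write(str(lastRead))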
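One gap remains: the message-id line and the status line for the same queue ID can fall into different runs (the message is accepted just before one run and delivered just after it), and an in-memory queue-ID map is lost between runs. A sketch that keeps the unmatched queue IDs in one JSON state file together with the offset; the file layout, the function names pair_lines, load_state and save_state, and the choice to drop a queue ID on its qmgr `removed` line are all assumptions, not part of the answer.

```python
import json
import os
import re
import tempfile
from typing import Dict, Iterable, Tuple

# Assumed patterns, based on the postfix lines shown in the question
MSGID_RE = re.compile(r'\]: (\w+): message-id=(\S+)')
DSN_RE = re.compile(r'\]: (\w+): to=.*?dsn=(\d\.\d+\.\d+)')
REMOVED_RE = re.compile(r'\]: (\w+): removed\s*$')


def pair_lines(lines: Iterable[str], pending: Dict[str, str]) -> Dict[str, str]:
    """Return {message_id: dsn} for this batch; open queue IDs stay in `pending`."""
    results = {}
    for line in lines:
        m = MSGID_RE.search(line)
        if m:
            pending[m.group(1)] = m.group(2)   # queue ID -> message-id
            continue
        m = DSN_RE.search(line)
        if m:
            if m.group(1) in pending:
                results[pending[m.group(1)]] = m.group(2)
            continue
        m = REMOVED_RE.search(line)
        if m:
            pending.pop(m.group(1), None)      # queue entry finished
    return results


def load_state(path: str) -> Tuple[int, Dict[str, str]]:
    """Read (offset, pending) from the state file; start fresh if unusable."""
    try:
        with open(path) as f:
            state = json.load(f)
        return int(state['offset']), dict(state['pending'])
    except (OSError, ValueError, KeyError, TypeError):
        return 0, {}


def save_state(path: str, offset: int, pending: Dict[str, str]) -> None:
    """Write to a temp file, then swap it in with os.replace."""
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        json.dump({'offset': offset, 'pending': pending}, f)
    os.replace(tmp, path)


# Two "runs" where the message-id and status lines land in different batches
run1 = ["Nov 12 17:27:00 zeus postfix/cleanup[23994]: 2C04150101DB2: "
        "message-id=21e2f9d3-154a-3683-85d3-a7c52d429386\n"]
run2 = ["Nov 12 17:27:00 zeus postfix/smtp[23998]: 2C04150101DB2: to=<peter@elgravo.ch>, "
        "relay=liberomx2.elgravo.ch[212.52.84.93]:25, delay=0.72, "
        "delays=0.07/0/0.3/0.35, dsn=2.0.0, status=sent (250 ok:  Message 2040264602 accepted)\n",
        "Nov 12 17:27:00 zeus postfix/qmgr[22359]: 2C04150101DB2: removed\n"]

state_path = os.path.join(tempfile.mkdtemp(), 'logfile_state.json')
offset, pending = load_state(state_path)   # no file yet -> (0, {})
first = pair_lines(run1, pending)           # {} (status not seen yet)
save_state(state_path, 123, pending)        # 123 stands in for the real offset
offset, pending = load_state(state_path)
second = pair_lines(run2, pending)
print(second)  # {'21e2f9d3-154a-3683-85d3-a7c52d429386': '2.0.0'}
```

In the answer's script, such a state file would take the place of logfile_placemarker.txt, holding lastRead and the pending queue IDs together so both are saved in one step.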