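Python: regex search of a file, with a second regex on the next line

Time: 2014-12-24 07:12:08

Tags: python regex

I am trying to search every line of a log file for a specific string and, when it matches, I need to get the host information for that particular error.

Consider the following log entries: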

05-05-2014 00:02:02,771 [HttpProxyServer-thread-1314] ERROR fd - Empty user name specified in NTLM authentication. Prompting for auth again.
Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271, User-Agent: Google Update/1.3.23.9;winhttp;cup-ecdsa
05-05-2014 00:02:02,771 [HttpProxyServer-thread-2156] ERROR fd - Empty user name specified in NTLM authentication. Prompting for auth again.
Host=tools.google.com, Port=80, Client ip=/10.253.168.148, port=37273, User-Agent: Google Update/1.3.23.9;winhttp;cup-ecdsa
05-05-2014 00:02:02,802 [HttpProxyServer-thread-604] ERROR fd - Empty user name specified in NTLM authentication. Prompting for auth again.
Host=tools.google.com, Port=80, Client ip=/10.253.168.92, port=37280, User-Agent: Google Update/1.3.23.9;winhttp;cup

Here is my code:

for line in log_file:
    if re.search(r'Empty user name specified in NTLM authentication. Prompting for auth again.', line):
        host = re.search(r'Host=(\D+.\D+.\D+,)', line).group(1)

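The problem is that the host information is not on the same line as the error; it is on the next line. How can I get re.search(r'Host=(\D+.\D+.\D+,)', line).group(1) to search the line after the one that "line" currently refers to?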

3 answers:

Answer 0: (score: 2)

Just insert

line = next(log_file)

between the two statements you currently have in the for loop.
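A minimal sketch of this suggestion as a self-contained script. The `find_hosts` helper, the `io.StringIO` stand-in for the log file, and the `None` default passed to `next` are assumptions added for illustration; the regex is taken unchanged from the question.

```python
import io
import re

ERROR_TEXT = "Empty user name specified in NTLM authentication. Prompting for auth again."
HOST_RE = re.compile(r"Host=(\D+.\D+.\D+,)")  # pattern copied from the question


def find_hosts(log_file):
    """Return the host captured from the line that follows each error line."""
    lines = iter(log_file)  # a file object is already its own iterator
    hosts = []
    for line in lines:
        if ERROR_TEXT in line:
            # Advance the same iterator so `line` becomes the following line.
            # The None default avoids StopIteration if the error is the last line.
            line = next(lines, None)
            if line is None:
                break
            match = HOST_RE.search(line)
            if match:
                hosts.append(match.group(1))
    return hosts


# Hypothetical stand-in for the real log file, using one entry from the question.
sample = io.StringIO(
    "05-05-2014 00:02:02,771 [HttpProxyServer-thread-1314] ERROR fd - "
    "Empty user name specified in NTLM authentication. Prompting for auth again.\n"
    "Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271, "
    "User-Agent: Google Update/1.3.23.9;winhttp;cup-ecdsa\n"
)
print(find_hosts(sample))
```

Because `next` and the `for` loop share one iterator, the host line is consumed here and is not seen again by the loop.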

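Answer 1: (score: 0)

Either write a regular expression that matches two consecutive lines, from which you can extract the host information for each error, and loop over the matches instead of reading the file line by line; or add a flag that is set when a line matches the error and, if the flag is set for a given line, extract the host information and reset the flag instead of testing for the error.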
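Both options could be sketched roughly as follows. The function names and the host pattern `[^,\s]+` (host value ends at the first comma or whitespace) are assumptions made for illustration, not part of the answer.

```python
import re

ERROR_TEXT = "Empty user name specified in NTLM authentication. Prompting for auth again."

# Option 1: one regex spanning the error line and the line after it.
# [^\n]*\n consumes the rest of the error line; the group captures the host
# on the following line, up to the first comma.
PAIR_RE = re.compile(re.escape(ERROR_TEXT) + r"[^\n]*\n[^\n]*?Host=([^,\n]+)")


def hosts_by_multiline_regex(text):
    """Scan the whole log text at once and return every captured host."""
    return PAIR_RE.findall(text)


# Option 2: a flag that remembers whether the previous line was an error.
HOST_RE = re.compile(r"Host=([^,\s]+)")  # assumed host format


def hosts_by_flag(lines):
    """Read line by line; extract the host from the line after each error."""
    hosts = []
    previous_was_error = False
    for line in lines:
        if previous_was_error:
            match = HOST_RE.search(line)
            if match:
                hosts.append(match.group(1))
            previous_was_error = False  # reset instead of testing for the error
        elif ERROR_TEXT in line:
            previous_was_error = True
    return hosts
```

The first option needs the whole file in memory (for example via `fp.read()`), while the second works on any iterable of lines, including an open file.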

Answer 2: (score: 0)

Try this:

>>> import re
>>> fp = open('log_file')
>>> line = fp.readline()
>>> while line:
...    if 'Empty user name specified in NTLM authentication. Prompting for auth again.' in line:
...        host = re.search(r'Host=(\D+.\D+.\D+,)', fp.readline()).group(1)
...        #                                        ^^^^^^^^^^^^^^  
...        #                              this makes re search in the next line 
...        print(host)
...    line = fp.readline()
... 
tools.google.com,
tools.google.com,
tools.google.com,
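
The output above keeps the trailing comma because the pattern captures it. The unescaped `.` also matches any character, and the leading `\D+` will not match a host that starts with a digit, such as an IP address. A possible tightened pattern is sketched below; it assumes the host value ends at the first comma or whitespace, which is an assumption about the log format and not part of the original answer.

```python
import re

# Assumption: the host value runs from "Host=" up to the first comma or whitespace.
HOST_RE = re.compile(r"Host=([^,\s]+)")

line = "Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271"
match = HOST_RE.search(line)
# search() returns None when nothing matches, so check before calling group().
host = match.group(1) if match else None
print(host)
```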