Question

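I am trying to search each line of a log file for a specific string and, when it matches, extract the host information that belongs to that particular error.

Consider the following log entries: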

05-05-2014 00:02:02,771 [HttpProxyServer-thread-1314] ERROR fd - Empty user name specified in NTLM authentication. Prompting for auth again.
Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271, User-Agent: Google Update/1.3.23.9;winhttp;cup-ecdsa
05-05-2014 00:02:02,771 [HttpProxyServer-thread-2156] ERROR fd - Empty user name specified in NTLM authentication. Prompting for auth again.
Host=tools.google.com, Port=80, Client ip=/10.253.168.148, port=37273, User-Agent: Google Update/1.3.23.9;winhttp;cup-ecdsa
05-05-2014 00:02:02,802 [HttpProxyServer-thread-604] ERROR fd - Empty user name specified in NTLM authentication. Prompting for auth again.
Host=tools.google.com, Port=80, Client ip=/10.253.168.92, port=37280, User-Agent: Google Update/1.3.23.9;winhttp;cup

Here is my code:

for line in log_file:
    if bool(re.search(r'Empty user name specified in NTLM authentication. Prompting for auth again.', line)):
        host = re.search(r'Host=(\D+.\D+.\D+,)', line).group(1)

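The problem is that the host information is not on the same line as the error; it is on the next line. How can I make re.search(r'Host=(\D+.\D+.\D+,)', line).group(1) search the line after the one that "line" currently refers to?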

Answer 1

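Just insert

line = next(log_file)

between the two statements you currently have in the for loop. The for loop and next() advance the same file iterator, so the following loop iteration continues with the line after the host line.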
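For illustration, here is a minimal sketch of how the loop might look with that line added. The function name `hosts_after_errors`, the `ERROR_TEXT` constant, the simplified `Host=([^,]+)` pattern, and the `None` default passed to `next()` (so that an error on the very last line does not raise `StopIteration`) are assumptions made for this example, not part of the original code.

```python
import re

# Assumed for this sketch: a shortened version of the error message from the question.
ERROR_TEXT = "Empty user name specified in NTLM authentication."
# Assumed pattern: capture everything after "Host=" up to the first comma.
HOST_RE = re.compile(r"Host=([^,]+)")


def hosts_after_errors(lines):
    """Return the host from the line that follows each matching error line."""
    it = iter(lines)  # works for an open file object or any iterable of lines
    hosts = []
    for line in it:
        if ERROR_TEXT in line:
            following = next(it, None)  # consume the line after the error
            if following is None:       # the error was on the last line
                break
            match = HOST_RE.search(following)
            if match:
                hosts.append(match.group(1))
    return hosts


sample = [
    "05-05-2014 00:02:02,771 [HttpProxyServer-thread-1314] ERROR fd - "
    "Empty user name specified in NTLM authentication. Prompting for auth again.\n",
    "Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271\n",
]
print(hosts_after_errors(sample))
```

With a real file, the same function could be called as `hosts_after_errors(log_file)` inside a `with open(...) as log_file:` block.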

Answer 2

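Either write a regular expression that matches two consecutive lines, from which you can extract the host information for each entry, and loop over the matches instead of reading the file line by line; or add a flag that is set when a line matches the error. If the flag is set for a given line, extract the host information and reset the flag instead of testing for the error.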
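Both suggestions can be sketched roughly as follows. This is only one possible reading of the answer: the names `PAIR_RE`, `hosts_by_multiline_regex`, `hosts_by_flag`, and `expect_host`, as well as the `Host=([^,\n]+)` pattern, are made up for this example, and the first function assumes the whole log fits in memory as a single string.

```python
import re

# Assumed for this sketch: a shortened version of the error message from the question.
ERROR_TEXT = "Empty user name specified in NTLM authentication."

# Approach 1: one pattern spanning two consecutive lines.
# It matches the error text, the rest of that line, a newline,
# and then "Host=<value>" somewhere on the following line.
PAIR_RE = re.compile(re.escape(ERROR_TEXT) + r"[^\n]*\n[^\n]*?Host=([^,\n]+)")


def hosts_by_multiline_regex(text):
    """Scan the whole log text and return the host for each error/host pair."""
    return PAIR_RE.findall(text)


# Approach 2: read line by line and remember that the previous line was an error.
def hosts_by_flag(lines):
    hosts = []
    expect_host = False
    for line in lines:
        if expect_host:
            # The previous line was the error, so this line should carry the host.
            match = re.search(r"Host=([^,\n]+)", line)
            if match:
                hosts.append(match.group(1))
            expect_host = False  # reset the flag instead of testing for the error
        elif ERROR_TEXT in line:
            expect_host = True
    return hosts


sample_lines = [
    "05-05-2014 00:02:02,771 [HttpProxyServer-thread-1314] ERROR fd - "
    "Empty user name specified in NTLM authentication. Prompting for auth again.\n",
    "Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271\n",
    "05-05-2014 00:02:02,802 [HttpProxyServer-thread-604] ERROR fd - "
    "Empty user name specified in NTLM authentication. Prompting for auth again.\n",
    "Host=tools.google.com, Port=80, Client ip=/10.253.168.92, port=37280\n",
]
print(hosts_by_multiline_regex("".join(sample_lines)))
print(hosts_by_flag(sample_lines))
```

The multi-line pattern is compact but needs the full text at once; the flag version streams through the file and may be the safer choice for large logs.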

Answer 3

Try this:

>>> import re
>>> fp = open('log_file')
>>> line = fp.readline()
>>> while line:
...    if 'Empty user name specified in NTLM authentication. Prompting for auth again.' in line:
...        host = re.search(r'Host=(\D+.\D+.\D+,)', fp.readline()).group(1)
...        #                                        ^^^^^^^^^^^^^^  
...        #                              this makes re search in the next line 
...        print(host)
...    line = fp.readline()
... 
tools.google.com,
tools.google.com,
tools.google.com,
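The trailing comma in that output comes from the comma sitting inside the capture group of the pattern. As a side note, `\D` matches only non-digit characters, so the pattern finds nothing when the host starts with a digit, for example an IP address. The short comparison below illustrates this; the alternative pattern `Host=([^,]+)` and the `ip_line` sample are assumptions for this example and do not come from the thread.

```python
import re

ORIGINAL = r'Host=(\D+.\D+.\D+,)'  # pattern used in the question and in this answer
SIMPLER = r'Host=([^,]+)'          # assumed alternative: everything up to the first comma

line = "Host=tools.google.com, Port=80, Client ip=/10.253.168.128, port=37271"
print(re.search(ORIGINAL, line).group(1))   # tools.google.com,
print(re.search(SIMPLER, line).group(1))    # tools.google.com

ip_line = "Host=10.0.0.1, Port=80"          # hypothetical entry with an IP address as host
print(re.search(ORIGINAL, ip_line))         # None, so .group(1) would raise AttributeError
print(re.search(SIMPLER, ip_line).group(1)) # 10.0.0.1
```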

Python: regex search in a file, then a regex search on the next line

3 answers: