What is the best Python regular expression for this special text?

Time: 2017-06-15 11:43:52

Tags: python regex

Hi, I have a log file whose content looks like this:

[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]
ViewPostImeInputStage processKey 0

[ 06-15 14:07:48.397  3539: 4649 D/AudioService ]
active stream is 0x8

[ 06-15 14:07:48.407  4277: 4293 D/vol.VolumeDialogControl.VC ]
isSafeVolumeDialogShowing : false

I want to extract some information from the log file. The expected format is:

[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'),
('06-15 14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'),
('06-15 14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC',  'isSafeVolumeDialogShowing : false')]

Question: what is the best Python regular expression for extracting the information in the expected format? Many thanks!

Update: I have tried the following code

import re
regex = r"(\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{3})\s(\d+).*(\w{1})/(.*)\](.*)"
data = [g.groups() for g in re.finditer(regex, log, re.M | re.I)]

The result I get is

data=[('06-15 14:07:48.377', '15012', 'D', 'ViewRootImpl', '\r'),
('06-15 14:07:48.397', '3539', 'D', 'AudioService', '\r'),
('06-15 14:07:48.407', '4277', 'D', 'vol.VolumeDialogControl.VC', '\r')]

I cannot get the last element (the message on the line after each header).

2 Answers:

Answer 0: (score: 2)

Use the following approach:

import re

with open('yourlogfile', 'r') as log:
    lines = log.read()
    # match each "[ header ]" line together with the message line that follows it
    result = re.sub(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)\n?', 
                    r'\1 \2 \3 \4 \5 \6 \7', lines, flags=re.MULTILINE)

    print(result)

Output:

06-15 14:07:48.377 15012 15012 D ViewRootImpl ViewPostImeInputStage processKey 0
06-15 14:07:48.397 3539 4649 D AudioService active stream is 0x8
06-15 14:07:48.407 4277 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false

To get the result as a list of matches, use the re.findall() function:

...
result = re.findall(r'^\[ (\S+) *(\S+) *(\d+): *(\d+) *([A-Z]+)\/(\S+) \]\n([^\n]+)\n?', lines, flags=re.MULTILINE)
print(result)

Output:

[('06-15', '14:07:48.377', '15012', '15012', 'D', 'ViewRootImpl', 'ViewPostImeInputStage processKey 0'), ('06-15', '14:07:48.397', '3539', '4649', 'D', 'AudioService', 'active stream is 0x8'), ('06-15', '14:07:48.407', '4277', '4293', 'D', 'vol.VolumeDialogControl.VC', 'isSafeVolumeDialogShowing : false')]
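
The findall() output above returns the date and the time as two separate elements, while the question asks for a single timestamp string. A minimal sketch of one way to get that shape is to put the date and time in one capture group. The sample text is copied from the question; the "\r\n" line endings are an assumption based on the '\r' in the asker's results, and the names `log`, `pattern` and `records` are only illustrative.

```python
import re

# Sample text from the question. The "\r\n" endings are an assumption
# (the asker's results contained '\r'); plain "\n" is also accepted.
log = (
    "[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]\r\n"
    "ViewPostImeInputStage processKey 0\r\n"
    "\r\n"
    "[ 06-15 14:07:48.397  3539: 4649 D/AudioService ]\r\n"
    "active stream is 0x8\r\n"
    "\r\n"
    "[ 06-15 14:07:48.407  4277: 4293 D/vol.VolumeDialogControl.VC ]\r\n"
    "isSafeVolumeDialogShowing : false\r\n"
)

pattern = re.compile(
    r"^\[ (\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) +"  # date and time as one group
    r"(\d+): *(\d+) +"                                # pid and tid
    r"([A-Z])/(\S+) \]\r?\n"                          # level, tag, end of header line
    r"([^\r\n]+)",                                    # message on the following line
    re.MULTILINE,
)

# findall() returns one tuple of groups per matched header/message pair
records = pattern.findall(log)
for record in records:
    print(record)
```

Excluding "\r" and "\n" from the last group keeps a trailing carriage return out of the message, which is likely what produced the '\r' elements in the question.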

Answer 1: (score: 1)

#!/usr/bin/env python3

import re

log_text = """
[ 06-15 14:07:48.377 15012:15012 D/ViewRootImpl ]
ViewPostImeInputStage processKey 0

[ 06-15 14:07:48.397  3539: 4649 D/AudioService ]
active stream is 0x8

[ 06-15 14:07:48.407  4277: 4293 D/vol.VolumeDialogControl.VC ]
isSafeVolumeDialogShowing : false
"""

# join each message onto its header line: replace the line break after "]" with a space
log_text = re.sub(r'(\])\s+', r'\1 ', log_text)

# replace "D/Something ] " -> "D Something "
log_text = re.sub(r'([A-Z])/(\S+)\s+\]\s+', r'\1 \2 ', log_text)

# remove the leading "[ "
log_text = re.sub(r'\[\s+([0-9]{2}-[0-9]{2})', r'\1', log_text)

print(log_text)

Output:

06-15 14:07:48.377 15012:15012 D ViewRootImpl ViewPostImeInputStage processKey 0

06-15 14:07:48.397  3539: 4649 D AudioService active stream is 0x8

06-15 14:07:48.407  4277: 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false
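
The output above is still plain text rather than the list of tuples requested in the question. As a rough sketch, each cleaned-up line could be split with one more pattern. This assumes the tag never contains spaces, because the line itself no longer marks where the tag ends and the message begins. The names `cleaned`, `line_pattern` and `records` are hypothetical.

```python
import re

# The cleaned-up lines as printed above.
cleaned = """\
06-15 14:07:48.377 15012:15012 D ViewRootImpl ViewPostImeInputStage processKey 0

06-15 14:07:48.397  3539: 4649 D AudioService active stream is 0x8

06-15 14:07:48.407  4277: 4293 D vol.VolumeDialogControl.VC isSafeVolumeDialogShowing : false
"""

line_pattern = re.compile(
    r"^(\S+ \S+) +"    # date and time as one timestamp group
    r"(\d+): *(\d+) +"  # pid and tid
    r"([A-Z]) +"        # log level
    r"(\S+) +"          # tag (assumed to contain no spaces)
    r"(.+)$",           # message: the rest of the line
    re.MULTILINE,
)

# blank lines do not match and are skipped
records = [m.groups() for m in line_pattern.finditer(cleaned)]
for record in records:
    print(record)
```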