Find a string along with some other words

Time: 2014-11-13 17:51:10

Tags: python regex

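I have a string (an email). I need to search it for the word "Downtime", the 8 characters that follow it, and the From: and To: times that come before it. For example,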

mystring="""
AB\r\n\r\n--_=_swift_v4_13613629825124c026192e8_=_\r\nContent-Type: multipart/related;\r\n  oundary="_=_swift_v4_13613629825124c02620826_=_"\r\n\r\n--_ =_swift_v4_13613629825124c02620826_=_\r\n
From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s\r\n\r\n
some more text here From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s\r\n\r\n\r\n\r\n\r\n This is a scheduled report from If you wish to no longer receive t=\r\nhis report you can unsubscribe by logging in to and u=\r\npdate your email report settings.\r\nCopyright: 2013 
"""

Expected result:

From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s
From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s

3 answers:

Answer 0 (score: 1)

You can use a regex of the form
From:\s*[^,]+,\s*To:\s[^,]+,\s*Downtime:[\w ]+

Test

>>> import re
>>> re.findall(r'From:\s*[^,]+,\s*To:\s[^,]+,\s*Downtime:[\w ]+',  mystring)
['From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s', 'From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s']
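If you also need the From, To and Downtime values separately rather than as one line, a minimal sketch along the same lines uses named groups with `re.finditer`. It assumes Python 3; the shortened sample text and the group names `start`, `end` and `downtime` are illustrative choices, not part of the answer.

```python
import re

# Shortened stand-in for the question's email body (assumption).
mystring = (
    "AB\r\n\r\n--_=_swift_v4_13613629825124c026192e8_=_\r\n"
    "From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s\r\n\r\n"
    "some more text here From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s\r\n\r\n"
    "This is a scheduled report\r\n"
)

# Same idea as the regex above, with named groups for each field.
# [\w ]+ allows letters, digits and spaces only, so it stops at "\r\n".
pattern = re.compile(
    r"From:\s*(?P<start>[^,]+),\s*To:\s*(?P<end>[^,]+),\s*Downtime:\s*(?P<downtime>[\w ]+)"
)

# group(0) is the whole "From: ..., Downtime: ..." text of each match.
lines = [m.group(0) for m in pattern.finditer(mystring)]
# groupdict() maps each group name to the text it captured.
records = [m.groupdict() for m in pattern.finditer(mystring)]

for line in lines:
    print(line)
for record in records:
    print(record["start"], record["end"], record["downtime"])
```

`finditer` yields match objects rather than plain strings, which is what makes the per-field access possible.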

Answer 1 (score: 1)

While nu11p01n73R's answer works (I think; I haven't looked at the regex closely myself), you can do this quite simply with string operations.

mystring = """AB\r\n\r\n--_=_swift_v4_13613629825124c026192e8_=_\r\nContent-Type: multipart/related;\r\n  oundary="_=_swift_v4_13613629825124c02620826_=_"\r\n\r\n--_ =_swift_v4_13613629825124c02620826_=_\r\n
From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s\r\n\r\n
some more text here From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s\r\n\r\n\r\n\r\n\r\n This is a scheduled report from If you wish to no longer receive t=\r\nhis report you can unsubscribe by logging in to and u=\r\npdate your email report settings.\r\nCopyright: 2013 
"""  # imported from wherever and however

from_loc = mystring.find("From: ")  # index of the first "From: "
dtime_right = mystring.find("\r\n", from_loc)  # find the end of the line after Downtime
msg = mystring[from_loc:dtime_right]  # string slicing

>>> print(msg)

From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s

Note: if for some reason you want to save lines, you can compress it into one line:

msg = mystring[mystring.find("From: "):mystring.find("\r\n", mystring.find("From: "))]

It's really messy and I don't recommend it, but the option is there :P
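`str.find` returns only the first match, so the snippet above gets just the first line. A minimal sketch of the same find-and-slice idea applied in a loop to collect every line follows; the helper name `find_all_downtime_lines` is hypothetical, and it assumes each report line ends with `\r\n`.

```python
def find_all_downtime_lines(text, start_marker="From: ", end_marker="\r\n"):
    """Return every slice that starts at start_marker and runs up to the next end_marker."""
    results = []
    pos = text.find(start_marker)
    while pos != -1:
        end = text.find(end_marker, pos)
        if end == -1:
            # No line ending after the last marker: take the rest of the text.
            end = len(text)
        results.append(text[pos:end])
        # Resume searching after the slice we just took.
        pos = text.find(start_marker, end)
    return results


# Shortened sample based on the question's email (assumption).
sample = (
    "AB\r\n\r\n"
    "From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s\r\n\r\n"
    "some more text here From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s\r\n\r\n"
    "This is a scheduled report from If you wish to no longer receive t=\r\n"
)

for line in find_all_downtime_lines(sample):
    print(line)
```

The case-sensitive marker `"From: "` (capital F plus colon) is what keeps the lowercase "report from If" text from being picked up.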

Answer 2 (score: 0)

Try this

import re
# The downtime class uses a literal space instead of \s, so the match stops before "\r\n"
p = re.compile(r'from:\s*([0-9\-\s:]+),\s*to:\s*([0-9\-\s:]+),\s*downtime:\s*([0-9hms ]+)', re.MULTILINE | re.IGNORECASE)
test_str = u"AB\r\n\r\n--_=_swift_v4_13613629825124c026192e8_=_\r\nContent-Type: multipart/related;\r\n oundary=\"_=_swift_v4_13613629825124c02620826_=_\"\r\n\r\n--_ =_swift_v4_13613629825124c02620826_=_\r\n\nFrom: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s\r\n\r\n\nsome more text here From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s\r\n\r\n\r\n\r\n\r\n This is a scheduled report from If you wish to no longer receive t=\r\nhis report you can unsubscribe by logging in to and u=\r\npdate your email report settings.\r\nCopyright: 2013 \n"

re.findall(p, test_str)  # returns a list of (from, to, downtime) tuples
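Because this pattern captures the three fields as groups, `findall` returns `(from, to, downtime)` tuples. If you then want real time values, a minimal sketch converts them with `datetime.strptime` and `timedelta`. This goes a step beyond the answer: it assumes the `YYYY-MM-DD HH:MM:SS` and `<H>h <M>m <S>s` formats seen in the sample, and `parse_downtime` is a hypothetical helper. For the sample data, each reported Downtime equals To minus From.

```python
import re
from datetime import datetime, timedelta

# Same idea as the answer's pattern; the downtime class uses a literal space so it stops at "\r\n".
p = re.compile(
    r"from:\s*([0-9\-\s:]+),\s*to:\s*([0-9\-\s:]+),\s*downtime:\s*([0-9hms ]+)",
    re.IGNORECASE,
)

# Shortened sample based on the question's email (assumption).
test_str = (
    "From: 2013-01-11 04:26:07, To: 2013-01-11 05:56:08, Downtime: 1h 30m 01s\r\n\r\n"
    "some more text here From: 2013-01-29 04:51:07, To: 2013-01-29 05:41:07, Downtime: 0h 50m 00s\r\n"
)

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_downtime(text):
    """Hypothetical helper: turn a string like '1h 30m 01s' into a timedelta."""
    m = re.match(r"(\d+)h (\d+)m (\d+)s$", text.strip())
    if m is None:
        raise ValueError("unexpected downtime format: %r" % text)
    hours, minutes, seconds = (int(g) for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


results = []
for start, end, downtime in p.findall(test_str):
    start_dt = datetime.strptime(start.strip(), TIME_FORMAT)
    end_dt = datetime.strptime(end.strip(), TIME_FORMAT)
    results.append((start_dt, end_dt, parse_downtime(downtime)))

for start_dt, end_dt, downtime in results:
    # Compare the reported downtime with the difference between To and From.
    print(start_dt, end_dt, downtime, downtime == end_dt - start_dt)
```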
