Question

Problem: I have the following example strings:

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

I want them grouped as follows (output):

ex1 out : ("00:03:34", "hello!! this is example number 1", "00:04:00")
ex2 out : ("00:07:08", "Hi I am example number 2", None)

Attempts:

I tried re.split:

import re

time_pat = r"(\d{2}:\d{2}:\d{2})"
re.split(time_pat, ex1)
re.split(time_pat, ex2)

It gave me the following output:

ex1 out : ['', '00:03:34', ' hello!! this is example number 1 ', '00:04:00', '']
ex2 out : ['', '00:07:08', ' Hi I am example number 2']

I would then use filter to remove the empty strings, after which the output looks like

ex1 out : ['00:03:34', ' hello!! this is example number 1 ', '00:04:00']
ex2 out : ['00:07:08', ' Hi I am example number 2']
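
For reference, the split-and-filter step described above can be sketched as follows. This is only one way to write it; the names parts1 and parts2 are illustrative and not part of the original attempt.

```python
import re

time_pat = r"(\d{2}:\d{2}:\d{2})"
ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

# filter(None, ...) drops falsy items, i.e. the empty strings that
# re.split leaves at the start and end of the list.
parts1 = list(filter(None, re.split(time_pat, ex1)))
parts2 = list(filter(None, re.split(time_pat, ex2)))

print(parts1)
print(parts2)
```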

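The problem here is that the output for ex2 has length 2 instead of 3; I want a third element that is None. I know I could append None whenever the length is 2, but I don't want to do that, and I believe a regular expression can handle this.

I tried the following regular expressions: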

re1 : r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})"

Quite obviously, this parses ex1 but not ex2.

re2 : r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})?"

This parses both, but the third group is always None, because the ".*" in the regex consumes the end-time pattern.
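
To make that behaviour concrete, here is a minimal sketch that runs re2 against ex1 and prints the captured groups. The variable name groups is illustrative only.

```python
import re

re2 = r"(\d{2}:\d{2}:\d{2})(.*)(\d{2}:\d{2}:\d{2})?"
ex1 = "00:03:34 hello!! this is example number 1 00:04:00"

# The greedy (.*) runs to the end of the string. The trailing group is
# optional, so it matches nothing and the engine never has to backtrack.
groups = re.match(re2, ex1).groups()
print(groups)
# ('00:03:34', ' hello!! this is example number 1 00:04:00', None)
```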

I tried lookahead assertions, but I got them wrong and so had no result. Can someone help me find the right regex here?

Answer 1

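You can use a lookahead as you suggested, or you can just use a non-greedy capture, an optional group, and specify that you want to match up to the end of the line ($):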

import re

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"

for ex in [ex1, ex2]:
    mat = re.match(r'(\d{2}:\d{2}:\d{2})\s(.*?)\s*(\d{2}:\d{2}:\d{2})?$', ex)
    if mat:
        print(mat.groups())

Output:

('00:03:34', 'hello!! this is example number 1', '00:04:00')
('00:07:08', 'Hi I am example number 2', None)

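Note: this is very close to what you had. I only made the middle group non-greedy (the ? in (.*?)) and added $ at the end so that the pattern has to match the whole line. Without the non-greedy capture, the optional timestamp at the end is eaten by the middle group. Without anchoring to the end of the line, the engine will not even try to extend the non-greedy middle group or match the optional timestamp, because it does not have to.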
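
The two failure modes described in the note can be checked with a short sketch. The pattern names greedy_pat and unanchored_pat are illustrative; each is the pattern from this answer with exactly one of the two changes undone.

```python
import re

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"

# Variant 1: greedy middle group, still anchored with $.
greedy_pat = r'(\d{2}:\d{2}:\d{2})\s(.*)\s*(\d{2}:\d{2}:\d{2})?$'
# Variant 2: non-greedy middle group, but no $ anchor.
unanchored_pat = r'(\d{2}:\d{2}:\d{2})\s(.*?)\s*(\d{2}:\d{2}:\d{2})?'

greedy_groups = re.match(greedy_pat, ex1).groups()
unanchored_groups = re.match(unanchored_pat, ex1).groups()

# The greedy group swallows the trailing timestamp.
print(greedy_groups)
# ('00:03:34', 'hello!! this is example number 1 00:04:00', None)

# Without $, the lazy group is satisfied with an empty match.
print(unanchored_groups)
# ('00:03:34', '', None)
```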

Answer 2

Use this pattern to capture instead of splitting

^(\d{2}:\d{2}:\d{2})(.*?)((?:\d{2}:\d{2}:\d{2})|)$
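
A minimal sketch of how this pattern might be used from Python follows. Two details are worth knowing: the middle group keeps the surrounding spaces, and the empty alternative makes the third group an empty string rather than None when there is no end time. The helper parse_line is hypothetical and only shows one way to normalise both.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r'^(\d{2}:\d{2}:\d{2})(.*?)((?:\d{2}:\d{2}:\d{2})|)$')

ex1 = "00:03:34 hello!! this is example number 1 00:04:00"
ex2 = "00:07:08 Hi I am example number 2"


def parse_line(line):
    """Hypothetical helper: return (start, text, end), or None if no match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    start, text, end = m.groups()
    # Strip the spaces kept by the middle group, and map '' to None.
    return start, text.strip(), end or None


raw_groups = PATTERN.match(ex2).groups()
print(raw_groups)       # ('00:07:08', ' Hi I am example number 2', '')
print(parse_line(ex1))  # ('00:03:34', 'hello!! this is example number 1', '00:04:00')
print(parse_line(ex2))  # ('00:07:08', 'Hi I am example number 2', None)
```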

Splitting and grouping a string based on a pattern in Python