Question

I have the following list: strList = ['blablahihix', 'somethinghihi']. I want to return a list containing the repeated pattern that both elements of the list share (in this case 'hihi').

This is what I am doing:

p, r = re.compile(r'(.+?)\1+'), []
for i in strList:
    r.extend(p.findall(i) or [i])

When I print r, it gives me ['bla', 'hi', 'hi']. The only thing I want is ['hihi']. I don't want 'blabla' to be returned, because 'blabla' does not appear in the second element of the list.

What am I missing?

Answer 1

Use set operations to get the intersection of the matched groups:

>>> strList = ['blablahihix', 'somethinghihi']
>>> p = re.compile(r'(.+?)\1+')

>>> [set(p.findall(i)) for i in strList]
[{'bla', 'hi'}, {'hi'}]

>>> # from functools import reduce  # In Python 3.x
>>> reduce(lambda a, b: a & b, (set(p.findall(i)) for i in strList))
{'hi'}

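Use set & set or set.intersection to get the parts common to both sets of matches.

To get 'hihi' rather than 'hi', you need to modify the pattern or use re.finditer, because what re.findall returns depends on whether the pattern contains capturing groups: if the pattern has one or more groups, it returns a list of groups instead of a list of whole matched strings.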

>>> import re
>>>
>>> strList = ['blablahihix', 'somethinghihi']
>>> p = re.compile(r'(.+?)\1+')
>>> reduce(lambda a, b: a & b,
...        (set(m.group() for m in p.finditer(i)) for i in strList))
{'hihi'}
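
The other option mentioned above, modifying the pattern, could look roughly like the sketch below. It assumes an outer capturing group is wrapped around the whole repetition, so the backreference becomes \2; the names grouped, nested, runs and common are only illustrative and are not part of the original answer.

```python
import re

strList = ['blablahihix', 'somethinghihi']

# With a single capturing group, findall returns only that group's text.
grouped = re.compile(r'(.+?)\1+')
print(grouped.findall('blablahihix'))  # ['bla', 'hi']

# Wrapping the whole repetition in an outer group shifts the numbering:
# group 1 is the full repeated run, group 2 is the repeating unit.
nested = re.compile(r'((.+?)\2+)')

# findall now returns (full_run, unit) tuples; keep only the full run.
runs = [set(full for full, unit in nested.findall(s)) for s in strList]
common = set.intersection(*runs)
print(common)  # {'hihi'}
```

With more than one group in the pattern, findall returns tuples, which is why the full run has to be picked out of each tuple.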

UPDATE

As georg suggested, you can use set.intersection(*...); there is no need for reduce.

>>> set.intersection(*(set(m.group() for m in p.finditer(i)) for i in strList))
{'hihi'}
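
For reuse, the finditer approach can be wrapped in a small function. This is only a sketch for Python 3: the names REPEAT and common_repeats are hypothetical, and returning an empty set for an empty input is an assumption made here rather than something the answer specifies.

```python
import re

# Same pattern as in the answer: a shortest unit repeated at least twice.
REPEAT = re.compile(r'(.+?)\1+')


def common_repeats(strings):
    """Return the repeated runs (whole matches) found in every string.

    Hypothetical helper; not part of the original answer.
    """
    strings = list(strings)
    if not strings:
        # set.intersection() with no arguments raises TypeError,
        # so an empty input is handled explicitly (an assumption).
        return set()
    # m.group() is the whole match, e.g. 'hihi', not just the group 'hi'.
    per_string = [{m.group() for m in REPEAT.finditer(s)} for s in strings]
    return set.intersection(*per_string)


print(common_repeats(['blablahihix', 'somethinghihi']))  # {'hihi'}
print(common_repeats(['blablahihix', 'nothing']))        # set()
```

The result is a set; wrap it in list(...) if a list is needed, as in the question.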
