Question

I'm using Python to search for certain words (including multi-token ones) in a description (a string).

To do that, I use a regex like this:

    result = re.search(word, description, re.IGNORECASE)
    if result:
        print("Trovato: " + result.group())

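But what I need is to get the 2 words before and after the match. For example, if I have something like this:

Parking here is horrible, this shop sucks.

"here is" is the phrase I'm looking for. So after I match it with my regex, I need the 2 words (if they exist) before and after the match.

In the example: Parking here is horrible, this

"Parking" and "horrible, this" are the words I need.

ATTENTION: the description can be very long, and the pattern "here is" can appear multiple times.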

Answer 1

How about string manipulation?

    line = 'Parking here is horrible, this shop sucks.'

    before, term, after = line.partition('here is')
    before = before.rsplit(maxsplit=2)[-2:]
    after = after.split(maxsplit=2)[:2]

Result:

    >>> before
    ['Parking']
    >>> after
    ['horrible,', 'this']
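
The same idea can be wrapped in a small helper. This is only a sketch, assuming Python 3; the name `words_around` and the parameter `n` are illustrative and not part of the answer. It returns `None` when the term is absent, because `partition` would otherwise put the whole line into `before`. Like the snippet above, it is case-sensitive and only looks at the first occurrence.

```python
def words_around(line, term, n=2):
    """Return (words_before, words_after) for the first occurrence of term.

    Hypothetical helper; assumes n >= 1. Returns None if term is not found.
    """
    before, found, after = line.partition(term)
    if not found:
        # partition returns (line, '', '') when the separator is missing
        return None
    # rsplit/split with maxsplit avoid splitting the whole string
    return before.rsplit(maxsplit=n)[-n:], after.split(maxsplit=n)[:n]


print(words_around('Parking here is horrible, this shop sucks.', 'here is'))
```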

Answer 2

Try this regex: ((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})

Use it with re.findall and the re.IGNORECASE flag set.
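
A possible way to apply this pattern, sketched under the assumption of Python 3; the function name `context_words` is made up for illustration. Because the pattern has two capturing groups, `re.findall` yields one `(before, after)` tuple per match, and each group still contains its whitespace, so it is split into words afterwards. Note that `findall` matches do not overlap, so text captured as "after" for one match is not reused as "before" for the next.

```python
import re

# Pattern taken from the answer; [a-z,] only covers letters and commas.
PATTERN = re.compile(
    r"((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})",
    re.IGNORECASE,
)


def context_words(line):
    """Hypothetical helper: list of (words_before, words_after) per match."""
    return [(before.split(), after.split())
            for before, after in PATTERN.findall(line)]


print(context_words("Parking here is horrible, this shop sucks."))
```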


Answer 3

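Based on your clarification, this gets a bit more complicated. The solution below handles the case where the search pattern itself may also occur within the two preceding or two following words.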

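    import re

    line = "Parking here is horrible, here is great here is mediocre here is here is "
    print(line)
    pattern = "here is"
    # Note: re.search ignores case here, but str.partition below is case-sensitive.
    r = re.search(pattern, line, re.IGNORECASE)
    output = []
    if r:
        while line:
            # Split at the next occurrence; the remainder becomes the new line.
            before, match, line = line.partition(pattern)
            if match:
                if not output:
                    before = before.split()[-2:]
                else:
                    # The previous match was consumed, so prepend it again
                    # so that its words can count as preceding words.
                    before = ' '.join([pattern, before]).split()[-2:]
                after = line.split()[:2]
                output.append((before, after))
    print(output)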

The output for my example is:

    [(['Parking'], ['horrible,', 'here']), (['is', 'horrible,'], ['great', 'here']), (['is', 'great'], ['mediocre', 'here']), (['is', 'mediocre'], ['here', 'is']), (['here', 'is'], [])]
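
As an alternative sketch (not the approach of this answer), `re.finditer` can locate every occurrence and the surrounding words can be sliced from the original string, which keeps the search case-insensitive throughout. The function name `contexts` and the parameter `n` are assumptions for illustration. It splits the full prefix and suffix for each match, which may be slow for very long descriptions, and like the loop above it also matches inside longer words such as "there is".

```python
import re


def contexts(text, phrase, n=2):
    """Hypothetical helper: (words_before, words_after) for every match."""
    results = []
    # re.escape treats the phrase as literal text, not as a regex
    for m in re.finditer(re.escape(phrase), text, re.IGNORECASE):
        before = text[:m.start()].split()[-n:]
        after = text[m.end():].split()[:n]
        results.append((before, after))
    return results


line = "Parking here is horrible, here is great here is mediocre here is here is "
print(contexts(line, "here is"))
```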

Answer 4

I would do it like this (edit: added anchors to cover most cases):

(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)

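This way you will always have 4 groups (they may need trimming), with the following behavior:

If group 1 is empty, there is no word before (group 2 is empty too)
If group 2 is empty, there is only one word before (group 1)
If groups 1 and 2 are both non-empty, they are the words before, in order
If group 3 is empty, there is no word after (group 4 is empty too)
If group 4 is empty, there is only one word after (group 3)
If groups 3 and 4 are both non-empty, they are the words after, in order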
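
A short sketch of how the four groups might be read, assuming Python 3 and the `re.IGNORECASE` flag from the question; the name `groups_around` is illustrative only. Every group takes part in a successful match because each alternation has an empty or anchor branch, so the groups are never `None` and can be stripped directly.

```python
import re

# Pattern taken from the answer, with "here is" as the fixed search phrase.
PATTERN = re.compile(
    r"(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)",
    re.IGNORECASE,
)


def groups_around(line):
    """Hypothetical helper: the 4 trimmed groups of the first match, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return [g.strip() for g in m.groups()]


print(groups_around("Parking here is horrible, this shop sucks."))
```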


Search in a string and get the 2 words before and after the match in Python

4 answers: