Question

I know that nearly every regex question must already have been asked and answered, but here goes:

I want a regex that matches:

"alcohol abuse"
"etoh abuse"
"alcohol dependence"
"etoh dependence"

but does not match

"denies alcohol dependence"
"denies smoking and etoh dependence"
"denies [anything at all] and etoh abuse"

The negative lookbehind is the obvious part, but how do I avoid matching the last two examples?

So far, my regex looks like this:

re.compile(r"(?<!denies\s)(alcohol|etoh)\s*(abuse|dependence)")

I can't include a greedy operator in the negative lookbehind, because a lookbehind only works with fixed-length strings.
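To illustrate the limitation described above, here is a minimal sketch using Python's standard-library re module. The helper name try_compile is made up for this example. It shows that a variable-width lookbehind is rejected at compile time, while the fixed-width version compiles but only guards the text directly in front of the term.

```python
import re

def try_compile(pattern):
    """Return the compiled pattern, or the re.error raised while compiling it."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return exc

# Variable-width lookbehind: the standard-library re module refuses to compile it.
variable = try_compile(r"(?<!denies.*)(alcohol|etoh)\s*(abuse|dependence)")

# Fixed-width lookbehind: compiles, but only checks the 7 characters before the term.
fixed = try_compile(r"(?<!denies\s)(alcohol|etoh)\s*(abuse|dependence)")

print(isinstance(variable, re.error))                       # compile failed
print(fixed.search("denies alcohol dependence"))            # no match, as wanted
print(fixed.search("denies smoking and etoh dependence"))   # still matches, unwanted
```

The last line is the case the question is about: "denies" is too far from the term for the fixed-width lookbehind to see it.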

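I would prefer to do this in a single step, because the result is fed to a function that accepts one regex as its argument.

Thanks for any tips.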

Answer 1

You can make use of match groups and the following general pattern:

bad|(good)

You do in fact match the unwanted parts, but only the "good" part, in the last branch of the alternation, is captured.

So your pattern would be (note all the grouping-only, non-capturing parentheses):

In this regex101 demo, only "Group 1" holds the valid matches.
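As a minimal sketch of how the bad|(good) idea could be applied to the phrases from the question (not necessarily the exact pattern used in the demo), the following builds both branches from one shared sub-pattern. The names TERM, pattern and wanted are assumptions made for this illustration.

```python
import re

# Shared sub-pattern; every parenthesis here is non-capturing.
TERM = r"(?:alcohol|etoh)\s*(?:abuse|dependence)"

# "bad" branch: a term preceded by "denies" somewhere earlier on the line.
# "good" branch: the bare term, captured as group 1.
pattern = re.compile(r"denies.*?" + TERM + r"|(" + TERM + r")")

def wanted(text):
    """Return the captured 'good' match, or None if only the 'bad' branch matched."""
    m = pattern.search(text)
    return m.group(1) if m else None

for s in ["alcohol abuse", "etoh dependence",
          "denies alcohol dependence",
          "denies [anything at all] and etoh abuse"]:
    print(repr(s), "->", wanted(s))
```

The caller still has to look at group 1 and not at the overall match, since the "bad" branch produces a match too.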

Answer 2

If you are not able to install any modules, you can rewrite the expression and check whether group 1 is empty:

import re

# Group 1 optionally captures a leading "denies"; groups 2 and 3 capture the term.
rx = re.compile(r"(denies)?.*?(alcohol|etoh)\s*(abuse|dependence)")

sentences = ["alcohol abuse", "etoh abuse", "alcohol dependence", "etoh dependence",
             "denies alcohol dependence", "denies smoking and etoh dependence", "denies [anything at all] and etoh abuse"]

def filterSentences(sentence):
    m = rx.search(sentence)
    # Report the sentence only if the term matched and "denies" did not.
    if m and m.group(1) is None:
        print("Yup: " + sentence)

for sent in sentences:
    filterSentences(sent)

This yields

Yup: alcohol abuse
Yup: etoh abuse
Yup: alcohol dependence
Yup: etoh dependence

If you have more than just denies (e.g. does not like...), simply change the first capture group.
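One way this could look, as a sketch only: the list NEGATIONS and the helper is_affirmed are hypothetical names introduced here, and the phrases in the list are just the two mentioned above.

```python
import re

# Hypothetical list of negation phrases; extend as needed.
NEGATIONS = ["denies", "does not like"]

# Build the first capture group as an alternation of the escaped phrases.
neg = "|".join(re.escape(n) for n in NEGATIONS)
rx = re.compile(r"(" + neg + r")?.*?(alcohol|etoh)\s*(abuse|dependence)")

def is_affirmed(sentence):
    """True if a term is found and group 1 (the negation phrase) did not match."""
    m = rx.search(sentence)
    return bool(m) and m.group(1) is None

print(is_affirmed("etoh abuse"))
print(is_affirmed("does not like etoh abuse"))
```

Note that in this form group 1 is optional and is tried first at the start of the searched text, so on single-line input a negation phrase is only recognised when the sentence begins with it, as in the example sentences.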

Regex to ignore everything between a negative lookbehind and the match
