Question

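In a Java regular expression, I want to match any sentence that contains the word "Mary" and the word "are", but only if "Bob" does not appear between "Mary" and "are".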

Eg: Mary and Rob are married <- MATCH
Eg: Mary and John and Michael became good friends and are living together <- MATCH
Eg: Mary, Rob and Bob are dead <- does not MATCH

Any ideas?

Answer 1

A slightly shorter version:

(?m)^.*\bMary\b((?!\bBob\b).)*\bare\b.*$


public class Main {
    public static void main(String[] args) {
        String[] tests = {
                "Mary and Rob are married",
                "Mary and John and Michael became good friends and are living together",
                "Mary, Rob and Bob are dead"
        };
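        // After "Mary", a character is consumed only if the word "Bob" does not start at it.
        // (?m) is harmless here: matches() always tests the whole string.
        String regex = "(?m)^.*\\bMary\\b((?!\\bBob\\b).)*\\bare\\b.*$";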
        for(String t : tests) {
            System.out.println(t.matches(regex) + " -> " + t);
        }
    }
}
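
The (?m) flag only starts to matter when the input holds several lines at once. A minimal sketch of that case, assuming the sentences arrive as one newline-separated string (the class name MultilineDemo and the helper matchingLines are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDemo {
    // Same pattern as above; (?m) lets ^ and $ anchor at every line break.
    static final Pattern PATTERN =
            Pattern.compile("(?m)^.*\\bMary\\b((?!\\bBob\\b).)*\\bare\\b.*$");

    // Returns every line of the input that the pattern accepts.
    static List<String> matchingLines(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Mary and Rob are married\n"
                + "Mary and John and Michael became good friends and are living together\n"
                + "Mary, Rob and Bob are dead";
        // Prints the first two lines only.
        for (String line : matchingLines(text)) {
            System.out.println(line);
        }
    }
}
```

Here find() is used instead of matches(), because matches() would demand that the whole three-line string fit the pattern.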

Answer 2

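At the time of writing, there are two fine answers that do it all in a single regex.

I would suggest that, unless you are optimising for performance (and remember, premature optimisation is bad, right?), it is worth splitting the job into more, simpler regexes and using your language's features to improve readability.

Not that complex regexes are always efficient anyway: it is easy to accidentally write one that backtracks all over the place.

It is also kinder to the readers of your code, who may not be familiar with the more exotic features of whatever regex dialect you have available.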

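import java.util.regex.Matcher;
import java.util.regex.Pattern;

boolean isMatch(String s) {
    // First pass test: a span from "Mary" to the nearest following "are"
    // (backslashes must be doubled inside a Java string literal,
    // otherwise "\b" is a backspace character)
    Pattern basicPattern = Pattern.compile("\\bMary\\b.*?\\bare\\b");
    // ... and a test for exclusions
    String rejectRE = "\\bMary\\b.*\\bBob\\b.*\\bare\\b";

    Matcher m = basicPattern.matcher(s);

    while(m.find()) {
         // We have a candidate match
         if(! m.group().matches(rejectRE)) {
              // and it passed the secondary test
              return true;
         }
    }

    // we fell through
    return false;
}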
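
Taking the same advice one step further, the check can be split into three trivial patterns plus ordinary string handling. This is only a sketch, not the code from this answer: the class name TwoStepCheck and the decision to look only at the first "are" after each "Mary" are my own assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStepCheck {
    private static final Pattern MARY = Pattern.compile("\\bMary\\b");
    private static final Pattern ARE = Pattern.compile("\\bare\\b");
    private static final Pattern BOB = Pattern.compile("\\bBob\\b");

    // True if some "Mary" is followed by an "are" with no "Bob" in between.
    static boolean isMatch(String s) {
        Matcher mary = MARY.matcher(s);
        while (mary.find()) {
            Matcher are = ARE.matcher(s);
            // The first "are" after this "Mary" is enough: any later "are"
            // spans a longer stretch, so it cannot avoid a "Bob" found here.
            if (are.find(mary.end())) {
                String between = s.substring(mary.end(), are.start());
                if (!BOB.matcher(between).find()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tests = {
                "Mary and Rob are married",
                "Mary and John and Michael became good friends and are living together",
                "Mary, Rob and Bob are dead"
        };
        for (String t : tests) {
            System.out.println(isMatch(t) + " -> " + t);
        }
    }
}
```

Each pattern is simple enough to read at a glance, and none of them can backtrack in surprising ways.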

Answer 3

 (?m)^(?:(?<!\bare\b).)*?Mary(?:(?<!\bBob\b).)+are.*?$

should do it.

Some fixed-length negative look-behinds make sure that:

there is no "are" (as the word "are") before "Mary"
there is no "Bob" before "are"

It reads as follows:

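^: anchor: start the match at the beginning of a line
(?:: do not capture the following as a group
(?<!\bare\b).: any character other than a line break that is not preceded by the word "are" (so "Mare" does not stop the next character from matching, but in "... are x" the character right after "are" cannot match): see word boundaries
)*?: repeat the group zero or more times, as few times as possible
the same principle applies before "are" (no character preceded by the word "Bob")
.*?$: 0 to n characters after "are", up to the end of the line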

More at regular-expressions.info.
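
To see what the look-behind with word boundaries actually blocks, here is a small sketch that lists the character positions (?<!\bare\b). refuses to match. The class name LookbehindDemo and the helper blockedIndexes are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // One character that is NOT directly preceded by the whole word "are".
    private static final Pattern NOT_AFTER_ARE = Pattern.compile("(?<!\\bare\\b).");

    // Returns the indexes of the characters the pattern refuses to match.
    static List<Integer> blockedIndexes(String s) {
        boolean[] matched = new boolean[s.length()];
        Matcher m = NOT_AFTER_ARE.matcher(s);
        while (m.find()) {
            matched[m.start()] = true;
        }
        List<Integer> blocked = new ArrayList<>();
        for (int i = 0; i < matched.length; i++) {
            if (!matched[i]) {
                blocked.add(i);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        // "are" inside "Mare" is not a whole word, so nothing is blocked.
        System.out.println(blockedIndexes("Mare x"));   // []
        // The space right after the word "are" (index 6) is blocked.
        System.out.println(blockedIndexes("we are x")); // [6]
    }
}
```

The same mechanism is what stops the repeated group from stepping past the word "Bob" in the full pattern.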

So the pattern:

Pattern.compile("(?m)^(?:(?<!\\bare\\b).)*?Mary(?:(?<!\\bBob\\b).)+are.*?$");

returns 2 matches out of the three lines:

Eg: Mary and Rob are married <- MATCH
Eg: Mary and John and Michael became good friends and are living together <- MATCH
Eg: Mary, Rob and Bob are dead <- does not MATCH
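
A quick way to check that count, and one caveat: the pattern has no \b around "Mary" and "are", so "are" inside a longer word such as "care" also counts. The sketch below compares it with a stricter variant; WHOLE_WORDS is my own assumption and not part of this answer, and the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindPatternDemo {
    // The pattern from this answer, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?m)^(?:(?<!\\bare\\b).)*?Mary(?:(?<!\\bBob\\b).)+are.*?$");

    // Hypothetical variant (an assumption): \b around "Mary" and "are"
    // so that only whole words count.
    static final Pattern WHOLE_WORDS = Pattern.compile(
            "(?m)^(?:(?<!\\bare\\b).)*?\\bMary\\b(?:(?<!\\bBob\\b).)+\\bare\\b.*?$");

    // Counts how many lines of the text the pattern matches.
    static int countMatches(Pattern p, String text) {
        int count = 0;
        Matcher m = p.matcher(text);
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String lines = "Mary and Rob are married\n"
                + "Mary and John and Michael became good friends and are living together\n"
                + "Mary, Rob and Bob are dead";
        System.out.println(countMatches(ORIGINAL, lines));    // 2
        System.out.println(countMatches(WHOLE_WORDS, lines)); // 2

        String tricky = "Mary and Rob care deeply";
        System.out.println(countMatches(ORIGINAL, tricky));    // 1, via "are" in "care"
        System.out.println(countMatches(WHOLE_WORDS, tricky)); // 0
    }
}
```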
