Using a Java regular expression, I want to match any sentence that contains the word "Mary" and the word "are", but does not contain "Bob" between "Mary" and "are".
Eg: Mary and Rob are married - MATCH
Eg: Mary and John and Michael became good friends and are living together <- MATCH
Eg: Mary, Rob and Bob are dead <- does not MATCH
Any ideas?
Answer 0 (score: 3)
A slightly shorter version:
(?m)^.*\bMary\b((?!\bBob\b).)*\bare\b.*$
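public class Main {
    public static void main(String[] args) {
        String[] tests = {
            "Mary and Rob are married",
            "Mary and John and Michael became good friends and are living together",
            "Mary, Rob and Bob are dead"
        };
        // ((?!\bBob\b).)* consumes one character at a time and refuses to step onto
        // a position where the whole word "Bob" begins, so "are" is only reachable
        // when no "Bob" sits between "Mary" and "are".
        // String.matches() must match the whole string, so (?m), ^ and $ are harmless here.
        String regex = "(?m)^.*\\bMary\\b((?!\\bBob\\b).)*\\bare\\b.*$";
        for (String t : tests) {
            // Prints true, true, false for the three tests
            System.out.println(t.matches(regex) + " -> " + t);
        }
    }
}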
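The (?m) flag only changes anything when ^ and $ can see line breaks, which String.matches() never gives them. Here is a small sketch, assuming (this is not part of the question) that the sentences arrive as one multi-line string, that runs the same regex with Matcher.find() and collects the accepted lines. The names MultiLineDemo and matchingLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiLineDemo {
    // Same regex as above; (?m) makes ^ and $ match at every line boundary,
    // and '.' never matches '\n', so each match stays within one line.
    private static final Pattern P =
            Pattern.compile("(?m)^.*\\bMary\\b((?!\\bBob\\b).)*\\bare\\b.*$");

    // Returns every line of the input that the pattern accepts.
    static List<String> matchingLines(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Mary and Rob are married",
                "Mary and John and Michael became good friends and are living together",
                "Mary, Rob and Bob are dead");
        for (String line : matchingLines(text)) {
            System.out.println(line);
        }
    }
}
```

Only the first two lines are printed; the "Bob" line is skipped.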
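Answer 1 (score: 2)
As I write this, there are already two good answers that do it in a single regex.
I'd suggest that, unless you are optimizing for performance (and remember, premature optimization is bad anyway), it's worth splitting this into several simpler regexes and using language features to keep it readable.
Complex regexes aren't always efficient anyway: it's easy to accidentally write one that backtracks all over the place.
Readers of your code may also be less familiar with regexes than you are, and may not know the more exotic features of the regex dialect you're using.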
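import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaryChecker { // hypothetical wrapper class so the snippet compiles on its own
    boolean isMatch(String s) {
        // First pass test (note "\\b": in a Java string literal, "\b" is a backspace)
        Pattern basicPattern = Pattern.compile("\\bMary\\b.*\\bare\\b");
        // ... and a test for exclusions
        String rejectRE = "\\bMary\\b.*\\bBob\\b.*\\bare\\b";
        Matcher m = basicPattern.matcher(s);
        while (m.find()) {
            // We have a candidate match
            if (!m.group().matches(rejectRE)) {
                // ... and it passed the secondary test
                return true;
            }
        }
        // We fell through: no candidate passed
        return false;
    }
}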
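A minimal sketch in the same spirit of splitting the work into simpler regexes; the class SplitCheck and its three patterns are hypothetical, not part of the answer. It looks at each "Mary" separately, finds the first "are" after it, and only then checks the text in between for "Bob". As far as I can tell, this also covers a case the greedy .* in basicPattern (with the escapes written as \\b) misses: in "Mary and Bob are here; Mary and Tom are there", the first-pass match runs from the first "Mary" to the last "are", contains "Bob", and is rejected, so isMatch returns false even though the second clause qualifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitCheck {
    // Three small whole-word patterns instead of one clever one
    private static final Pattern MARY = Pattern.compile("\\bMary\\b");
    private static final Pattern ARE = Pattern.compile("\\bare\\b");
    private static final Pattern BOB = Pattern.compile("\\bBob\\b");

    // True if some "Mary" is followed by an "are" with no "Bob" in between.
    static boolean isMatch(String s) {
        Matcher mary = MARY.matcher(s);
        while (mary.find()) {
            // First "are" that starts at or after the end of this "Mary"
            Matcher are = ARE.matcher(s);
            if (!are.find(mary.end())) {
                // No "are" after this "Mary", so none after any later "Mary" either
                return false;
            }
            // Only the text strictly between "Mary" and "are" is checked for "Bob"
            String between = s.substring(mary.end(), are.start());
            if (!BOB.matcher(between).find()) {
                return true;
            }
            // A "Bob" blocks this "Mary"; try the next one
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tests = {
            "Mary and Rob are married",
            "Mary and John and Michael became good friends and are living together",
            "Mary, Rob and Bob are dead",
            "Mary and Bob are here; Mary and Tom are there"
        };
        for (String t : tests) {
            System.out.println(isMatch(t) + " -> " + t);
        }
    }
}
```

Checking only the first "are" after each "Mary" is enough: any later "are" would have the same "Bob" in between.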
Answer 2 (score: 1)
(?m)^(?:(?<!\bare\b).)*?Mary(?:(?<!\bBob\b).)+are.*?$
This should do it.
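Two fixed-length negative lookbehinds ensure that the part before "Mary" never steps past the whole word "are", and that the part between "Mary" and "are" never steps past the whole word "Bob".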
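It reads as follows:
^ : anchor: start the match at the beginning of a line
(?: : open a non-capturing group
(?<!\bare\b). : any character except a newline that is not preceded by the whole word "are" (so "Mare" does not stop the next character from matching, but in "... are x" the character right after "are" cannot be consumed, so "x" is never reached); see word boundaries
)*? : repeat the group zero or more times, lazily (as few characters as possible), until "Mary" can match
Mary : the literal text "Mary" (no \b around it, so "Maryland" would also qualify)
(?:(?<!\bBob\b).)+ : at least one character, each not preceded by the whole word "Bob" (the same principle as for "are")
are : the literal text "are" (again without \b, so it would also match inside "care")
.*?$ : zero or more characters after "are", up to the end of the line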
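To see the word-boundary lookbehind in isolation, here is a small sketch (the class LookbehindDemo is hypothetical) that swaps the `.` for a literal `!` so the character being tested is easy to spot:

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    // A "!" that is NOT immediately preceded by the whole word "are"
    private static final Pattern NOT_AFTER_ARE = Pattern.compile("(?<!\\bare\\b)!");

    static boolean bangAllowed(String s) {
        return NOT_AFTER_ARE.matcher(s).find();
    }

    public static void main(String[] args) {
        // "Mare": no word boundary before "are", so the lookbehind does not fire
        System.out.println(bangAllowed("Mare!"));  // true
        // "are" as a whole word: the "!" right after it is rejected
        System.out.println(bangAllowed("are!"));   // false
        // Only the character directly after "are" is blocked; here that is the space
        System.out.println(bangAllowed("are x!")); // true
    }
}
```

In the full pattern the blocked character is the space after "are", and because the group consumes characters one by one, nothing after that space can be reached either.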
So the pattern
Pattern.compile("(?m)^(?:(?<!\\bare\\b).)*?Mary(?:(?<!\\bBob\\b).)+are.*?$");
would return 2 matches from these three lines:
Eg: Mary and Rob are married - MATCH
Eg: Mary and John and Michael became good friends and are living together <- MATCH
Eg: Mary, Rob and Bob are dead <- does not MATCH
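A quick way to check that count, assuming the three lines are joined into one string with "\n" (the class CountMatches is a hypothetical name): compile the pattern as written and count the find() hits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountMatches {
    // The pattern from the answer; (?m) lets ^ and $ match at each line
    private static final Pattern P = Pattern.compile(
            "(?m)^(?:(?<!\\bare\\b).)*?Mary(?:(?<!\\bBob\\b).)+are.*?$");

    // Counts how many lines of the text the pattern finds
    static int countMatches(String text) {
        Matcher m = P.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Eg: Mary and Rob are married - MATCH",
                "Eg: Mary and John and Michael became good friends and are living together <- MATCH",
                "Eg: Mary, Rob and Bob are dead <- does not MATCH");
        System.out.println(countMatches(text)); // 2
    }
}
```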