Regex to find sentences containing specific words in a paragraph (Java)

Date: 2013-04-13 16:28:00

Tags: java regex

I have a list of words: dog cat leopard

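I'm trying to come up with a regular expression in Java that extracts, from a long paragraph, the sentences containing any one of these words (case-insensitive). A sentence ends with ., ? or !. Can anyone help? Thanks!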

5 answers:

Answer 0 (score: 3)

The following assumes that a sentence starts with an uppercase letter and contains no ., ! or ? except at its end.

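import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceExtractor {
    public static void main(String[] args) {
        String str = "Hello. It's a leopard I think. How are you? It's just a dog or a cat. Are you sure?";
        // [A-Z] stays case-sensitive; (?i) makes the rest of the pattern case-insensitive.
        Pattern p = Pattern.compile("[A-Z](?i)[^.?!]*?\\b(dog|cat|leopard)\\b[^.?!]*[.?!]");
        Matcher m = p.matcher(str);

        while (m.find()) {
            System.out.println(m.group());
        }
        // It's a leopard I think.
        // It's just a dog or a cat.
    }
}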
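Since the question starts from a list of words, the alternation does not have to be typed by hand. Below is a minimal sketch that builds the same pattern from a List<String>; the class and method names (KeywordPatternBuilder, buildPattern, findSentences) are illustrative only. Each word goes through Pattern.quote so it is matched literally, even if it contains regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class KeywordPatternBuilder {

    // Hypothetical helper: builds the answer's pattern with the keyword group generated from a list.
    static Pattern buildPattern(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)              // e.g. dog -> \Qdog\E (literal match)
                .collect(Collectors.joining("|"));
        return Pattern.compile("[A-Z](?i)[^.?!]*?\\b(" + alternation + ")\\b[^.?!]*[.?!]");
    }

    // Hypothetical helper: returns every matched sentence, in order of appearance.
    static List<String> findSentences(String text, List<String> words) {
        List<String> result = new ArrayList<>();
        Matcher m = buildPattern(words).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "Hello. It's a leopard I think. How are you? It's just a dog or a cat. Are you sure?";
        for (String s : findSentences(str, List.of("dog", "cat", "leopard"))) {
            System.out.println(s);
        }
    }
}
```

The output for the sample string is the same as above; adding a word only means adding it to the list.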

Answer 1 (score: 3)

Assumptions

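  • A sentence must start with an uppercase letter and contain no sentence terminators [.?!] in the middle.
  • Keyword matching is case-insensitive, but matches inside longer words (e.g. "blackdog") do not count.
  • A keyword may appear anywhere in the sentence (beginning, end or middle).
  • Quotations and informal double punctuation are supported. If you don't need them, use the second regex.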

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceFinder {

    public static void main(String[] args) {
        String paragraph = "I have a list of words to match: dog, cat, leopard. But blackdog or catwoman shouldn't match. Dog may bark at the start! Is that meow at the end my cat? Some bonus sentence matches shouldn't hurt. My dog gets jumpy at times and behaves super excited!! My cat sees my goofy dog and thinks WTF?! Leopard likes to quote, \"I'm telling you these Lions suck bro!\" Sometimes the dog asks too, \"Cat got your tongue?!\"";
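        // Optional capitalised lead-in, then a keyword not embedded in a longer word
        // (case-insensitive from (?i) on), then the rest of the sentence up to one or two
        // terminators and an optional closing quote.
        Pattern p = Pattern.compile("([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]{1,2}\"?");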
        Matcher m = p.matcher(paragraph);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
    /* Output:
       I have a list of words to match: dog, cat, leopard.
       Dog may bark at the start!
       Is that meow at the end my cat?
       My dog gets jumpy at times and behaves super excited!!
       My cat sees my goofy dog and thinks WTF?!
       Leopard likes to quote, "I'm telling you these Lions suck bro!"
       Sometimes the dog asks too, "Cat got your tongue?!"
    */
}
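The second assumption (no matches inside longer words) comes from the (?<!\w) and (?!\w) lookarounds around the keyword group. A small sketch, with illustrative class and method names, comparing a bare alternation with the guarded one on the "blackdog"/"catwoman" sentence:

```java
import java.util.regex.Pattern;

public class SubstringCheck {

    // Bare alternation: also finds keywords embedded in longer words.
    static final Pattern NAIVE = Pattern.compile("(?i)(dog|cat|leopard)");

    // Guarded as in the answer: no word character directly before or after the keyword.
    static final Pattern GUARDED = Pattern.compile("(?i)(?<!\\w)(dog|cat|leopard)(?!\\w)");

    static boolean naive(String s) {
        return NAIVE.matcher(s).find();
    }

    static boolean guarded(String s) {
        return GUARDED.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "But blackdog or catwoman shouldn't match.";
        System.out.println(naive(s));   // true: "dog" is found inside "blackdog"
        System.out.println(guarded(s)); // false: both keywords are part of longer words
    }
}
```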


Simplified regex, if quotations and informal punctuation such as "?!" are not needed:
"([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]"
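To see what the simplification gives up, here is a sketch (class and method names are illustrative) that runs both patterns on two of the sample sentences. The simplified pattern stops at the first terminator, so the second punctuation mark and the closing quote are left out of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctuationComparison {

    // Full pattern: up to two terminators plus an optional closing quote.
    static final String FULL =
            "([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]{1,2}\"?";

    // Simplified pattern: stops at the first terminator.
    static final String SIMPLE =
            "([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]";

    static final String TEXT =
            "My cat sees my goofy dog and thinks WTF?! Sometimes the dog asks too, \"Cat got your tongue?!\"";

    // Collects every match of the given regex, in order.
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findAll(FULL, TEXT));
        System.out.println(findAll(SIMPLE, TEXT));
    }
}
```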

To also capture sentences that don't start with an uppercase letter (if the input may contain such typos):
"(?i)([a-z][^.?!]*?)?(?<!\\w)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]"
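With the uppercase-only pattern, a keyword sentence that starts with a lowercase letter is not lost entirely, but the match begins at the keyword instead of at the start of the sentence. A sketch with made-up input and illustrative names that runs both patterns:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseStartComparison {

    // [A-Z] comes before (?i), so the optional lead-in must start with an uppercase letter.
    static final String UPPER_START =
            "([A-Z][^.?!]*?)?(?<!\\w)(?i)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]";

    // (?i) comes first, so [a-z] accepts a letter of either case.
    static final String ANY_START =
            "(?i)([a-z][^.?!]*?)?(?<!\\w)(dog|cat|leopard)(?!\\w)[^.?!]*?[.?!]";

    // Collects every match of the given regex, in order.
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hello. my dog barks. ok.";
        System.out.println(findAll(UPPER_START, text)); // [dog barks.]
        System.out.println(findAll(ANY_START, text));   // [my dog barks.]
    }
}
```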

Answer 2 (score: 1)

This should do it. You just need to fill in the words you want in the middle. For example:

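Hello, I am a dog, do I like doing things? Don't take my kindness for weakness. My bark is better than the jumping bite! So, adopt another animal. Like a cat.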

Matches:

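Hello, I am a dog, do I like doing things?
My bark is better than the jumping bite!
Like a cat.

Add (?i) to ignore case. I didn't put it in because I couldn't remember the syntax, but someone else has written it.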

"(?=.*?\\.)[^ .?!][^.?!]*?(dog|cat|leopard).*?[.?!]"
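Here is a sketch with (?i) prepended to this answer's pattern (class and method names are illustrative). One thing worth knowing when reusing it: the leading lookahead (?=.*?\.) requires a '.' somewhere after the match start, so a final sentence that ends with ? or ! and has no '.' after it is not returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSentenceFinder {

    // The answer's pattern with (?i) added in front for case-insensitive keywords.
    static final Pattern P =
            Pattern.compile("(?i)(?=.*?\\.)[^ .?!][^.?!]*?(dog|cat|leopard).*?[.?!]");

    // Collects every match, in order.
    static List<String> findAll(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "Dog" matches thanks to (?i); "Is that a cat?" is skipped because no '.' follows it.
        System.out.println(findAll("It's a Dog. Is that a cat?")); // [It's a Dog.]
    }
}
```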

Answer 3 (score: 0)

Try this regex:

   str.matches("(?i)(^|\\s+)(dog|cat|leopard)(\\s+|[.?!]$)");

(?i) is a special construct (an embedded flag) that turns on case-insensitive matching.
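Note that String.matches returns true only when the regex matches the entire string, so with this pattern it succeeds for a keyword standing on its own but not for a whole sentence. Matcher.find reports whether the keyword occurs anywhere, which tells you if a sentence contains one of the words (it still does not extract the sentence itself). A sketch with illustrative names:

```java
import java.util.regex.Pattern;

public class MatchesVersusFind {

    static final String REGEX = "(?i)(^|\\s+)(dog|cat|leopard)(\\s+|[.?!]$)";

    // String.matches: the regex must match the whole input.
    static boolean wholeString(String s) {
        return s.matches(REGEX);
    }

    // Matcher.find: the regex may match any part of the input.
    static boolean anywhere(String s) {
        return Pattern.compile(REGEX).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeString("My dog barks.")); // false
        System.out.println(anywhere("My dog barks."));    // true
        System.out.println(wholeString("Dog."));          // true
    }
}
```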

Answer 4 (score: 0)

(cat|dog|leopard).(\.|\?|\!)$ and you should use the CASE_INSENSITIVE option of java.util.regex.Pattern.
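A sketch of the CASE_INSENSITIVE option, using a different technique from the single-regex answers: split the paragraph into sentences first, then keep those that contain a keyword. It assumes sentences are separated by whitespace following ., ? or !, so abbreviations such as "e.g." would be split incorrectly; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitThenFilter {

    // Keyword test compiled with the CASE_INSENSITIVE flag instead of an inline (?i).
    static final Pattern KEYWORD =
            Pattern.compile("\\b(cat|dog|leopard)\\b", Pattern.CASE_INSENSITIVE);

    static List<String> sentencesWithKeyword(String paragraph) {
        List<String> result = new ArrayList<>();
        // Split on whitespace that directly follows '.', '?' or '!' (terminator stays with its sentence).
        for (String sentence : paragraph.split("(?<=[.?!])\\s+")) {
            if (KEYWORD.matcher(sentence).find()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String paragraph = "Hello. It's a leopard I think. How are you? It's just a DOG or a cat. Are you sure?";
        for (String s : sentencesWithKeyword(paragraph)) {
            System.out.println(s);
        }
    }
}
```

Keeping the keyword test separate from the sentence splitting makes each regex simpler, at the cost of relying on the whitespace-based split.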