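How to split a paragraph into sentences

Time: 2014-01-21 04:29:38

Tags: java

We are splitting a paragraph into sentences at sentence-ending punctuation (periods, exclamation marks, and question marks).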

String[] sentences = message.split("(?<=[.!?])\\s*");

The following sentence

HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz

is split into

HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3
40 GHz

How can I avoid splitting on something like 3.40 GHz, given that the period there is part of a single token and not a sentence separator?

3 Answers:

Answer 0 (score: 2)

You can try this:

public class SentenceSplitDemo {
    public static void main(String[] args) {
        String message = "HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome. StackOverflow. some_email@hotmail.com";
        // Split after '.', '!' or '?' only when whitespace ending in a space follows,
        // so the periods in "3.40" and "hotmail.com" are left alone.
        String[] sentences = message.split("(?<=[.!?])\\s* ");
        for (String s : sentences) {
            System.out.println(s);
        }
    }
}

Output:

HP E2B16UT Mini-tower Workstation - 1 x Intel Xeon E3-1245V3 3.40 GHz.
Hello, you are welcome.
StackOverflow.
some_email@hotmail.com
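
To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the same text. The class and method names (SplitComparison, splitLoose, splitOnWhitespace) are made up for illustration, and "\\s+" is used here as an assumed, slightly more general stand-in for the "\\s* " in the answer. The point it shows: "\\s*" can match the empty string, so the question's pattern splits immediately after every '.', including the one inside 3.40.

```java
import java.util.Arrays;

public class SplitComparison {
    // Pattern from the question: \s* may match nothing at all,
    // so a split happens right after every '.', '!' or '?'.
    public static String[] splitLoose(String text) {
        return text.split("(?<=[.!?])\\s*");
    }

    // Assumed variant: require at least one whitespace character
    // after the terminator before splitting.
    public static String[] splitOnWhitespace(String text) {
        return text.split("(?<=[.!?])\\s+");
    }

    public static void main(String[] args) {
        String text = "Intel Xeon E3-1245V3 3.40 GHz. Hello, you are welcome.";
        // Three pieces: the number 3.40 is cut in two.
        System.out.println(Arrays.toString(splitLoose(text)));
        // Two pieces: 3.40 GHz stays together.
        System.out.println(Arrays.toString(splitOnWhitespace(text)));
    }
}
```

Note that a whitespace requirement alone still splits after abbreviations that are followed by a space (for example "Dr. Smith"), so this is a heuristic, not a full sentence detector.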

Answer 1 (score: 0)

String message = "This is an example. This string is for split on '.'."; // add a space after '.' to start a new sentence

Replace

String[] sentences = message.split("(?<=[.!?])\\s*");

with

String[] sentences = message.split("(?<=[.!?])\\s* "); // require a space, so the split only happens at a new sentence

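Answer 2 (score: 0)

Try this; I find it easy to understand:

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan 13, 2014 , words like U.S and numbers like 2.2. They all got splitted by the above code.";
// Split on '.', '?' or '!' followed by a character that is not an uppercase letter or a digit.
// Note: both matched characters are consumed, so the terminator is dropped from each sentence
// except the last one.
String[] sentenceHolder = str.split("[.?!][^A-Z0-9]");
for (int i = 0; i < sentenceHolder.length; i++) {
    System.out.println(sentenceHolder[i]);
}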
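
Because the pattern above consumes the punctuation mark together with the character after it, the resulting sentences lose their terminators. If you want to keep them, one possible variation (a sketch only, not part of the answer above) is to move both conditions into lookarounds so that only the whitespace is consumed. The names TerminatorPreservingSplit, splitConsuming, and splitKeeping are hypothetical, and the lookahead assumes a new sentence starts with an uppercase letter or a digit.

```java
import java.util.Arrays;

public class TerminatorPreservingSplit {
    // Pattern from the answer: the terminator and the next character are
    // part of the match, so String.split removes them from the output.
    public static String[] splitConsuming(String text) {
        return text.split("[.?!][^A-Z0-9]");
    }

    // Assumed variant: the lookbehind checks for a terminator and the lookahead
    // checks for an uppercase letter or digit, so only whitespace is removed.
    public static String[] splitKeeping(String text) {
        return text.split("(?<=[.?!])\\s+(?=[A-Z0-9])");
    }

    public static void main(String[] args) {
        String text = "There is a problem. Numbers like 2.2 and words like U.S stay intact. Done.";
        System.out.println(Arrays.toString(splitConsuming(text)));
        System.out.println(Arrays.toString(splitKeeping(text)));
    }
}
```

Both versions leave 2.2 and U.S untouched, since no whitespace follows those periods. The lookahead is a heuristic: an abbreviation followed by a capitalized word (for example "U.S. Army") would still be split.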