Regular expression not accepted

Time: 2013-10-23 08:30:51

Tags: java regex

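I have written code to count the occurrences of each word in a text. However, my regular expression is rejected for some reason, and I get the following error:     Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 12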

My code is:

import java.util.*;
public class CountOccurrenceOfWords {

/**
 * @param args the command line arguments
 */
public static void main(String[] args) {
    // TODO code application logic here
    char lf = '\n';

String text = "It was the best of times, it was the worst of times," + 
lf +
"it was the age of wisdom, it was the age of foolishness," + 
lf +
"it was the epoch of belief, it was the epoch of incredulity," + 
lf +
"it was the season of Light, it was the season of Darkness," + 
lf +
"it was the spring of hope, it was the winter of despair," + 
lf +
"we had everything before us, we had nothing before us," + 
lf +
"we were all going direct to Heaven, we were all going direct" + 
lf +
"the other way--in short, the period was so far like the present" + 
lf +
"period, that some of its noisiest authorities insisted on its" + 
lf +
"being received, for good or for evil, in the superlative degree" + 
lf +
"of comparison only." + 
lf +
"There were a king with a large jaw and a queen with a plain face," + 
lf +
"on the throne of England; there were a king with a large jaw and" + 
lf +
"a queen with a fair face, on the throne of France.  In both" + 
lf +
"countries it was clearer than crystal to the lords of the State" + 
lf +
"preserves of loaves and fishes, that things in general were" + 
lf +
"settled for ever";

    TreeMap<String, Integer> map = new TreeMap<String, Integer>();
    String[] words = text.split("[\n\t\r.,;:!?(){");
    for(int i = 0; i < words.length; i++){
        String key = words[i].toLowerCase();

        if(key.length() > 0) {
            if(map.get(key) == null){
                map.put(key, 1);
            }
            else{
                int value = map.get(key);
                value++;
                map.put(key, value);
            }
        }
    }

    Set<Map.Entry<String, Integer>> entrySet = map.entrySet();

    //Get key and value from each entry
    for(Map.Entry<String, Integer> entry: entrySet){
        System.out.println(entry.getValue() + "\t" + entry.getKey());
    }
    }
}

Also, could you give me a hint on how to sort the words alphabetically? Thanks in advance.

3 answers:

Answer 0: (score: 1)

You are missing the "]" at the end of the regular expression.

"[\n\t\r.,;:!?(){" is incorrect.

You need to replace the regex with "[\n\t\r.,;:!?(){]"
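As a rough illustration of that fix, here is a minimal sketch of the counting loop with the character class closed. The class and method names are made up for this example, and the leading space inside the brackets is an assumption of the sketch (the pattern in the question has none, so the text would not be split at spaces).

```java
import java.util.Map;
import java.util.TreeMap;

public class ClosedClassSplit {
    // Same characters as in the question, with the closing "]" added.
    // Assumption: the leading space is NOT in the question's pattern;
    // it is added here so that the text is also split at spaces.
    static final String DELIMITERS = "[ \n\t\r.,;:!?(){]";

    // Counts how often each lower-cased word occurs in the text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split(DELIMITERS)) {
            String key = word.toLowerCase();
            if (!key.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the stored count.
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "It was the best of times, it was the worst of times,";
        for (Map.Entry<String, Integer> entry : countWords(text).entrySet()) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
```

Since a TreeMap iterates its keys in their natural order, the words are already printed alphabetically, so no separate sorting step should be needed.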

Answer 1: (score: 0)

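You need to escape the regex special characters. In your case, you have not escaped ()[?.{. Escape them with \, e.g. \[. You could also consider the predefined character class \s for whitespace, which matches \r, \t, and so on.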
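The \s suggestion could look roughly like the sketch below. The class and method names are hypothetical, and the trailing + quantifier is an addition of this sketch (not part of the answer) so that a run of delimiters such as a comma followed by a newline counts as a single separator.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceClassSplit {
    // \s (written "\\s" in a Java string literal) stands in for space, \n, \t and \r.
    // Assumption: the "+" is added here to collapse consecutive delimiters.
    static final String DELIMITERS = "[\\s.,;:!?(){]+";

    static List<String> tokens(String text) {
        return Arrays.asList(text.split(DELIMITERS));
    }

    public static void main(String[] args) {
        String text = "we had nothing before us,\nwe were all going direct";
        for (String token : tokens(text)) {
            System.out.println(token);
        }
    }
}
```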

Answer 2: (score: 0)

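Your problem is an unclosed character class in the regex. Regex has some "predefined" characters that you need to escape when you want to match them literally.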

A character class is:

  

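With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets.   Source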

This means you either have to escape these characters:

\[\n\t\r\.,;:!\?\(\){

or close the character class:

[\n\t\r\.,;:!\?\(\){]

Either way, you need to escape the dot, the question mark, and the parentheses.
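One way to check which variants java.util.regex accepts is simply to compile them. The sketch below uses made-up class and method names. Inside a character class, Java treats characters such as ., ?, ( and ) literally, so the escaped and unescaped closed forms are expected to split a sample string identically.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharacterClassCheck {
    // Returns true if the regex compiles, false if Pattern rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    // Returns true if both regexes split the sample into the same pieces.
    static boolean splitsAlike(String regexA, String regexB, String sample) {
        return Arrays.equals(sample.split(regexA), sample.split(regexB));
    }

    public static void main(String[] args) {
        String unclosed = "[\n\t\r.,;:!?(){";               // pattern from the question
        String closed = "[\n\t\r.,;:!?(){]";                // closing bracket added
        String closedEscaped = "[\n\t\r\\.,;:!\\?\\(\\){]"; // closed, with escapes

        System.out.println("unclosed compiles: " + compiles(unclosed));
        System.out.println("closed compiles: " + compiles(closed));
        System.out.println("closed and escaped compiles: " + compiles(closedEscaped));

        String sample = "best of times, worst of times. (really?)";
        System.out.println("same split result: " + splitsAlike(closed, closedEscaped, sample));
    }
}
```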