Scanning whole words with a Pattern

Time: 2012-04-20 02:55:35

Tags: java regex

I need to implement a Pattern using the regex \w (that is, one that matches every word).

When I run the program, the output should be:

a [1]
is [1]
test [1, 2]

But instead it is:

a [1]
e [2]
h [1]
i [1, 1]
s [1, 1, 2]
t [1, 2, 2]

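The code responsible for scanning and pattern matching is as follows:

import java.util.ArrayList;
import java.util.Scanner;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocumentIndex {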

  private TreeMap<String, ArrayList<Integer>> map = 
  new TreeMap<String, ArrayList<Integer>>();       // Stores words and their locations
  private String regex = "\\w";                //any word

  /**
   * A constructor that scans a document for words and their locations
   */
  public DocumentIndex(Scanner doc){
    Pattern p = Pattern.compile(regex);       //Pattern class: matches words
    Integer location = 0;                   // the current line number
        // while the document has lines
        // set the Matcher to the current line
        while(doc.hasNextLine()){
            location++;
            Matcher m = p.matcher(doc.nextLine());
            // while there are value in the current line
            // check to see if they are words
            // and if so save them to the map
            while(m.find()){
                if(map.containsKey(m.group())){
                    map.get(m.group()).add(location);
                } else {
                    ArrayList<Integer> list = new ArrayList<Integer>();
                    list.add(location);
                    map.put(m.group(), list);
                }
            }
        }
    }
...
}

What is the best way to match whole words with a pattern?

2 answers:

Answer 0 (score: 2)

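You need to use \\w+ rather than \\w. The latter matches only a single character, while the former matches one or more consecutive word characters.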
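
To make the difference concrete, here is a minimal sketch that builds the same kind of word-to-line-number index with either regex. The class name WordIndexSketch, the index helper, and the two sample lines are made up for illustration and are not part of the original DocumentIndex; it also assumes Java 8 or later for computeIfAbsent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordIndexSketch {

    // Maps each regex match to the 1-based line numbers on which it occurs.
    // Hypothetical helper: it mirrors the loop in the question, but takes the
    // regex as a parameter so the two patterns can be compared side by side.
    static TreeMap<String, List<Integer>> index(List<String> lines, String regex) {
        TreeMap<String, List<Integer>> map = new TreeMap<>();
        Pattern p = Pattern.compile(regex);
        int location = 0; // current line number
        for (String line : lines) {
            location++;
            Matcher m = p.matcher(line);
            while (m.find()) {
                // Create the list on first sight of a key, then record the line.
                map.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(location);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a test", "is test"); // made-up sample input
        System.out.println(index(lines, "\\w"));  // keys are single characters
        System.out.println(index(lines, "\\w+")); // keys are whole words
    }
}
```

With \\w+ the second line printed is {a=[1], is=[2], test=[1, 2]}, whereas \\w produces one key per character, which is the symptom described in the question.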

Answer 1 (score: 0)

([^ ]+)+

Or you could use the StringTokenizer class.
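
A rough sketch of the StringTokenizer route follows; the class name TokenizerSketch, the words helper, and the sample sentence are made up for illustration. Note that both suggestions in this answer split on whitespace rather than on word characters, so punctuation stays attached to the neighbouring word, unlike with \\w+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {

    // Splits a line on StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample input; the trailing period stays glued to "test".
        System.out.println(words("this is a test."));
    }
}
```

This prints [this, is, a, test.]. The Java documentation describes StringTokenizer as a legacy class and recommends String.split or java.util.regex for new code, so the regex approach is probably the better fit here.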