Mallet TokenSequenceRemoveStopwords fails to read the stoplist file

Posted: 2018-08-24 12:12:32

Tags: mallet

I am trying to use Mallet for topic modeling. Here is my code:

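import java.io.File;
import java.util.ArrayList;
import java.util.regex.Pattern;
import cc.mallet.pipe.*;

public class Pipes {
    // Builds the preprocessing pipeline (class and method names are illustrative)
    public static ArrayList<Pipe> buildPipeList() {
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        // Lowercase everything
        pipeList.add(new CharSequenceLowercase());
        // Tokens are runs of Unicode letters, underscore, and hashtag
        Pattern pat = Pattern.compile("[\\p{L}_#]+");
        pipeList.add(new CharSequence2TokenSequence(pat));
        // Remove stop words read from the stoplist file (UTF-8, no built-in list,
        // case-insensitive, deleted tokens not marked)
        pipeList.add(new TokenSequenceRemoveStopwords(
                new File("C:\\mallet\\stoplists\\en.txt"), "UTF-8", false, false, false));
        // Convert the token sequence to a feature sequence
        pipeList.add(new TokenSequence2FeatureSequence());
        return pipeList;
    }
}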

When I run the program, it shows:

Exception in thread "main" java.lang.IllegalArgumentException: Trouble reading file C:\mallet\stoplists\en.txt
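
The message only says the file could not be read; it appears that Mallet wraps the underlying I/O error, so the real cause (file not at that path, path pointing at a directory, missing permissions, or content that is not valid UTF-8) is not shown. A minimal diagnostic sketch using only the Java standard library is below. The class name StoplistCheck and the helper diagnose are hypothetical, and the path is the one from the question; it is not part of Mallet.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class StoplistCheck {
    // Hypothetical helper: returns why the stoplist cannot be read,
    // or "ok: N lines" if it can be read as UTF-8.
    public static String diagnose(String pathString) {
        Path path = Paths.get(pathString).toAbsolutePath();
        if (!Files.exists(path)) {
            return "missing: " + path;
        }
        if (Files.isDirectory(path)) {
            return "is a directory: " + path;
        }
        if (!Files.isReadable(path)) {
            return "not readable: " + path;
        }
        try {
            List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
            return "ok: " + lines.size() + " lines";
        } catch (IOException e) {
            // Also covers MalformedInputException when the file is not valid UTF-8
            return "read failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Path taken from the question; adjust to the actual Mallet install location
        System.out.println(diagnose("C:\\mallet\\stoplists\\en.txt"));
    }
}
```

If this prints "missing", the stoplist is not at that location and the File passed to TokenSequenceRemoveStopwords needs to point to wherever en.txt actually is.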

Can someone help me fix this?

0 answers