Parsing Chinese with the Stanford parser

Date: 2014-03-28 03:26:09

Tags: stanford-nlp

Here is my code, mostly taken from the demo. The program runs without errors, but the results are completely wrong: it does not split the words. Thanks.

public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz");
    demoAPI(lp);
}

public static void demoAPI(LexicalizedParser lp) {
    // This option shows loading and using an explicit tokenizer
    String sent2 = "我爱你";
    TokenizerFactory<CoreLabel> tokenizerFactory =
        PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    Tokenizer<CoreLabel> tok =
        tokenizerFactory.getTokenizer(new StringReader(sent2));
    List<CoreLabel> rawWords2 = tok.tokenize();

    Tree parse = lp.apply(rawWords2);

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    // You can also use a TreePrint object to print trees and dependencies
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
}

1 Answer:

Answer 0 (score: 1)

Are you sure the words are being segmented? For example, try running it again with "我 爱 你 。" as the sentence. I believe that on the command line the parser segments automatically, but I'm not sure what it does from Java.
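To illustrate the suggestion above, here is a minimal sketch that feeds the parser pre-segmented tokens instead of running the English `PTBTokenizer` over unsegmented Chinese (that tokenizer does not do Chinese word segmentation, which is why the whole sentence came back as one token). It assumes the same `xinhuaFactored.ser.gz` model as the question and a Stanford Parser version where `edu.stanford.nlp.ling.Sentence.toCoreLabelList` exists; class names may differ in newer releases.

```java
import java.util.Arrays;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;

public class ChineseParseDemo {
    public static void main(String[] args) {
        // Load the Chinese (Xinhua) model, as in the question.
        LexicalizedParser lp = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz");

        // Pre-segmented input: the sentence is already split into words,
        // one token per array element. The parser itself does not segment
        // running Chinese text when called through this Java API.
        String[] segmented = "我 爱 你 。".split(" ");
        List<CoreLabel> words = Sentence.toCoreLabelList(segmented);
        System.out.println("Tokens: " + Arrays.toString(segmented));

        // Parse the token list and print the phrase-structure tree.
        Tree parse = lp.apply(words);
        parse.pennPrint();
    }
}
```

For real text you would obtain the segmentation from the Stanford Chinese word segmenter (a separate tool) rather than splitting on spaces by hand.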