CoreNLP parsing is too slow on malformed input

Time: 2015-07-24 10:52:37

Tags: parsing nlp stanford-nlp

CoreNLP parsing is too slow on malformed input. It emits warnings like the ones below and takes a very long time to parse.

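Input: "The Lincolns' fourth son, Thomas "Tad" Lincoln, was born on April 4, 1853, and died of heart failure at the age of 18 on July 16, 1871."
It produced these warnings: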


    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder funkyFindLeafWithApproximateSpan
    WARNING: RuleBasedCorefMentionFinder: Failed to find head token:
    Tree is: (ROOT (S (NP (NP (NP (DT The) (NNS Lincolns) (POS ')) (JJ fourth) (NN son)) (, ,) (NP (NNP Thomas) () (NNP Tad) ('' '') (NNP Lincoln)) (, ,)) (VP (VP (VBD was) (VP (VBN born) (PP (IN on) (NP (NP (NNP April) (CD 4)) (, ,) (NP (CD 1853)) (, ,))))) (CC and) (VP (VBD died) (PP (IN of) (NP (NN heart) (NN failure))) (PP (IN at) (NP (NP (DT the) (NN age)) (PP (IN of) (NP (CD 18))))) (PP (IN on) (NP (NNP July) (CD 16))) (, ,) (NP (CD 1871)))) (. .)))
    token = |NP|0|, approx=0
    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder funkyFindLeafWithApproximateSpan
    WARNING: RuleBasedCorefMentionFinder: Last resort: returning as head: 1871
    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder findHead
    WARNING: Invalid index for head 34=34-0: originalSpan=[The Lincolns '], head=1871-35
    Jul 24, 2015 4:03:42 PM edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder findHead
    WARNING: Setting head string to entire mention
 


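It took 600.339 seconds to parse the cleaned text of this document: https://en.wikipedia.org/wiki/Abraham_Lincoln
Is there a way to speed this up? Does CoreNLP have an option to skip bad sentences automatically? Or is there a way to set a time limit for parsing a sentence, after which the parser skips that sentence automatically?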
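
To make the "time limit, then skip" idea concrete, here is a minimal sketch using only the Java standard library, not a confirmed fix. The class `ParseTimeLimit`, its two methods, and the `parser` function are hypothetical names for illustration; the `parser` argument stands in for whatever call actually produces a parse. The property keys `parse.maxlen` and `parse.maxtime` (milliseconds) are taken from the CoreNLP `parse` annotator documentation as I understand it, and the values are arbitrary; whether a given CoreNLP release honors them should be checked against that release.

```java
import java.util.Optional;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class ParseTimeLimit {

    // Assumed CoreNLP configuration: only java.util.Properties is used here.
    // The result would be handed to `new StanfordCoreNLP(props)`.
    static Properties cappedParserProps() {
        Properties props = new Properties();
        // "dcoref" is left out on purpose: the warnings in the log come from
        // the coreference mention finder, not from the parser itself.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse");
        // Assumption: skip sentences longer than 80 tokens.
        props.setProperty("parse.maxlen", "80");
        // Assumption: give up on a sentence after 5000 ms.
        props.setProperty("parse.maxtime", "5000");
        return props;
    }

    // Generic fallback: run `parser` on one sentence in a worker thread and
    // return Optional.empty() if it fails or exceeds `timeoutMillis`.
    static <R> Optional<R> parseWithTimeout(Function<String, R> parser,
                                            String sentence,
                                            long timeoutMillis) {
        // Daemon thread, so a stuck parse cannot keep the JVM alive.
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Callable<R> task = () -> parser.apply(sentence);
        Future<R> result = worker.submit(task);
        try {
            return Optional.ofNullable(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            result.cancel(true);   // interrupt the worker; the caller skips this sentence
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty(); // the parser threw; treat as a skipped sentence
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would loop over sentences and skip any for which `parseWithTimeout` returns an empty `Optional`. Note that `cancel(true)` only interrupts the worker thread; code that never checks for interruption keeps running in the background until it finishes, so a parser-level limit is preferable where the library provides one.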

0 answers:

No answers yet.