How to identify the Coreference set and representative mention in Stanford CoreNLP coreference output?

Time: 2016-03-25 01:53:05

Tags: nlp stanford-nlp opennlp sharpnlp

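I am using Stanford CoreNLP. I need to detect and identify the "Coreference set" and the "representative mention" of each CorefChain in the input text.

For example, input: Obama was elected to the Illinois state senate in 1996 and served there for eight years. In 2004, he was elected by a record majority to the U.S. Senate from Illinois and, in February 2007, announced his candidacy for President.

Output: using "Pretty Print" I get the following: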

Coreference set:
(2,4,[4,5]) -> (1,1,[1,2]), that is: "he" -> "Obama"

(2,24,[24,25]) -> (1,1,[1,2]), that is: "his" -> "Obama"

(3,22,[22,23]) -> (1,1,[1,2]), that is: "Obama" -> "Obama"

However, I need to obtain the output above, the "Coreference set", programmatically. (That is, I need to identify every pair such as "he" -> "Obama".)

Note: my base code is below (from http://stanfordnlp.github.io/CoreNLP/coref.html):

import edu.stanford.nlp.hcoref.CorefCoreAnnotations;
import edu.stanford.nlp.hcoref.data.CorefChain;
import edu.stanford.nlp.hcoref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class CorefExample {

    public static void main(String[] args) throws Exception {
        Annotation document = new Annotation("Obama was elected to the Illinois state senate in 1996 and served there for eight years. In 2004, he was elected by a record majority to the U.S. Senate from Illinois and, in February 2007, announced his candidacy for President.");
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,mention,coref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        System.out.println("---");
        System.out.println("coref chains");
        for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
            System.out.println("\t" + cc);
        }
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println("---");
            System.out.println("mentions");
            for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
                System.out.println("\t" + m);
            }
        }
    }
}

Any ideas? Thank you in advance.

1 Answer:

Answer 0: (score: 2)

CorefChain contains that information.

For example, you can get a:

List<CorefChain.CorefMention> 

by calling this method:

cc.getMentionsInTextualOrder();

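This gives you every CorefChain.CorefMention in the document that belongs to that particular cluster.

You can get the representative mention with this method: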

cc.getRepresentativeMention();

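A CorefChain.CorefMention represents one particular mention in a coref cluster. From a CorefChain.CorefMention you can get information such as the full string and the position (sentence number, mention number within the sentence):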

for (CorefChain.CorefMention cm : cc.getMentionsInTextualOrder()) {
    String textOfMention = cm.mentionSpan;
    IntTuple positionOfMention = cm.position;
}
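To get from those two calls to the "he" -> "Obama" lines in the question, you pair every mention in a chain with the chain's representative mention and skip the representative itself. Below is a minimal sketch of that pairing. So that it compiles without the CoreNLP jar, it uses a hypothetical stand-in class, `MentionInfo`, that mirrors the public fields I would expect to read from CorefChain.CorefMention (`sentNum`, `headIndex`, `startIndex`, `endIndex`, `mentionSpan`). The names `CorefPairs`, `formatPair` and `describeChain` are mine, not part of CoreNLP, and the line format is simply copied from the question's output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CorefPairs {

    // Hypothetical stand-in for the fields of CorefChain.CorefMention.
    // With CoreNLP on the classpath you would read these from the real object.
    static final class MentionInfo {
        final int sentNum;        // sentence number (1-based in the question's output)
        final int headIndex;      // token index of the head word
        final int startIndex;     // index of the first token of the span
        final int endIndex;       // index one past the last token of the span
        final String mentionSpan; // text of the mention

        MentionInfo(int sentNum, int headIndex, int startIndex, int endIndex, String mentionSpan) {
            this.sentNum = sentNum;
            this.headIndex = headIndex;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.mentionSpan = mentionSpan;
        }
    }

    // Builds one line in the same shape as the question's output.
    static String formatPair(MentionInfo m, MentionInfo rep) {
        return String.format(Locale.ROOT,
                "(%d,%d,[%d,%d]) -> (%d,%d,[%d,%d]), that is: \"%s\" -> \"%s\"",
                m.sentNum, m.headIndex, m.startIndex, m.endIndex,
                rep.sentNum, rep.headIndex, rep.startIndex, rep.endIndex,
                m.mentionSpan, rep.mentionSpan);
    }

    // Pairs each non-representative mention of one chain with the representative.
    // In CoreNLP terms: rep would come from cc.getRepresentativeMention() and
    // the list from cc.getMentionsInTextualOrder().
    static List<String> describeChain(MentionInfo rep, List<MentionInfo> mentionsInTextualOrder) {
        List<String> lines = new ArrayList<>();
        for (MentionInfo m : mentionsInTextualOrder) {
            if (m == rep) {
                continue; // do not pair the representative with itself
            }
            lines.add(formatPair(m, rep));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample values taken from the first two lines of the question's output.
        MentionInfo obama = new MentionInfo(1, 1, 1, 2, "Obama");
        MentionInfo he = new MentionInfo(2, 4, 4, 5, "he");
        MentionInfo his = new MentionInfo(2, 24, 24, 25, "his");
        for (String line : describeChain(obama, Arrays.asList(obama, he, his))) {
            System.out.println(line);
        }
    }
}
```

In the real pipeline you would run this once per chain inside the `for (CorefChain cc : ...)` loop from the question, replacing `MentionInfo` with CorefChain.CorefMention.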

Here is the javadoc link for CorefChain:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefChain.html

Here is the javadoc link for CorefChain.CorefMention:

http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefChain.CorefMention.html