Getting the token string from a tokenID using the Stanford Parser in GATE

Time: 2018-06-05 07:51:09

Tags: java stanford-nlp gate

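I am trying to use some Java RHS to get the string values of dependent tokens produced by the Stanford dependency parser in GATE, and to add them as features of a new annotation.

I am having trouble targeting just the 'dependencies' feature of a token, and getting the string value from a tokenID.

Using the code below, which asks only for 'dependencies', also throws a Java null pointer error: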

for(Annotation lookupAnn : tokens.inDocumentOrder())
  {
   FeatureMap lookupFeatures  = lookupAnn.getFeatures();
   token = lookupFeatures.get("dependencies").toString();  
  }
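
A likely cause of the null pointer error, assuming some tokens simply have no `dependencies` feature: `get("dependencies")` returns null for those tokens and the chained `.toString()` then fails. Below is a minimal sketch of a null guard. It uses a plain `java.util.Map` as a stand-in for GATE's `FeatureMap`, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of GATE; a plain Map stands in for gate.FeatureMap.
class DependencyFeatureGuard {

    // Returns the "dependencies" feature as a string, or null when the token has none.
    static String dependenciesAsString(Map<Object, Object> features) {
        Object deps = features.get("dependencies");
        // Calling toString() on a missing (null) feature is what raises the NullPointerException.
        return deps == null ? null : deps.toString();
    }

    public static void main(String[] args) {
        Map<Object, Object> withDeps = new HashMap<>();
        withDeps.put("dependencies", "[nsubj(8390), dobj(8394)]");

        // Assumed case: a token that carries other features but no dependencies.
        Map<Object, Object> withoutDeps = new HashMap<>();
        withoutDeps.put("string", ".");

        System.out.println(dependenciesAsString(withDeps));
        System.out.println(dependenciesAsString(withoutDeps));
    }
}
```

In the loop above, the same check would let tokens without the feature be skipped instead of aborting the whole run.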

I can use the following to get all of a token's features,

gate.Utils.inDocumentOrder

but it returns every feature, including the tokenIDs of the dependents; i.e.:

dependencies = [nsubj(8390), dobj(8394)]

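I would like to get the string values of the dependent tokens from these tokenIDs.

Is there any way to access the string values of the dependent tokens and add them as features to an annotation?

Many thanks for your help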

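1 Answer:

Answer 0: (score: 1)

Here is a working JAPE example. It only prints to GATE's message window (std out); it does not create any new annotations with the features you asked for. Please finish that part yourself...

The Stanford_CoreNLP plugin must be loaded in GATE for this JAPE file to be loadable. Otherwise you will get a class not found exception for the DependencyRelation class.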

Imports: {
  import gate.stanford.DependencyRelation;
}

Phase: GetTokenDepsPhase
Input: Token
Options: control = all
Rule: GetTokenDepsRule
(
  {Token}
): token
--> 
:token {
  //note that tokenAnnots contains only a single annotation so the loop could be avoided...
  for (Annotation token : tokenAnnots) {
    Object deps = token.getFeatures().get("dependencies");

    //sometimes the dependencies feature is missing - skip it
    if (deps == null) continue;

    //token.getFeatures().get("string") could be used instead of gate.Utils.stringFor(doc,token)...
    System.out.println("Dependencies for token " + gate.Utils.stringFor(doc, token));

    //the dependencies feature has to be typed to List<DependencyRelation>
    List<DependencyRelation> typedDeps = (List<DependencyRelation>) deps;
    for (DependencyRelation r : typedDeps) {

      //use DependencyRelation.getTargetId() to get the id of the target token
      //use inputAS.get(id) to get the annotation for its id
      Annotation targetToken = inputAS.get(r.getTargetId());

      //use DependencyRelation.getType() to get the dependency type
      System.out.println("  " +r.getType()+ ": " +gate.Utils.stringFor(doc, targetToken));
    }
  }
}
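
To go from printing to the features the question asks for, the resolved strings could be collected per dependency type before being attached to a new annotation. The following is only a sketch of that collection step in plain Java, runnable without GATE: `Relation` is a hypothetical stand-in for `DependencyRelation`, the id-to-string map stands in for looking tokens up in `inputAS`, and the token strings are invented sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in GATE.
class DependencyFeatureSketch {

    // Stand-in for gate.stanford.DependencyRelation: a dependency type plus a target token id.
    static class Relation {
        final String type;
        final int targetId;

        Relation(String type, int targetId) {
            this.type = type;
            this.targetId = targetId;
        }
    }

    // Groups the target token strings by dependency type, keeping the order of the relations.
    static Map<String, List<String>> dependencyFeatures(List<Relation> relations,
                                                        Map<Integer, String> tokenStringsById) {
        Map<String, List<String>> features = new LinkedHashMap<>();
        for (Relation r : relations) {
            String target = tokenStringsById.get(r.targetId);
            if (target == null) {
                continue; // the target id could not be resolved, so skip this relation
            }
            features.computeIfAbsent(r.type, k -> new ArrayList<>()).add(target);
        }
        return features;
    }

    public static void main(String[] args) {
        // Ids taken from the question; the strings are made up for the demo.
        Map<Integer, String> tokenStringsById = new HashMap<>();
        tokenStringsById.put(8390, "dog");
        tokenStringsById.put(8394, "ball");

        List<Relation> relations = Arrays.asList(
                new Relation("nsubj", 8390),
                new Relation("dobj", 8394));

        System.out.println(dependencyFeatures(relations, tokenStringsById));
    }
}
```

The values are lists because one token can have several dependents of the same type. Inside the JAPE rule, the same loop would fill a GATE `FeatureMap` (for example one from `Factory.newFeatureMap()`) and pass it to `outputAS.add(start, end, type, features)`, whose `InvalidOffsetException` would need handling; the annotation type name is up to you.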