Question

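I am new to natural language processing. I need to extract noun phrases from text. So far I have used OpenNLP's chunking parser to parse my text into a tree structure, but I cannot extract the noun phrases from that tree. Is there a regular expression pattern in OpenNLP that I can use to extract the noun phrases?

Below is the code I am using: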

    InputStream is = new FileInputStream("en-parser-chunking.bin");
    ParserModel model = new ParserModel(is);
    Parser parser = ParserFactory.create(model);
    Parse topParses[] = ParserTool.parseLine(line, parser, 1);
    for (Parse p : topParses) {
        p.show();
    }

Here is my output:

    (TOP (S (S (ADJP (JJ welcome) (PP (TO to) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN on) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree) (WP structure.))))))))))

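Can someone help me get the noun phrases, i.e. the NP, NNP, NN, and similar nodes? Can you tell me whether I need to use a different NP chunker to get the noun phrases? Is there a regular expression pattern that achieves the same thing?

Please help me with this.

Thanks in advance,

Gouse.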

Answer 1

A Parse object is a tree; you can navigate it with getParent(), getChildren(), and getType().

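import java.util.ArrayList;
import java.util.List;
import opennlp.tools.parser.Parse;

// Initialize the list; the original declaration left it null.
List<Parse> nounPhrases = new ArrayList<>();

// Recursively collects every node of type NP at or below p.
public void getNounPhrases(Parse p) {
    if (p.getType().equals("NP")) {
        nounPhrases.add(p);
    }
    for (Parse child : p.getChildren()) {
        getNounPhrases(child);
    }
}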
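On the regular-expression part of the question: NP nodes nest inside one another, so a single pattern is unlikely to match the balanced parentheses reliably. As a rough sketch of the same tree walk without the OpenNLP API, the code below reads the bracketed string that show() prints and collects the words under each NP. The class and method names are hypothetical, it uses only the JDK, and it assumes the string is well formed and that no word contains a parenthesis.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BracketNounPhrases {

    /** Returns the words under every node labelled "NP", in the order the nodes close. */
    static List<String> extract(String tree) {
        List<String> phrases = new ArrayList<>();
        // One list per open node: element 0 is the label, the rest are its words so far.
        Deque<List<String>> open = new ArrayDeque<>();
        // Pad the parentheses so a whitespace split yields "(", ")", labels, and words.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String tok = tokens[i];
            if (tok.equals("(") && i + 1 < tokens.length) {
                List<String> node = new ArrayList<>();
                node.add(tokens[++i]); // the label follows the opening parenthesis
                open.push(node);
            } else if (tok.equals(")") && !open.isEmpty()) {
                List<String> node = open.pop();
                List<String> words = node.subList(1, node.size());
                if (node.get(0).equals("NP")) {
                    phrases.add(String.join(" ", words));
                }
                if (!open.isEmpty()) {
                    open.peek().addAll(words); // the parent covers these words too
                }
            } else if (!open.isEmpty()) {
                open.peek().add(tok); // a plain word inside the current node
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String tree = "(TOP (S (NP (PRP We)) (VP (VBP like) (NP (DT the) (NN noun) (NNS phrases)))))";
        System.out.println(extract(tree));
    }
}
```

Nested phrases are reported separately, inner ones first, so "(NP (NP (DT the) (NN cat)) (PP (IN on) (NP (DT the) (NN mat))))" yields three entries. When a Parse object is available, walking it directly as in the answer above is likely the more robust route.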

Answer 2

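If you only want noun phrases, use the sentence chunker rather than the tree parser. The code looks something like this (you need to get the model from the same place you got the parser model):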

public void chunk() throws IOException {
    ChunkerModel model;

    // Load the chunker model; try-with-resources closes the stream for us.
    // A failed load now propagates instead of leaving the model null.
    try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
        model = new ChunkerModel(modelIn);
    }

    // After the model is loaded, a chunker can be instantiated.
    ChunkerME chunker = new ChunkerME(model);

    // The tokens of the sentence.
    String sent[] = new String[]{"Rockwell", "International", "Corp.", "'s",
      "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
      "extending", "its", "contract", "with", "Boeing", "Co.", "to",
      "provide", "structural", "parts", "for", "Boeing", "'s", "747",
      "jetliners", "."};

    // The part-of-speech tag of each token, at the same index.
    String pos[] = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN",
      "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
      "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
      "."};

    // One chunk tag per token, at the same index as the token.
    String tag[] = chunker.chunk(sent, pos);
}

Then look through the tag array for the chunk types you want.

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api
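To turn the tag array into actual phrases, one option is to join each run of tokens whose tags start with B-NP and continue with I-NP. The sketch below assumes the chunker emits tags in that B-/I- style; the class and method names are hypothetical, and the tags in main are written by hand for illustration, not produced by a model.

```java
import java.util.ArrayList;
import java.util.List;

final class IobNounPhrases {

    /** Joins each B-NP, I-NP, ... run of tokens into one space-separated phrase. */
    static List<String> extract(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null; // phrase being built, or null when outside an NP
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals("B-NP")) {
                // A new noun phrase starts; flush the previous one if there is one.
                if (current != null) {
                    phrases.add(current.toString());
                }
                current = new StringBuilder(tokens[i]);
            } else if (tags[i].equals("I-NP") && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                // Any other tag (or a stray I-NP with no B-NP before it) ends the phrase.
                if (current != null) {
                    phrases.add(current.toString());
                }
                current = null;
            }
        }
        if (current != null) {
            phrases.add(current.toString());
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] tokens = {"Rockwell", "International", "Corp.", "'s", "Tulsa", "unit", "said", "it"};
        // Hand-written tags for illustration only.
        String[] tags = {"B-NP", "I-NP", "I-NP", "B-NP", "I-NP", "I-NP", "B-VP", "B-NP"};
        System.out.println(extract(tokens, tags));
    }
}
```

In the chunk() method above, the call would be extract(sent, tag). Swapping "NP" for another chunk type such as "VP" would collect those phrases instead.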

Answer 3

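This continues from your own code. The block below prints every noun in the sentence, that is, the individual tokens tagged NN, NNP, or NNS rather than whole NP subtrees. It uses the getTagNodes() method to get the tokens and their types.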

Parse topParses[] = ParserTool.parseLine(line, parser, 1);
// Loop through the parses; getTagNodes() returns the part-of-speech nodes, one per token
for (Parse parse : topParses) {
    for (Parse word : parse.getTagNodes()) {
        // Change the types according to your desired types
        String type = word.getType();
        if (type.equals("NN") || type.equals("NNP") || type.equals("NNS")) {
            System.out.println(word);
        }
    }
}

How to extract noun phrases using OpenNLP's chunking parser
