Question

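Is there a way to call the Stanford parser from the command line so that it parses one sentence at a time and, if a particular sentence causes a problem, simply moves on to the next one?

Update:

I have been adapting the script from StanfordNLP Help. However, I noticed that with the latest version of CoreNLP (2015-04-20) there is a problem with the CCprocessed dependencies: collapsing does not seem to take place (if I grep for prep_ in the output, I find nothing). Collapsing does work with 2015-04-20 and the PCFG parser, for instance, so I assume the issue is model-specific.

If I use the very same Java class with CoreNLP 2015-01-29 (with depparse.model changed to parse.model, and the original-dependencies part removed), collapsing works fine. Maybe I am just using the parser the wrong way, which is why I am reposting here rather than starting a new thread. Here is the updated code of the class: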

import java.io.*;
import java.util.*;
import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.TreeCoreAnnotations.*;
import edu.stanford.nlp.util.*;


public class StanfordSafeLineExample {

    public static void main(String[] args) throws IOException {
        // build pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, depparse");
        props.setProperty("ssplit.eolonly", "true");
        props.setProperty("tokenize.whitespace", "false");
        props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/english_SD.gz");
        props.setProperty("parse.originalDependencies", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // open file (try-with-resources closes the reader when done)
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            // go through each sentence, one per line
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                try {
                    Annotation annotation = new Annotation(line);
                    pipeline.annotate(annotation);
                    CoreMap sentence = annotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);

                    System.out.println("sentence: " + line);
                    // print index, word, POS tag and lemma for every token
                    for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
                        Integer identifier = token.get(CoreAnnotations.IndexAnnotation.class);
                        String word = token.get(CoreAnnotations.TextAnnotation.class);
                        String pos = token.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                        String lemma = token.get(CoreAnnotations.LemmaAnnotation.class);
                        System.out.println(identifier + "\t" + word + "\t" + pos + "\t" + lemma);
                    }

                    // print the basic and the collapsed, CC-processed dependency graphs
                    SemanticGraph tree = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
                    SemanticGraph tree2 = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
                    System.out.println("---BASIC");
                    System.out.println(tree.toString(SemanticGraph.OutputFormat.READABLE));
                    System.out.println("---CCPROCESSED---");
                    System.out.println(tree2.toString(SemanticGraph.OutputFormat.READABLE) + "</s>");
                } catch (Exception e) {
                    // report the failing sentence and move on to the next line
                    System.out.println("Error with this sentence: " + line);
                    System.out.println("");
                }
            }
        }
    }

}

Answer 1

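There are many ways to handle this.

The way I do it is to run the Stanford CoreNLP pipeline.

You can get the appropriate jars here:

http://nlp.stanford.edu/software/corenlp.shtml

After you cd into the directory stanford-corenlp-full-2015-04-20, you can issue this command: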

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -ssplit.eolonly -outputFormat text -file sample_sentences.txt

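sample_sentences.txt should contain the sentences you want to parse, one sentence per line.

This puts the results in sample_sentences.txt.out, which you can extract with some lightweight scripting.

If you change -outputFormat to json instead of text, you get JSON that you can easily load and pull the parses from.

If you have any problems with this approach, let me know and I can revise the answer to help or clarify further!

Update:

I am not sure exactly how you are running things, but these options may help.

If you use -fileList to run the pipeline on a list of files instead of a single file, and add the flag -continueOnAnnotateError, it should simply skip the bad files. That is progress, though admittedly it is not the same as skipping just the bad sentence.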
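One way to get per-sentence skipping out of the file-level option is to give every sentence its own file. The following is a minimal sketch of that idea using only the Java standard library; the class name SentenceFileSplitter, the sentence_NNNNNN.txt naming scheme and the filelist.txt name are assumptions made for illustration and are not part of CoreNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: one sentence per file plus a list file.
public class SentenceFileSplitter {

    /**
     * Writes each non-blank line of input to its own file inside outDir and
     * returns the path of a list file that names those files, one path per line.
     */
    public static Path split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<String> listed = new ArrayList<>();
        int n = 0;
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) {
                continue; // blank lines carry no sentence
            }
            n++;
            // file naming scheme is an assumption chosen for this sketch
            Path sentenceFile = outDir.resolve(String.format("sentence_%06d.txt", n));
            Files.write(sentenceFile,
                    (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
            listed.add(sentenceFile.toAbsolutePath().toString());
        }
        Path listFile = outDir.resolve("filelist.txt");
        Files.write(listFile, listed, StandardCharsets.UTF_8);
        return listFile;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: file with one sentence per line, args[1]: output directory
        System.out.println(split(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

The printed list file could then be passed to the pipeline with -fileList together with -continueOnAnnotateError, so that a failing file corresponds to exactly one sentence. The trade-off is one output file per sentence, which may be unwieldy for large inputs.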

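I have written some Java that does exactly what you need, so if you just want to use my whipped-up Java code, I will try to post it within the next 24 hours. I am still looking it over...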

Answer 2

Here is the sample code you need, the class StanfordSafeLineExample (the listing in the question's update is an adapted version of it).

Instructions:

Cut and paste it into StanfordSafeLineExample.java
Put that file in the directory stanford-corenlp-full-2015-04-20
javac -cp "*:." StanfordSafeLineExample.java
Add your sentences, one per line, to a file called sample_sentences.txt
java -cp "*:." StanfordSafeLineExample sample_sentences.txt
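The core of this approach is that each line is processed inside its own try/catch, so one failing sentence cannot stop the run. As a minimal sketch of just that pattern, runnable without CoreNLP on the classpath: SkipBadLines, parseAll and toyParser are hypothetical names, and toyParser merely stands in for the real pipeline.annotate(...) call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of the skip-on-error loop; not the answer's class.
public class SkipBadLines {

    /** Applies parser to each line; a line whose parse throws is reported and skipped. */
    public static List<String> parseAll(List<String> lines, Function<String, String> parser) {
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            try {
                results.add(parser.apply(line));
            } catch (RuntimeException e) {
                // report the failure and continue with the next sentence
                System.err.println("Error with this sentence: " + line);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for the real parser: fails on empty input, upper-cases otherwise.
        Function<String, String> toyParser = s -> {
            if (s.isEmpty()) {
                throw new IllegalArgumentException("empty sentence");
            }
            return s.toUpperCase();
        };
        // The empty line fails and is skipped; the other two lines are processed.
        System.out.println(parseAll(Arrays.asList("a b", "", "c d"), toyParser));
    }
}
```

Keeping the try/catch inside the loop, rather than around it, is what makes the failure local to one sentence.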

Stanford Parser: parsing sentence by sentence on the command line
