Question

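I'm still new to the recognition process and still trying to learn more. I have a project in which I need to recognize table names, persons, and departments. I tried Stanford NER with its 3-class model, and it did recognize person names. For department names, I tried to train NER to recognize departments as organizations, because I couldn't find anything on how to create new annotations for them. I followed the instructions on their website. First, I created a txt file with the following content: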

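Ahmad works in the Customer Service department. The department name is Customer Service. It started in 1997, and it has been called Customer Service since then. The Customer Service department has one manager and many employees. The number of the Customer Service department is 1122D. Ahmad works in the Development department. The department name is Development. It started in 1997, and it has been called Development since then. Development has one manager and many employees. The number of the Development department is 1122D. Ahmad works in the Finance department. The department name is Finance. It started in 1997, and it has been called Finance since then. The Finance department has one manager and many employees. The number of the Finance department is 1122D. Ahmad works in the Human Resources department. The department name is Human Resources. It started in 1997, and it has been called Human Resources since then. The Human Resources department has one manager and many employees. The number of the Human Resources department is 1122D. Ahmad works in the Marketing department. The department name is Marketing. It started in 1997, and it has been called Marketing since then. The Marketing department has one manager and many employees. The number of the Marketing department is 1122D.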

Then I ran these commands:

java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer corpus.txt > corpus.tok

perl -ne 'chomp; print "$_\tO\n"' corpus.tok > corpus.tsv 

java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop corpus.prop
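For reference: as written, the perl step labels every token O, so the resulting corpus.tsv contains no ORGANIZATION examples for the classifier to learn from. The following is a minimal JDK-only sketch (hypothetical class TsvLabeler, Java 9+) that writes the same two-column "token<TAB>label" format as the perl one-liner, but gives a hypothetical, hard-coded set of department words the label ORGANIZATION instead of O. Word-by-word matching is crude (any stray "Service" or "Development" would be tagged too), so the output should still be reviewed by hand.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: same output shape as the perl one-liner ("token<TAB>label"),
// but selected words get ORGANIZATION instead of O.
public class TsvLabeler {

    // HYPOTHETICAL word list for illustration only; adjust to your corpus.
    static final Set<String> DEPARTMENT_WORDS = Set.of(
            "Customer", "Service", "Development", "Finance",
            "Human", "Resources", "Marketing");

    // Labels each token line; blank lines stay blank (the perl line would print "\tO").
    static List<String> label(List<String> tokens, Set<String> orgWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.isEmpty()) {
                out.add("");
            } else {
                String answer = orgWords.contains(token) ? "ORGANIZATION" : "O";
                out.add(token + "\t" + answer);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : "corpus.tok");
        Path out = Paths.get(args.length > 1 ? args[1] : "corpus.tsv");
        // readAllLines drops line terminators, like perl's chomp
        List<String> tokens = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, label(tokens, DEPARTMENT_WORDS), StandardCharsets.UTF_8);
    }
}
```

It would be compiled with javac TsvLabeler.java and run as java TsvLabeler corpus.tok corpus.tsv in place of the perl command.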

Then I got the following error:

CRFClassifier invoked on Mon Dec 01 09:38:10 AST 2014 with arguments:
   -prop corpus.prop
argsToProperties could not read properties file: null
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.IOException: Unable to resolve "corpus.prop" as either class path, filename or URL
    at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:879)
    at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:818)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2869)
Caused by: java.io.IOException: Unable to resolve "corpus.prop" as either class path, filename or URL
    at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:448)
    at edu.stanford.nlp.util.StringUtils.argsToProperties(StringUtils.java:866)
    ... 2 more
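The exception says the loader could not find corpus.prop as a classpath resource, a file name, or a URL. For a bare relative name like corpus.prop, the file-system lookup should be relative to the directory the java command was started from, so the file presumably has to sit in that directory (or be given with a full path). A small JDK-only sketch (hypothetical class PropCheck) that prints where a relative name resolves and whether a regular file actually exists there:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: show where a relative file name resolves for the current JVM.
public class PropCheck {

    // Relative paths are resolved against the working directory (user.dir).
    static Path resolve(String name) {
        return Paths.get(name).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "corpus.prop";
        Path p = resolve(name);
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println("looking for:       " + p);
        System.out.println(Files.isRegularFile(p) ? "file found" : "file NOT found");
    }
}
```

Running java PropCheck corpus.prop from the same directory as the CRFClassifier command shows exactly which path was being looked up.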

How do I train NER correctly?

Many thanks

Update: here is my .prop file:

#location of the training file
trainFile = /Users/ha/stanford-ner-2014-10-26/corpus.tsv
#location where you would like to save (serialize to) your
#classifier; adding .gz at the end automatically gzips the file,
#making it faster and smaller
serializeTo = dept-model.ser.gz

#structure of your training file; this tells the classifier
#that the word is in column 0 and the correct answer is in
#column 1
map = word=0,answer=1

#these are the features we'd like to train with
#some are discussed below, the rest can be
#understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
useNGrams=true
#no ngrams will be included that do not contain either the
#beginning or end of the word
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
#the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
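The .prop file is a Java-style properties file: lines starting with # are comments, and "map = word=0,answer=1" says that column 0 of each TSV line is the word and column 1 is the answer, which matches the "token<TAB>label" lines produced by the perl step. As a sanity check, here is a minimal JDK-only sketch (hypothetical class PropInspect) that reads the file with java.util.Properties, assuming Stanford's loader parses it the same way, and reports trainFile (and whether it exists), serializeTo, and the column map:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Sketch: read a CRFClassifier-style .prop file and report key settings.
public class PropInspect {

    // Parses a "map" value such as "word=0,answer=1" into field -> column index.
    static Map<String, Integer> parseColumnMap(String spec) {
        Map<String, Integer> cols = new LinkedHashMap<>();
        for (String part : spec.split(",")) {
            String[] kv = part.split("=", 2);
            cols.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return cols;
    }

    // java.util.Properties skips '#' comment lines and accepts "key = value".
    static Properties load(Reader reader) {
        Properties p = new Properties();
        try {
            p.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        Properties p;
        try (Reader r = Files.newBufferedReader(Paths.get(args.length > 0 ? args[0] : "corpus.prop"))) {
            p = load(r);
        }
        // Properties keeps trailing spaces in values, so trim before using the path.
        String trainFile = p.getProperty("trainFile", "").trim();
        boolean exists = !trainFile.isEmpty() && Files.isRegularFile(Paths.get(trainFile));
        System.out.println("trainFile   = " + trainFile + (exists ? " (found)" : " (NOT found)"));
        System.out.println("serializeTo = " + p.getProperty("serializeTo"));
        String map = p.getProperty("map");
        System.out.println("columns     = " + (map == null ? "(no map)" : parseColumnMap(map)));
    }
}
```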

Training Stanford NER failed
