How to use nltk.parse.corenlp.CoreNLPTokenizer with the Stanford Chinese segmenter

Date: 2018-07-19 17:49:08

Tags: python-3.x nlp nltk stanford-nlp sentence

I used the following code from the tutorial at http://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.stanford_segmenter to try to segment Chinese sentences:

from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# Point the segmenter at the locally downloaded Stanford jars
seg = StanfordSegmenter(path_to_slf4j='/Users/edamame/Documents/jars/slf4j-api-1.7.25.jar',
                        path_to_jar='/Users/edamame/Documents/jars/stanford-segmenter-3.9.1.jar')
seg.default_config('zh')  # use the default Chinese model settings
sent = u'这是斯坦福中文分词器测试'
print(seg.segment(sent))
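(If the jars are set up correctly, the doctest in the linked tutorial suggests the output for this sentence should look roughly like: 这 是 斯坦福 中文 分词器 测试)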

However, I got the following warning and errors:

/Users/edamame/workspace/git/chinese_nlp/venv/bin/python /Users/edamame/workspace/git/chinese_nlp/chinese_segmenter.py
/Users/edamame/workspace/git/chinese_nlp/chinese_segmenter.py:5: DeprecationWarning: 
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.parse.corenlp.CoreNLPTokenizer instead.'
       :
    more errors ...

I am using nltk-3.3. According to the warning, I should switch to nltk.parse.corenlp.CoreNLPTokenizer for Stanford Chinese segmentation (i.e., convert the code above to use nltk.parse.corenlp.CoreNLPTokenizer). However, I could not find any example of this on the website. Has anyone done something similar before? Thanks!
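From what I can tell so far, the replacement is server-based: you start a Stanford CoreNLP server with the Chinese models and call it from NLTK over HTTP. Below is my rough, untested sketch of what that might look like. Note that the warning names CoreNLPTokenizer, but I could not find that class in nltk-3.3; nltk.parse.corenlp does expose CoreNLPParser, whose tokenize() method appears to be the intended replacement. The port 9001 and the server command are my own assumptions, not something from the NLTK docs:

# Untested sketch: assumes the full Stanford CoreNLP package (including the
# Chinese models jar) is downloaded, and that a server was started with the
# Chinese properties, e.g. something like:
#   java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
#       -serverProperties StanfordCoreNLP-chinese.properties \
#       -port 9001 -timeout 15000
# (port 9001 is an arbitrary placeholder)
from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9001')
sent = u'这是斯坦福中文分词器测试'
# tokenize() sends the text to the server and yields the segmented tokens
print(list(parser.tokenize(sent)))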

0 Answers:

No answers yet.