I used the following code, based on the tutorial at http://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.stanford_segmenter, to try to segment Chinese sentences:
from nltk.tokenize.stanford_segmenter import StanfordSegmenter

# point the segmenter at the locally downloaded Stanford jars
seg = StanfordSegmenter(path_to_slf4j='/Users/edamame/Documents/jars/slf4j-api-1.7.25.jar', path_to_jar='/Users/edamame/Documents/jars/stanford-segmenter-3.9.1.jar')
seg.default_config('zh')  # use the default Chinese segmentation model
sent = u'这是斯坦福中文分词器测试'
print(seg.segment(sent))
However, I get the following warning and errors:
/Users/edamame/workspace/git/chinese_nlp/venv/bin/python /Users/edamame/workspace/git/chinese_nlp/chinese_segmenter.py
/Users/edamame/workspace/git/chinese_nlp/chinese_segmenter.py:5: DeprecationWarning:
The StanfordTokenizer will be deprecated in version 3.2.5.
Please use nltk.parse.corenlp.CoreNLPTokenizer instead.'
... more errors ...
I am using nltk-3.3. According to the warning, I should switch the Stanford Chinese segmenter over to nltk.parse.corenlp.CoreNLPTokenizer (i.e., convert the code above to use nltk.parse.corenlp.CoreNLPTokenizer). However, I can't find any example of this on the site. Has anyone done something similar before? Thanks!
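For reference, here is my best guess at what the replacement might look like, pieced together from the nltk.parse.corenlp docstrings. The warning names CoreNLPTokenizer, but in nltk-3.3 I only see CoreNLPParser in nltk.parse.corenlp, so I'm calling its tokenize() method instead; the port number and the server properties file below are my own assumptions, and I haven't been able to confirm that this actually works:

from nltk.parse.corenlp import CoreNLPParser

# Assumes a CoreNLP server was started beforehand with the Chinese models, e.g.:
#   java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
#       -serverProperties StanfordCoreNLP-chinese.properties -port 9001
parser = CoreNLPParser('http://localhost:9001')

sent = u'这是斯坦福中文分词器测试'
# tokenize() returns a generator of tokens, so materialize it with list()
print(list(parser.tokenize(sent)))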