Problem printing the features of a crfsuite CRF model for a non-English model

Time: 2018-10-29 10:16:45

Tags: python encoding scikit-learn pickle crf

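I have a CRF model (object type: sklearn_crfsuite.estimator.CRF) whose feature data is UTF-8 encoded. The model works well for prediction. Now I want to inspect the CRF model to understand it.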

To do that, I tried printing crf.attributes_, crf.state_features_ and crf.transition_features_. Each attempt fails with the following error:

Traceback (most recent call last):
  File "C:\Users\user123\eclipse-workspace\xxx_path\standalone scripts\crfModelAnalysis.py", line 20, in <module>
    print_transitions(Counter(crf.transition_features_).most_common(k))
  File "C:\Python27\lib\site-packages\sklearn_crfsuite\estimator.py", line 490, in transition_features_
    if self._info is None:
  File "C:\Python27\lib\site-packages\sklearn_crfsuite\estimator.py", line 499, in _info
    self._info_cached = self.tagger_.info()
  File "pycrfsuite\_pycrfsuite.pyx", line 704, in pycrfsuite._pycrfsuite.Tagger.info
  File "pycrfsuite\_pycrfsuite.pyx", line 706, in pycrfsuite._pycrfsuite.Tagger.info
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 27: invalid start byte
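
The last frames show that the failure happens while Tagger.info() decodes bytes as UTF-8. In UTF-8, 0x80 is a continuation byte and can never start a character, which is what "invalid start byte" refers to. Below is a minimal sketch, independent of sklearn-crfsuite, of how raw byte strings could be checked for this kind of problem. The helper name find_invalid_utf8 and the sample data are hypothetical and only illustrate the error; they are not taken from the model in question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def find_invalid_utf8(byte_strings):
    """Return (index, position, offending byte) for every item that is not valid UTF-8."""
    problems = []
    for i, raw in enumerate(byte_strings):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded
            bad_byte = bytearray(raw)[exc.start]
            problems.append((i, exc.start, bad_byte))
    return problems


# Hypothetical sample data, not real feature names from the model.
samples = [
    u"word=\u0939\u093f\u0902\u0926\u0940".encode("utf-8"),  # valid UTF-8 bytes
    b"word=\x80abc",  # 0x80 cannot start a UTF-8 character
]

for index, position, byte in find_invalid_utf8(samples):
    print("item %d: byte 0x%02x at position %d is not valid UTF-8"
          % (index, byte, position))
```

Running the same kind of check over the raw feature strings used for training (assuming they are available as byte strings) might help narrow down which feature name contains the offending byte.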

Basic information: the model is saved in pickle format. Python version: 2.7. sklearn-crfsuite==0.3.6

Any help would be greatly appreciated.

0 answers:

No answers yet.