'PunktSentenceTokenizer' object is not callable

Date: 2016-05-08 14:09:21

Tags: python nltk

I am new to Python and NLTK. I want to tokenize a string into sentences and add some extra strings to the tokenizer's split list. I used the code from the post How to tweak the NLTK sentence tokenizer. Here is the code I wrote:

from nltk.tokenize import sent_tokenize
extra_abbreviations = ['\n']
sentence_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
sentence_tokenizer._params.abbrev_types.update(extra_abbreviations)

sent_tokenize_list = sentence_tokenizer(document)
sent_tokenize_list

This gives me the following error:

TypeError                                 Traceback (most recent call last)
 in ()
      4 sentence_tokenizer._params.abbrev_types.update(extra_abbreviations)
      5 
----> 6 sent_tokenize_list = sentence_tokenizer(document)
      7 sent_tokenize_list

TypeError: 'PunktSentenceTokenizer' object is not callable

How can I fix this?

1 Answer:

Answer 0: (score: 0)

This makes your example work:

import nltk
from nltk.tokenize import sent_tokenize
extra_abbreviations = ['\n']
sentence_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
sentence_tokenizer._params.abbrev_types.update(extra_abbreviations)
document = """This is my test doc. It has two sentences; however, one of which has interesting punctuation."""
sent_tokenize_list = sentence_tokenizer.tokenize(document)
print(sent_tokenize_list)

Your error occurred because sentence_tokenizer is an object, not a function. To split a string into sentences, you must call the tokenize method on the object.
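The same error can be reproduced without NLTK: a Python instance is only callable when its class defines __call__. A minimal sketch using a hypothetical Tokenizer class (not part of NLTK):

```python
class Tokenizer:
    """Hypothetical stand-in for PunktSentenceTokenizer."""
    def tokenize(self, text):
        # naive sentence split, just for illustration
        return text.split(". ")

t = Tokenizer()
try:
    t("Hello. World.")   # calling the instance itself raises TypeError
except TypeError as e:
    print(e)             # 'Tokenizer' object is not callable

result = t.tokenize("Hello. World.")  # calling the method works
print(result)
```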

You can learn more about how objects and their methods work in the python docs.
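For completeness, a class can opt in to being called directly by defining __call__. The CallableTokenizer below is a hypothetical example of that mechanism, not an NLTK API:

```python
class CallableTokenizer:
    def tokenize(self, text):
        return text.split()
    # aliasing tokenize as __call__ makes instances callable
    __call__ = tokenize

ct = CallableTokenizer()
print(callable(ct))   # True, because __call__ is defined
print(ct("a b c"))    # ['a', 'b', 'c']
```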