NLTK: tokenizing exclamation marks and question marks inside quotation marks

Asked: 2014-06-14 10:21:23

Tags: python string nltk quotes

Input:

"Hello! What is your name?"My name is ABC.

The output I get is:

"Hello!

What is your name?"

My name is ABC.

I want the quoted text kept together in the output, like this:

"Hello! What is your name?" My name is ABC.

Please suggest what changes I need to make in the code.

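This code extracts sentences from a paragraph. A sentence ends with a period, an exclamation mark, or a question mark, but when one of these marks appears inside quotation marks the text should not be split there.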

For example, given '"Hello! What is your name?"My name is ABC.', it should return the quoted text as a single sentence and not split at the exclamation mark or the question mark.

from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters

# Abbreviations that should not be treated as sentence endings.
punkt_param = PunktParameters()
punkt_param.abbrev_types = {'dr', 'vs', 'mr', 'mrs', 'prof', 'inc'}
sentence_splitter = PunktSentenceTokenizer(punkt_param)

text = str(input())
# Pad sentence-ending punctuation with a trailing space before tokenizing.
text = text.replace('!"', '!" ').replace('?"', '?" ').replace('."', '." ').replace('?', '? ').replace('!', '! ')
sentences = sentence_splitter.tokenize(text)
for j in sentences:
    print(j)
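
One possible way to get the behavior described above is to track whether the current character sits inside double quotes and to split only on marks that fall outside them. The sketch below is an illustration under stated assumptions, not a Punkt-based fix: it uses only the standard library, the function name split_outside_quotes is hypothetical, and it assumes straight double quotes that are properly paired. Unlike the Punkt setup above, it has no abbreviation handling, so "Dr. Smith" would be split after "Dr.".

```python
import re


def split_outside_quotes(text):
    """Split after '.', '!' or '?' unless the mark is inside double quotes.

    Hypothetical helper: assumes straight, balanced double quotes and does
    not know about abbreviations.
    """
    # Put a space after a quoted span that runs straight into the next word.
    text = re.sub(r'("[^"]*")(?=\w)', r'\1 ', text)

    sentences = []
    start = 0
    in_quote = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quote = not in_quote
        elif ch in ".!?" and not in_quote:
            nxt = text[i + 1:i + 2]
            # Only split when the mark is followed by whitespace or the end.
            if nxt == "" or nxt.isspace():
                sentences.append(text[start:i + 1].strip())
                start = i + 1

    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


for sentence in split_outside_quotes('"Hello! What is your name?"My name is ABC.'):
    print(sentence)
# prints: "Hello! What is your name?" My name is ABC.
```

With an unbalanced quote the rest of the text is treated as quoted and is not split, so input with stray quotation marks would need extra handling.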

0 answers:

No answers yet.