How do I split a string into sentences while keeping the punctuation?

Asked: 2019-03-29 11:12:40

Tags: regex python-3.x string punctuation sentence

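I want the split sentences to include their punctuation (for example ?, !, .), and if a sentence ends with a double quotation mark, I want to include that as well.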

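I used the re.split() function in Python 3 to split my string into sentences. Unfortunately, the resulting strings include neither the punctuation nor the closing double quote when a sentence ends with one.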

Here is my current code:

import re

x = 'This is an example sentence. I want to include punctuation! What is wrong with my code? It makes me want to yell, "PLEASE HELP ME!"'
sentence = re.split(r'[\.\?\!]\s*', x)

This is the output I get:

['This is an example sentence', 'I want to include punctuation', 'What is wrong with my code', 'It makes me want to yell, "PLEASE HELP ME', '"']

1 answer:

Answer 0: (score: 1)

Try splitting on a lookbehind:

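# \s+ instead of \s*: since Python 3.7, re.split() also splits on empty
# matches, which would cut off the closing quote after "ME!".
sentences = re.split(r'(?<=[.?!])\s+', x)
print(sentences)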

['This is an example sentence.', 'I want to include punctuation!',
 'What is wrong with my code?', 'It makes me want to yell, "PLEASE HELP ME!"']

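This regex splits at every position that has punctuation immediately behind it. At that position we also match and consume the whitespace ahead of us before continuing along the string.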
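To make the difference concrete, here is a small side-by-side sketch. The sample string and the variable names `text`, `consumed`, and `kept` are made up for illustration; the second pattern uses `\s+` so that the delimiter is never empty.

```python
import re

# Illustrative sample text, not taken from the question.
text = 'First one. Second one! Third one?'

# The punctuation is part of the delimiter here, so re.split() discards it.
consumed = re.split(r'[.?!]\s*', text)

# The lookbehind only asserts that punctuation precedes the split point;
# the delimiter itself is just the whitespace, so the punctuation is kept.
kept = re.split(r'(?<=[.?!])\s+', text)

print(consumed)  # ['First one', 'Second one', 'Third one', '']
print(kept)      # ['First one.', 'Second one!', 'Third one?']
```

The first call also leaves a trailing empty string, because the final "?" is consumed as a delimiter at the very end of the text.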

Here is my mediocre attempt at handling the double quote problem:

x = 'This is an example sentence. I want to include punctuation! "What is wrong with my code?"  It makes me want to yell, "PLEASE HELP ME!"'
sentences = re.split(r'((?<=[.?!]")|((?<=[.?!])(?!")))\s*', x)
# The capture groups add empty strings and None to the result; filter() drops them.
print(list(filter(None, sentences)))

['This is an example sentence.', 'I want to include punctuation!',
 '"What is wrong with my code?"', 'It makes me want to yell, "PLEASE HELP ME!"']

Note that it correctly splits even sentences that end in double quotes.
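A possible variant of the same idea, sketched with a non-capturing group so that no filtering step is needed. The names `SENTENCE_BOUNDARY` and `split_sentences` and the sample text are hypothetical, and requiring at least one whitespace character between sentences is an assumption of this sketch rather than part of the answer above.

```python
import re

# Hypothetical helper; names and sample text are illustrative only.
# Split on whitespace that follows ., ? or !, or one of those plus a closing ".
# Each lookbehind has a fixed width, which Python's re module requires.
SENTENCE_BOUNDARY = re.compile(r'(?:(?<=[.?!])|(?<=[.?!]"))\s+')

def split_sentences(text):
    """Return the sentences in text, keeping punctuation and closing quotes."""
    # strip() avoids a trailing empty item when the text ends in whitespace.
    return SENTENCE_BOUNDARY.split(text.strip())

sample = 'He left. "Are you sure?" She nodded! Then he said, "Fine."'
result = split_sentences(sample)
print(result)
# ['He left.', '"Are you sure?"', 'She nodded!', 'Then he said, "Fine."']
```

Because the group is non-capturing, re.split() returns only the sentences. The trade-off is that two sentences with no whitespace between them are not separated.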