Question

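"She's so nice!" -> ["She", "'", "s", "so", "nice", "!"] I want to split a sentence like this. I wrote the code below, but the result includes the whitespace. How can I do this using only a regular expression?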

        words = re.findall(r'\W+|\w+', text)

-> ["She", "'", "s", " ", "so", " ", "nice", "!"]

        words = [word for word in words if not word.isspace()]

Answer 1

Add the characters you do not want to match to the negated class [^A-Za-z ].

In detail:

Python code:

import re

text = "She's so nice!"
matches = re.findall(r'[A-Za-z]+|[^A-Za-z ]', text)

Output:

['She', "'", 's', 'so', 'nice', '!']
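If it helps, the same idea can be wrapped in a small helper so the characters to drop are easy to change. This is only a sketch: the function name `tokenize` and its `skip` parameter are made up for illustration, and it assumes the words are plain ASCII letters, as in the pattern above.

```python
import re

def tokenize(text, skip=" "):
    # Hypothetical helper: `skip` lists the characters that should be
    # dropped instead of being returned as one-character tokens.
    # re.escape keeps characters such as "-" or "]" literal inside the class.
    pattern = rf"[A-Za-z]+|[^A-Za-z{re.escape(skip)}]"
    return re.findall(pattern, text)

print(tokenize("She's so nice!"))
print(tokenize("well-known fact!", skip=" -"))
```

With the default `skip=" "` this is the same pattern as above; passing `skip=" -"` also drops hyphens, so "well-known" comes back as two words.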

Answer 2

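Python's re module does not let you split on zero-width assertions (this was the case before Python 3.7). You can use the regex package from PyPI instead; make sure you specify version 1, which handles zero-width matches correctly.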

import regex

s = "She's so nice!"
x = regex.split(r"\s+|\b(?!^|$)", s, flags=regex.VERSION1)

print(x)

Output: ['She', "'", 's', 'so', 'nice', '!']