re.split() special case for splitting a comma-separated string

Asked: 2017-05-30 17:37:27

Tags: python regex

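I want to use Python's re.split() to split a sentence into multiple strings at the commas, but I do not want to split at a comma that follows a single leading word. For example:

Example: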

s = "Yes, alcohol can have a place in a healthy diet."
desired result = ["Yes, alcohol can have a place in a healthy diet."]

Another example:

s = "But, of course, excess alcohol is terribly harmful to health in a variety of ways, and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer."
desired output = ["But, of course" , "excess alcohol is terribly harmful to health in a variety of ways" , "and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer."] 

Any pointers, please?

1 answer:

Answer 0 (score: 1)

Since Python's re module does not support variable-length lookbehind assertions, I would use re.findall() instead:

In [3]: re.findall(r"\s*((?:\w+,)?[^,]+)",s)
Out[3]:
['But, of course',
 'excess alcohol is terribly harmful to health in a variety of ways',
 'and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer.']
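
The lookbehind limitation mentioned above can be checked directly. This is only a small sketch: the helper name compiles() and the two sample patterns are made up for illustration and are not from the question.

```python
import re

def compiles(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # re.compile() rejects lookbehinds whose width is not fixed
        return False

print(compiles(r"(?<=\w+),"))  # variable-width lookbehind
print(compiles(r"(?<=\w),"))   # fixed-width lookbehind (exactly one character)
```

The first pattern is rejected and the second is accepted, which is why a lookbehind cannot be used here to test for "a whole word before the comma".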

Explanation

\s*        # Match optional leading whitespace, don't capture that
(          # Capture in group 1:
 (?:\w+,)? #  optionally: A single "word", followed by a comma 
 [^,]+     #  followed by one or more characters other than commas
)          # End of group 1
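
Putting it together, here is a self-contained sketch that applies the same pattern to both sentences from the question. The names CLAUSE_RE and split_clauses() are hypothetical and chosen only for this example.

```python
import re

# Pattern copied from the answer above; the names below are illustrative.
CLAUSE_RE = re.compile(r"\s*((?:\w+,)?[^,]+)")

def split_clauses(sentence):
    """Return the comma-separated pieces, keeping 'Word, ...' openers together."""
    # With exactly one capturing group, findall() returns a list of group-1 strings.
    return CLAUSE_RE.findall(sentence)

s1 = "Yes, alcohol can have a place in a healthy diet."
s2 = ("But, of course, excess alcohol is terribly harmful to health in a "
      "variety of ways, and even moderatealcohol intake is associated with "
      "an increase in the number two cause of premature death: cancer.")

print(split_clauses(s1))  # one piece: the whole sentence
print(split_clauses(s2))  # three pieces, the first being 'But, of course'
```

One caveat: the optional group fires whenever a piece starts with a single word followed directly by a comma, not only at the start of the sentence, so an input like "a, b, c" comes back as ['a, b', 'c']. If that is not what you want, the pattern would need to be adjusted.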