Regex to split on multiple delimiters

Posted: 2015-06-28 13:24:32

Tags: python regex

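I am trying to split on the following delimiters: full stop, semicolon, *, +, ? and -. However, I only want to split on "-" when it appears at the start of a sentence, so that words like "non-functional" are not split.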

I tried the following but haven't made any progress. Any help would be greatly appreciated:

sentences = re.split("[.-;]*[\+]*[\?]*[\*]*", txt)
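For reference, here is a short sketch of why this attempt misbehaves. It uses only Python's standard `re` module. Inside a character class, a `-` placed between two characters defines a range. So `[.-;]` matches every character from `.` to `;` (that includes `/`, the digits and `:`), but it does not match a literal hyphen. Also, every part of the pattern is followed by `*`, so the whole pattern can match the empty string. In Python 3.7+, `re.split()` also splits at empty matches.

```python
import re

# "-" between "." (0x2E) and ";" (0x3B) forms a range: . / 0-9 : ;
in_range = re.findall(r"[.-;]", "a.b/c1:d;e-f")
print(in_range)  # the literal "-" in the input is NOT among the matches

# Every component of the attempted pattern is optional ("*"),
# so the pattern as a whole can match an empty string.
attempt = r"[.-;]*[\+]*[\?]*[\*]*"
matches_empty = re.fullmatch(attempt, "") is not None
print(matches_empty)
```

Putting `-` first or last in a class, as in `[.;*+?-]`, makes it a literal hyphen.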

Here is the sample text I have been testing with:

- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon
* See this case mis-alignment

The expected output after splitting is a list of items:

Text Editor: Now you can edit plain text files with airport tools, Updated Dropbox support, Improved stability, New icon, See this case mis-alignment

3 answers:

Answer 0 (score: 1)

Try listing the delimiters in a single character class, like this:

re.split(r"[.;*+?]", txt)
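Below is a quick check of this approach on the sample text from the question. The `strip()` cleanup and the variable names are illustrative additions, not part of the answer. Since `-` is not in the class, a line that starts with `-` is never separated from the line before it. Also, `.`, `;`, `+` and `?` would split anywhere in a line, not only at its start.

```python
import re

txt = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

# Answer 0's class: split on any single ., ;, *, + or ? (no hyphen).
parts = re.split(r"[.;*+?]", txt)
# Trim surrounding whitespace and drop empty pieces.
items = [p.strip() for p in parts if p.strip()]
print(items)
```

Note that "- New icon" ends up glued to the "Improved stability" item.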

Answer 1 (score: 1)

If you want to split a string on a defined set of delimiters, do it like this:

>>> import re
>>> txt = '- Text Editor: Now you can edit plain text files with airport tools'
>>> r = re.split(r'([.;*+?-]+)',txt)
>>> r
['', '-', ' Text Editor: Now you can edit plain text files with airport tools']

If you don't want the delimiters to appear in the resulting list:

>>> r = re.split(r'[.;*+?-]+',txt)
>>> r
['', ' Text Editor: Now you can edit plain text files with airport tools']
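The bare class also splits on hyphens inside words, which is exactly what the question wants to avoid. Here is a minimal illustration on the last sample line, using only `re.split()`:

```python
import re

line = "* See this case mis-alignment"
# The class contains "-", so the hyphen inside "mis-alignment" is a split point too.
parts = re.split(r"[.;*+?-]+", line)
print(parts)
```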

Edit: in reply to your comment below, use \s to require whitespace around the delimiter:

>>> txt = '''- Text Editor: Now you can edit plain text files with airport tools
    * Updated Dropbox support
    * Improved
    stability
    - New icon'''
>>> r = re.split(r'(^|\s)+[.;*+?-]+($|\s)+', txt)
>>> [i for i in r if len(i) > 1]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\n    stability', 'New icon']
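Here is a variant of the same idea, sketched as a standalone script on the full sample text. Switching to non-capturing groups `(?:...)` is my change, not the answerer's. It stops `re.split()` from returning the matched separators, so the `len(i) > 1` filter can become a plain emptiness check. The pattern still needs whitespace (or the start of the string) before the delimiter, so the hyphen in "mis-alignment" survives.

```python
import re

txt = """- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support
* Improved
stability
- New icon
* See this case mis-alignment"""

# Same delimiter class as Answer 1, but the groups are non-capturing,
# so re.split() returns only the text between separators.
pattern = r"(?:^|\s)+[.;*+?-]+(?:$|\s)+"
items = [part for part in re.split(pattern, txt) if part]
print(items)
```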

Answer 2 (score: 1)

You can use re.split like this:

>>> import re
>>> s = '''- Text Editor: Now you can edit plain text files with airport tools
* Updated Dropbox support 
* Improved
stability
- New icon'''
>>> [i for i in re.split(r'(?m)\s*^[-*+?]+\s*', s) if i]
['Text Editor: Now you can edit plain text files with airport tools', 'Updated Dropbox support', 'Improved\nstability', 'New icon']