Question

Suppose I have a document like this:

document = ["This is a document\nwhich has to be splitted\nOK/Right?"]

I want to split this document wherever a "\n" or a "/" occurs (to begin with).

So the document above should be converted into the following:

document = ["This is a document", "which has to be splitted", "OK", "Right?"]

How can I do this?

Keep in mind that the text may contain other special characters, and I don't want to remove them for now.

Answer 1

Use re to split a string on several characters or character combinations:

import re

document = ["This is a document\nwhich has to be splitted\nOK/Right?"]
re.split("[\n/]", document[0])

This produces the requested strings:

['This is a document', 'which has to be splitted', 'OK', 'Right?']
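If the set of delimiters grows, the pattern can be built from a list instead of being typed by hand. The following is a minimal sketch, not part of the original answer: the helper name `split_on_delimiters` is hypothetical, and `re.escape` is used so that delimiters such as `.` or `|` are matched literally rather than read as regex syntax.

```python
import re

def split_on_delimiters(text, delimiters):
    """Split text on any of the given delimiter strings (hypothetical helper)."""
    # Escape each delimiter so regex metacharacters are matched literally,
    # then join the alternatives with "|".
    pattern = "|".join(re.escape(d) for d in delimiters)
    return re.split(pattern, text)

document = ["This is a document\nwhich has to be splitted\nOK/Right?"]
parts = split_on_delimiters(document[0], ["\n", "/"])
print(parts)
```

Other special characters in the text, such as the "?" in "Right?", are left untouched because only the listed delimiters are matched.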

Answer 2

This is exactly the kind of case where regular expressions shine! Use Python's re module:

>>> import re
>>> document = ["This is a document\nwhich has to be splitted\nOK/Right?"]
>>> re.split(r"[\n/]", document[0])
['This is a document', 'which has to be splitted', 'OK', 'Right?']

This SO post has the most discussion of the topic.
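One detail worth knowing about the character-class approach: adjacent or trailing delimiters produce empty strings in the result. The sketch below uses a made-up input string to show this, along with two possible ways of dealing with it. Whether empty pieces should be dropped depends on the use case, so treat this as an illustration rather than a recommendation.

```python
import re

text = "OK//Right?\n"  # made-up input with adjacent and trailing delimiters

# Each delimiter is a split point, so empty strings appear between and after them.
plain = re.split(r"[\n/]", text)

# "+" treats a run of delimiters as one split point; a trailing empty string remains.
collapsed = re.split(r"[\n/]+", text)

# Filtering removes every empty piece.
filtered = [part for part in re.split(r"[\n/]", text) if part]

print(plain)
print(collapsed)
print(filtered)
```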

Answer 3

You can use re.split():

import re

def split_document(document):
    # Base case: nothing left to split.
    if not document:
        return []
    # Split the first string on "\n" or "/", then recurse on the rest of the list.
    parts = re.split(r"\n|/", document[0])
    return parts + split_document(document[1:])
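The recursive version makes one call per list element, so a very long list could hit Python's recursion limit (1000 frames by default). A non-recursive sketch of the same idea is shown below. The name `split_documents` and the sample input are illustrative and not taken from the answer.

```python
import re

def split_documents(documents):
    """Split every string in the list and return one flat list (illustrative name)."""
    # The nested comprehension visits each string once and flattens the pieces.
    return [part for text in documents for part in re.split(r"\n|/", text)]

documents = ["first\nsecond", "third/fourth"]
result = split_documents(documents)
print(result)
```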

Answer 4

Using re.split() is probably the best solution.

An alternative solution without regular expressions:

document = ["This is a document\nwhich has to be splitted\nOK/Right?"]
document[0] = document[0].replace('/', '\n')
document[0].splitlines()
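Note that str.splitlines() breaks on more than "\n": it also treats "\r" and several other line-boundary characters as separators, and it drops a trailing empty string. If only "\n" and "/" should count, splitting with str.split("\n") after the replacement may be closer to what the question asks for. A small comparison on a made-up string:

```python
text = "first\rstill first?/second\nthird"  # made-up input containing a "\r"
normalized = text.replace("/", "\n")

# splitlines() also splits on "\r".
with_splitlines = normalized.splitlines()

# split("\n") splits only on "\n", so the "\r" stays inside the first piece.
with_split = normalized.split("\n")

print(with_splitlines)
print(with_split)
```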

Splitting text based on multiple delimiters ('\n', '/')
