Splitting a string with regex delimiters in Python

Time: 2019-02-01 11:37:26

Tags: python regex

I have the following string:

txt='agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'

These are the delimiters:

delimiters = " \t,;.?!-:@[](){}_*/"

As output, I want the following list of values:

"agadsfa","2asdf","sdfsaf","asfsadf","adsf","klnalfk","jn234kmafs","adfs","nlnawr23"

I tried using a regular expression:

re.split(delimiters,txt)

But I got this error:

re.error: unterminated character set at position 10

What is going wrong here?

4 Answers:

Answer 0: (score: 2)

Your regex is incorrect. In the comments you also added a requirement that the delimiters string must not be modified.
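
As a quick check of where the pattern breaks, the sketch below compiles the delimiters string on its own and inspects the exception. The `[` at index 10 opens a character set, the `]` right after it is taken as a literal member of that set, and nothing later closes it. The names `error_message` and `error_position` are made up for this illustration.

```python
import re

delimiters = " \t,;.?!-:@[](){}_*/"

try:
    # Compiling alone is enough to trigger the failure seen in re.split()
    re.compile(delimiters)
    error_message, error_position = None, None
except re.error as exc:
    # re.error carries the message and the offending index in the pattern
    error_message, error_position = exc.msg, exc.pos

print(error_message, error_position)
print(repr(delimiters[10]))  # the character at the reported position
```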

So what we need to do is process the delimiters string and convert it into a regex that re.split() can use. Here's how:

import re

# Enclose the characters in [] so that we split on any one of them;
# ']' and '-' are special inside a character set and must be escaped.
delimiters = ' \t,;.?!-:@[](){}_*/'
regex = delimiters.replace(']', r'\]').replace('-', r'\-')
regex = r'[{}]+'.format(regex)

The result is as expected:

txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
re.split(regex, txt)
=> ['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
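
The manual replace() calls above cover the two characters that matter for this particular delimiters string. If the string might also contain other characters that are special inside a character set, such as a backslash or `^`, one possible variation is to let `re.escape()` do the escaping. This is only a sketch: `split_on_any` is a hypothetical helper name, and it assumes `delimiters` is non-empty.

```python
import re

def split_on_any(text, delimiters):
    """Split text on runs of any character in delimiters (hypothetical helper)."""
    # re.escape() backslash-escapes regex metacharacters, including ']' and '-'
    pattern = '[' + re.escape(delimiters) + ']+'
    # Drop the empty strings that a leading or trailing delimiter would produce
    return [piece for piece in re.split(pattern, text) if piece]

delimiters = ' \t,;.?!-:@[](){}_*/'
txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
tokens = split_on_any(txt, delimiters)
print(tokens)
```

The delimiters string itself stays untouched, which matches the requirement from the comments.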

Answer 1: (score: 0)

You have to use | to separate the delimiters:

delimiters = r' |\t|,|;|\.|\?|!|-|:|@|\[|\]|\(|\)|\{|\}|_|\*|/'
# then use this to eliminate empty strings if you have two delimiters next to each other
print([w for w in re.split(delimiters,txt) if w])   
# or list(filter(lambda a: a, re.split(delimiters,txt)))

The result is:

['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
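
The alternation above is written out by hand. If the original delimiters string has to be kept as it is, a similar pattern could probably be generated from it, roughly as in the sketch below. The variable names `alternation` and `words` are chosen here for illustration only.

```python
import re

delimiters = ' \t,;.?!-:@[](){}_*/'
txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'

# Escape each delimiter character separately, then join them with '|'
alternation = '|'.join(re.escape(ch) for ch in delimiters)

# Adjacent delimiters such as '_(' leave empty strings behind; filter them out
words = [w for w in re.split(alternation, txt) if w]
print(words)
```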

Answer 2: (score: 0)

Try this:

import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"

line = re.sub(
           r"[ \t,;\.?!\-:@\[\](){}_*/]+", 
           r",", 
           txt
       )

print(line.split(","))
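
For comparison, the same character set can be passed straight to re.split(), which skips the intermediate comma-joined string. The sketch below runs both routes on the sample text from this answer; `via_sub` and `via_split` are names invented for the comparison.

```python
import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"
pattern = r"[ \t,;\.?!\-:@\[\](){}_*/]+"

# Route 1: replace every run of delimiters with ',' and split on ','
via_sub = re.sub(pattern, ",", txt).split(",")

# Route 2: split on the runs of delimiters directly
via_split = re.split(pattern, txt)

print(via_split)
print(via_sub == via_split)
```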

Answer 3: (score: 0)

Python 3 code

import re

txt="agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

delimiters = r"_|;|,|\)|\(|\[|\]"

list(filter(None, re.split(delimiters, txt)))

Output

['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']

Separate the symbols with | and use Python's built-in filter function to drop the empty strings.