Cleaning nested lists with re.sub

Time: 2019-08-22 20:22:16

Tags: regex python-3.x list

I need to clean a set of nested lists (no more than three levels). A similar example looks like this:

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

I would like to run the following command:

re.sub(r'[^a-zA-Z\d\[\] ]', '', test)

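I know the problem here is that I need to iterate over the nested lists, but I am having trouble preserving the structure. There may also be a simpler way to solve this. I tried this variation: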

import re

clean = []  # collects the cleaned strings; the nesting is lost here
for a in test:
    for b in a:
        if isinstance(b, list):
            for c in b:
                c = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c)
                clean.append(c)
        else:
            print(b)
            b = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', b)
            clean.append(b)

2 answers:

Answer 0: (score: 1)

This script keeps the structure of the list as it is and simply applies the re.sub function to each string:

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

import re

def clean(lst):
    if not isinstance(lst, list):
        return re.sub(r'[^a-zA-Z\d\[\] ]', '', lst)

    return [clean(v) for v in lst]

print(clean(test))

Prints:

[['qte', 'EKO'], ['eoim', ['35ni', 'mmie']]]
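
As a hedged variation on the recursion above, the pattern can be compiled once so that it lives in a single place. The names `PATTERN` and `clean_nested` are illustrative and do not come from the answer, and the sketch assumes every non-list element is a string.

```python
import re

# Hypothetical names; assumes every non-list element is a str.
PATTERN = re.compile(r'[^a-zA-Z\d\[\] ]')

def clean_nested(item):
    """Return a cleaned copy of item, preserving the nesting of lists."""
    if isinstance(item, list):
        # Rebuild each list level so the original structure is kept
        return [clean_nested(v) for v in item]
    # Leaf: drop every character outside the allowed set
    return PATTERN.sub('', item)

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
result = clean_nested(test)
print(result)
```

Because each level is rebuilt with a list comprehension, the input list is left unmodified and the recursion works for any nesting depth, not only three.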

Answer 1: (score: 0)

Since you only need to compile all the nested lists into one flattened list, you can use a flatten function on the list and then apply the regex to the result.

def flatten(lst):
    flat = []
    for x in lst:
        # Python 3 has no basestring; str is the string type to exclude
        if hasattr(x, '__iter__') and not isinstance(x, str):
            flat.extend(flatten(x))
        else:
            flat.append(x)
    return flat

clean = []
for c in flatten(test):
    clean.append(re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c))
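
The flattening approach can also be sketched with a generator, which avoids building intermediate lists. This is a minimal sketch for Python 3: the name `iter_flat` is hypothetical, only `list` instances are treated as nested, and the replacement string is `''` (as in the question's command) rather than `' '`. Note that the result is a flat list, so the original nesting is discarded.

```python
import re

def iter_flat(lst):
    """Yield the leaves of arbitrarily nested lists, depth first."""
    for x in lst:
        if isinstance(x, list):
            # Delegate to the nested list's own leaves
            yield from iter_flat(x)
        else:
            yield x

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
flat_clean = [re.sub(r'[^a-zA-Z\d\[\] ]', '', s) for s in iter_flat(test)]
print(flat_clean)
```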