From a list

Time: 2017-12-23 15:02:05

Tags: python regex

I have the following list "phonenumbers". I am having trouble removing the elements that contain '\n\t\t\t\t' and '\n\t\t\t\t\t'. I tried a try/except approach and remove('\n\t\t\t\t\t\t') but could not get it to work. Any hints?

  

['(02271) 6 79', ' 70', '\n\t\t\t', '(02271) 6 79', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t']

3 answers:

Answer 0 (score: 1)

You could go for a simple expression like

^\s+$

in Python.

This keeps every item that is not whitespace from start to end, and strips the items it keeps:

import re

lst = ['(02271) 6 79', ' 70', '\n\t\t\t', '(02271) 6 79', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02161) 24 19', '\n\t\t\t\t', ' 40', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02131) 66 67', '\n\t\t\t\t', ' 10', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02103) 39 00', '\n\t\t\t\t', ' 93', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02173) 2 04 7', '\n\t\t\t\t', '3-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 04', '\n\t\t\t\t', ' 30', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '\n\t\t\t\t', '(0221) 3 46 79 40', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(02232) 4 23', '\n\t\t\t\t', ' 05', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(0157) 86 85 74', '\n\t\t\t\t', ' 43', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 2 78 11', '\n\t\t\t\t', ' 47', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02181) 47 49 0', '\n\t\t\t\t', '0-0', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(02202) 1 88', '\n\t\t\t\t', ' 60', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(0211) 23 80', '\n\t\t\t\t', ' 70', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t', '\n\t\t\t', '(02235) 9 23 0', '\n\t\t\t\t', '4-0', '\n\t\t\t']

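# Matches strings that consist of whitespace only
rx = re.compile(r'^\s+$')

# Drop the whitespace-only items and strip the remaining ones
lst = [item.strip() for item in lst if not rx.match(item)]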
print(lst)

As @dawg pointed out, a regular expression is not actually needed for this. The code above prints:

['(02271) 6 79', '70', '(02271) 6 79', '70', '(02181) 27 0', '3-0', '(02181) 27 0', '3-0', '(02161) 24 19', '40', '(02161) 24 19', '40', '(02131) 66 67', '10', '(02131) 66 67', '10', '(02103) 39 00', '93', '(02103) 39 00', '93', '(02173) 2 04 7', '3-0', '(02173) 2 04 7', '3-0', '(02235) 9 23 04', '30', '(02235) 9 23 04', '30', '(0221) 3 46 79 40', '(0221) 3 46 79 40', '(02232) 4 23', '05', '(02232) 4 23', '05', '(0157) 86 85 74', '43', '(0157) 86 85 74', '43', '(02181) 2 78 11', '47', '(02181) 2 78 11', '47', '(02181) 47 49 0', '0-0', '(02181) 47 49 0', '0-0', '(02202) 1 88', '60', '(02202) 1 88', '60', '(0211) 23 80', '70', '(0211) 23 80', '70', '(02235) 9 23 0', '4-0', '(02235) 9 23 0', '4-0']
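A regex-free variant along the lines of @dawg's comment might look like the sketch below. It is an assumption about what the comment had in mind, not a quote of it: it relies only on str.strip(), and the helper name drop_blank_items and the shortened sample list are made up for illustration.

```python
def drop_blank_items(items):
    """Return the stripped items, skipping those that are whitespace-only."""
    # str.strip() returns '' for a whitespace-only string, and '' is falsy
    return [s.strip() for s in items if s.strip()]


# Shortened, made-up sample in the same shape as the list above
sample = ['(02271) 6 79', ' 70', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0']
cleaned = drop_blank_items(sample)
print(cleaned)  # ['(02271) 6 79', '70', '(02181) 27 0', '3-0']
```

Unlike the regex version, this also drops empty strings, because '' is falsy as well.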

Answer 1 (score: 0)

Try this:

result = [i for i in lst if not i.endswith('\t\t')]
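To see how this filter behaves, here is a small sketch on a shortened, made-up sample (the names sample and leftover are only for illustration). It seems worth noting that the filter only looks at the last two characters, so it leaves leading spaces in place and depends on every unwanted entry ending in at least two tabs.

```python
# Shortened, made-up sample in the same shape as the list in the question
sample = ['(02271) 6 79', ' 70', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0']

# Keep every item that does not end in two tab characters
result = [i for i in sample if not i.endswith('\t\t')]
print(result)  # ['(02271) 6 79', ' 70', '(02181) 27 0', '3-0']

# Whitespace-only items with fewer than two trailing tabs are not caught
leftover = [i for i in ['\n', '\n\t', 'x'] if not i.endswith('\t\t')]
print(leftover)  # ['\n', '\n\t', 'x']
```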

Answer 2 (score: 0)

You can use a list comprehension to create a list of strings in which each one must pass the test that every character c of the string is not in '\t\n'. I think this is the most efficient, general solution that works for strings that contain only tabs and newlines, and it is also very readable in Python:

[i for i in lst if all(c not in '\t\n' for c in i)]

which gives the correct result:

['(02271) 6 79', ' 70', '(02271) 6 79', ' 70', '(02181) 27 0', '3-0', '(02181) 27 0', '3-0', '(02161) 24 19', ' 40', '(02161) 24 19', ' 40', '(02131) 66 67', ' 10', '(02131) 66 67', ' 10', '(02103) 39 00', ' 93', '(02103) 39 00', ' 93', '(02173) 2 04 7', '3-0', '(02173) 2 04 7', '3-0', '(02235) 9 23 04', ' 30', '(02235) 9 23 04', ' 30', '(0221) 3 46 79 40', '(0221) 3 46 79 40', '(02232) 4 23', ' 05', '(02232) 4 23', ' 05', '(0157) 86 85 74', ' 43', '(0157) 86 85 74', ' 43', '(02181) 2 78 11', ' 47', '(02181) 2 78 11', ' 47', '(02181) 47 49 0', '0-0', '(02181) 47 49 0', '0-0', '(02202) 1 88', ' 60', '(02202) 1 88', ' 60', '(0211) 23 80', ' 70', '(0211) 23 80', ' 70', '(02235) 9 23 0', '4-0', '(02235) 9 23 0', '4-0']

You could also use str.isspace(), which is shorter but probably (I am not 100% sure) slightly slower, because it checks for all whitespace characters. It gives the same result.
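A minimal sketch of the str.isspace() variant, assuming it is meant as a drop-in replacement for the all(...) test in the comprehension. The shortened sample list is made up for illustration.

```python
# Shortened, made-up sample in the same shape as the list in the question
sample = ['(02271) 6 79', ' 70', '\n\t\t\t', '(02181) 27 0', '\n\t\t\t\t', '3-0']

# str.isspace() is True only for a non-empty string made up entirely of whitespace
result = [i for i in sample if not i.isspace()]
print(result)  # ['(02271) 6 79', ' 70', '(02181) 27 0', '3-0']

# On this sample the all(...) comprehension selects the same items
same = result == [i for i in sample if all(c not in '\t\n' for c in i)]
print(same)  # True
```

The two tests are not equivalent in general: the all(...) version also drops an item such as 'a\tb' that merely contains a tab, while str.isspace() only drops items that are whitespace throughout (and keeps empty strings, since ''.isspace() is False).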