Python regex: pattern not followed by digits

Time: 2017-10-20 18:04:29

Tags: python regex python-2.7

I have a list containing several values, for example:

l = [
 '210-4521268-18',
 '210.0622277.13', 
 'rachid 312-0653348-08',
 '3000401732 00000 064 77063',
 ....,
 '312-0653348-08 rachid'
]

I only want to fetch the items formatted like '210.0622277.13', i.e. those that correspond to the following regex:

r'\d{3}\D?\d{7}\D?\d{2}'

So far I have written the following regex to fetch these values:

import re

regex = re.compile(r'((\d{3}\D?\d{7}\D?\d{2}$)|(^\d{3}\D?\d{7}\D?\d{2}))')
# loop through the list to fetch the desired part of each value
for line in l:
    match = regex.search(line)
    if match:
        print('line : {} found a match {}'.format(line, line[match.start():match.end()]))
    else:
        print('line : {} found no match'.format(line))

The problem is that the value '3000401732 00000 064 77063' also matches.
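
To see why this line matches, it can help to print exactly what the pattern consumed. The following is a minimal sketch that reuses the regex above; the variable names `line` and `matched_text` are only illustrative.

```python
import re

regex = re.compile(r'((\d{3}\D?\d{7}\D?\d{2}$)|(^\d{3}\D?\d{7}\D?\d{2}))')

line = '3000401732 00000 064 77063'
match = regex.search(line)

# Both separators are optional (\D?), so ten digits in a row followed by a
# space can be read as 3 digits + 7 digits + one non-digit + 2 digits.
matched_text = match.group() if match else None
print(repr(matched_text))
```

The second alternative (anchored only at the start) succeeds here because nothing in it forbids further digits after the last two.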

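How can I improve this regex so that it no longer accepts digits after the desired pattern? If more digits follow the pattern, the value should be discarded.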

The matches I need to capture are:

l = [
   '210-4521268-18',
   '210.0622277.13', 
   '312-0653348-08',
   '312-0653348-08'
]

So the output would look like this:

line : 210-4521268-18 found a match 210-4521268-18
line : 210.0622277.13 found a match 210.0622277.13
line : rachid 312-0653348-08 found a match 312-0653348-08
line : 3000401732 00000 064 77063 found no match
line : 312-0653348-08 rachid found a match 312-0653348-08

6 answers:

Answer 0: (score: 2)

This should work for you:

\d{3}[^\d]\d{7}[^\d]\d{2}

Live demo here

Explanation

\d{3}: looks for 3 digits

[^\d]\d{7}: looks for a non-digit, then 7 digits

[^\d]\d{2}: again looks for a non-digit, then 2 digits
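
Making the separators mandatory rejects the all-digit line, but on its own it does not stop a match when extra digits directly follow the last two. One way to cover that case, sketched here as an assumption rather than as part of the answer above, is to wrap the same pattern in lookarounds, `(?<!\d)` and `(?!\d)`. The last sample value is hypothetical and was added only to exercise the trailing-digit case.

```python
import re

# Assumption: a valid value must not touch another digit on either side.
guarded = re.compile(r'(?<!\d)\d{3}[^\d]\d{7}[^\d]\d{2}(?!\d)')

samples = [
    '210-4521268-18',
    '210.0622277.13',
    'rachid 312-0653348-08',
    '3000401732 00000 064 77063',
    '312-0653348-08 rachid',
    '210-4521268-189',  # hypothetical value with one extra trailing digit
]

results = {}
for line in samples:
    match = guarded.search(line)
    # Store the matched text, or None when the line is rejected.
    results[line] = match.group() if match else None
    print('line : {} -> {}'.format(line, results[line]))
```

The lookarounds consume no characters, so the matched text is still just the value itself.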

Answer 1: (score: 1)

You can try this:

import re
l = [
'210-4521268-18',
'210.0622277.13', 
'rachid 312-0653348-08',
'3000401732 00000 064 77063',
'312-0653348-08 rachid'
]
final_vals = [re.findall(r'\d+[\W]\d+[\W]\d+', i)[0] for i in l if re.findall(r'\d+\.|-\d+\.|-\d+', i)]

Output:

['210-4521268-18', '210.0622277.13', '312-0653348-08', '312-0653348-08']

Answer 2: (score: 1)

Use the following approach:

import re

l = [
 '210-4521268-18',
 '210.0622277.13',
 'rachid 312-0653348-08',
 '3000401732 00000 064 77063',
 '312-0653348-08 rachid'
]

regex = re.compile(r'\d{3}(?:\.|-)\d{7}(?:\.|-)\d{2}')    
for line in l:
   match = regex.search(line)
   if match:
       print('line : {} found a match {}'.format(line, match.group()))
   else:
       print('line : {} found no match'.format(line))

Output:

line : 210-4521268-18 found a match 210-4521268-18
line : 210.0622277.13 found a match 210.0622277.13
line : rachid 312-0653348-08 found a match 312-0653348-08
line : 3000401732 00000 064 77063 found no match
line : 312-0653348-08 rachid found a match 312-0653348-08
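
If only the extracted values are needed, the same idea can be written more compactly. This is a sketch, not the answerer's code: it assumes the character class `[.-]` as a stand-in for the alternation `(?:\.|-)`, and the names `lines` and `found` are illustrative.

```python
import re

# [.-] accepts the same two separators as the alternation (?:\.|-).
regex = re.compile(r'\d{3}[.-]\d{7}[.-]\d{2}')

lines = [
    '210-4521268-18',
    '210.0622277.13',
    'rachid 312-0653348-08',
    '3000401732 00000 064 77063',
    '312-0653348-08 rachid',
]

# Keep the matched text of every line that contains a value.
found = [m.group() for m in (regex.search(line) for line in lines) if m]
print(found)
```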

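Answer 3: (score: 0)

Those are valid matches: the pattern does occur inside the string that is returned.

Try adding ^ at the front and $ at the end, or otherwise specify that any additional data should make the match fail.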

regex = re.compile(r'^((\d{3}\D?\d{7}\D?\d{2}$)|(^\d{3}\D?\d{7}\D?\d{2}))$')
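
One consequence worth checking: with both anchors in place the whole line has to be the value, so lines carrying extra text are rejected as well. A quick sketch over the sample list from the question (the names `anchored`, `samples` and `matched` are illustrative):

```python
import re

anchored = re.compile(r'^((\d{3}\D?\d{7}\D?\d{2}$)|(^\d{3}\D?\d{7}\D?\d{2}))$')

samples = [
    '210-4521268-18',
    '210.0622277.13',
    'rachid 312-0653348-08',
    '3000401732 00000 064 77063',
    '312-0653348-08 rachid',
]

# Collect the lines that the fully anchored pattern accepts.
matched = [line for line in samples if anchored.search(line)]
print(matched)
```

Only the two bare values pass, so this variant fits when each line is expected to hold nothing but the value.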

Answer 4: (score: 0)

You can filter on the dot "." like this:

import re

l = [
 '210-4521268-18',
 '210.0622277.13', 
 'rachid 312-0653348-08',
 '3000401732 00000 064 77063',
 '312-0653348-08 rachid'
]

regex = re.compile(r'\b(\w+[.]\w+)')
# loop through the list to fetch desired part of value
for line in l:
    match = regex.search(line)
    if match:
        print('line : {} found a match {}'.format(line, line[match.start():match.end()]))
    else:
        print('line : {} found no match'.format(line))

The result I got:

line : 210-4521268-18 found no match
line : 210.0622277.13 found a match 210.0622277
line : rachid 312-0653348-08 found no match
line : 3000401732 00000 064 77063 found no match
line : 312-0653348-08 rachid found no match

Answer 5: (score: 0)

Try specifying the separator explicitly and marking the start and end.

r'^\d{3}[^\d]\d{7}[^\d]\d{2}$'