Concatenate numbers by removing special characters

Posted: 2015-12-09 13:37:15

Tags: regex

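I can extract numbers in Python with some basic regular expressions. What I want, though, is to concatenate all the digits that are separated by any character other than whitespace.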

>>> import re
>>> ss = ["apple-12.34 ba33na fanc-14.23yapple+45+67.56",
...       'hello X42 I\'m a Y-32.35 string Z30',
...       'he33llo 42 I\'m a 32 string -30',
...       'h3110 23 cat 444.4 rabbit 11 2 dog',
...       "hello 12 hi 89"]
>>> for s in ss:
...     print(re.findall(r"\d+", s))
...
['12', '34', '33', '14', '23', '45', '67', '56']
['42', '32', '35', '30']
['33', '42', '32', '30']
['3110', '23', '444', '4', '11', '2']
['12', '89']

Expected result:

['1234', '33', '1423456756']
['42', '3235', '30']
['33', '42', '32', '30']
['3110', '23', '4444', '11', '2']
['12', '89']
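One way to pin down what the expected result means is to restate the requirement without regex: split on whitespace, keep only the digits of each token, and drop tokens that contain no digits. This is a minimal sketch for checking answers against, not part of the question; the helper name join_digits_per_token is hypothetical, and str.isdecimal is used because it covers the same Unicode "Nd" characters that \d matches in Python 3 str patterns.

```python
def join_digits_per_token(s):
    """Hypothetical helper: concatenate the digits inside each
    whitespace-separated token and skip tokens that have no digits."""
    result = []
    for token in s.split():  # split() with no argument splits on any whitespace
        digits = "".join(ch for ch in token if ch.isdecimal())
        if digits:
            result.append(digits)
    return result


ss = ["apple-12.34 ba33na fanc-14.23yapple+45+67.56",
      'hello X42 I\'m a Y-32.35 string Z30',
      'he33llo 42 I\'m a 32 string -30',
      'h3110 23 cat 444.4 rabbit 11 2 dog',
      "hello 12 hi 89"]

for s in ss:
    print(join_digits_per_token(s))
```

Running the loop prints exactly the five expected lists above.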

3 answers:

Answer 0 (score: 2)

Try this line inside your for loop:

for s in ss:
    print(re.findall(r"\d+", re.sub(r'(?<=\d)[^a-zA-Z0-9\s]+(?=\d)', '', s)))

Tested with your examples, the output is:

In [4]: for s in ss:
   ...:     print(re.findall(r"\d+", re.sub(r'(?<=\d)[^a-zA-Z0-9\s]+(?=\d)', '', s)))
   ...:
['1234', '33', '1423', '456756']
['42', '3235', '30']
['33', '42', '32', '30']
['3110', '23', '4444', '11', '2']
['12', '89']
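This first version still splits '1423' and '456756' because the class [^a-zA-Z0-9\s] excludes letters. No match can start at the "y" of "yapple", and the "+" after "yapple" is preceded by "e" rather than a digit. The sketch below applies only the substitution step to that token so the leftover characters are visible. The variable names are illustrative.

```python
import re

token = "fanc-14.23yapple+45+67.56"

# First pattern from the answer: letters are excluded from the character class,
# so the run "yapple+" between "3" and "4" survives the substitution.
first = re.sub(r'(?<=\d)[^a-zA-Z0-9\s]+(?=\d)', '', token)
parts = re.findall(r"\d+", first)

print(first)  # fanc-1423yapple+456756
print(parts)  # ['1423', '456756']
```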

Update, since the OP changed the requirements:

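The idea is to delete every run of characters that are neither whitespace nor digits and that sits between two digits, and then extract the numbers with \d+: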
In [4]: for s in ss:
   ...:     print(re.findall(r"\d+", re.sub(r'(?<=\d)[^\s\d]+(?=\d)', '', s)))
   ...:
['1234', '33', '1423456756']
['42', '3235', '30']
['33', '42', '32', '30']
['3110', '23', '4444', '11', '2']
['12', '89']
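Applying the updated substitution to the same token shows the difference. [^\s\d]+ now also consumes "yapple+", so every digit group in the token ends up adjacent. Whitespace is still excluded from the class, which keeps space-separated numbers apart. The sample strings and variable names here are illustrative only.

```python
import re

PATTERN = r'(?<=\d)[^\s\d]+(?=\d)'  # updated pattern from the answer

token = "fanc-14.23yapple+45+67.56"
joined = re.sub(PATTERN, '', token)
parts = re.findall(r"\d+", joined)

# A space between the digits blocks the match, so the numbers stay separate.
spaced = re.findall(r"\d+", re.sub(PATTERN, '', "12 .34"))

print(joined)  # fanc-1423456756
print(parts)   # ['1423456756']
print(spaced)  # ['12', '34']
```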

Answer 1 (score: 2)

Remove every character in the string that is not a digit or whitespace, then split on whitespace.

>>> import re
>>> line = 'apple-12.34 ba33na fanc-14.23yapple+45+67.56'
>>> list_of_numbers = re.sub(r'[^\d\s]', '', line).split()
>>> print(list_of_numbers)
['1234', '33', '1423456756']
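The example above handles only the first string. As a sketch, assuming the same ss list as the question, the loop below applies this approach to every input and checks it against the lookaround approach from the updated Answer 0. Both end up concatenating the digits within each whitespace-separated token. The function names are hypothetical.

```python
import re

ss = ["apple-12.34 ba33na fanc-14.23yapple+45+67.56",
      'hello X42 I\'m a Y-32.35 string Z30',
      'he33llo 42 I\'m a 32 string -30',
      'h3110 23 cat 444.4 rabbit 11 2 dog',
      "hello 12 hi 89"]


def split_approach(s):
    # Delete everything that is neither a digit nor whitespace, then split;
    # tokens left empty (pure words) disappear because split() drops them.
    return re.sub(r'[^\d\s]', '', s).split()


def lookaround_approach(s):
    # Delete non-space, non-digit runs sitting between two digits, then
    # extract the remaining digit runs.
    return re.findall(r'\d+', re.sub(r'(?<=\d)[^\s\d]+(?=\d)', '', s))


for s in ss:
    print(split_approach(s))
```

The split version avoids lookarounds entirely, which some readers may find easier to maintain.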

Answer 2 (score: 1)

Just use:

re.findall(r"\d+", re.sub(r"(?<=\d)[^\s\d]*(?=\d)", "", s))

See this demo.

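With (?<=\d)[^\s\d]*(?=\d) you remove any run of characters that are neither whitespace nor digits and that sits between two digits. After that, the plain \d+ pattern is enough to extract the remaining digit sequences.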
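This pattern uses * where Answer 0's update uses +. With *, the regex also produces zero-width matches between two adjacent digits. Replacing those with "" changes nothing, so the extracted numbers should be the same. The sketch below compares both quantifiers on a few illustrative inputs, including a tab, which \s treats as whitespace. The names STAR, PLUS and extract are hypothetical.

```python
import re

STAR = r"(?<=\d)[^\s\d]*(?=\d)"  # this answer's pattern
PLUS = r"(?<=\d)[^\s\d]+(?=\d)"  # Answer 0's updated pattern


def extract(pattern, s):
    # Remove the in-between characters, then pull out the digit runs.
    return re.findall(r"\d+", re.sub(pattern, "", s))


samples = ["apple-12.34 ba33na fanc-14.23yapple+45+67.56",
           "12\t34",    # tab counts as whitespace, so the numbers stay apart
           "1a2b3 4"]   # letters between digits are removed

for s in samples:
    print(extract(STAR, s), extract(PLUS, s))
```

Using + avoids the no-op empty matches. With *, the pattern is shorter to reason about when the separator may be absent.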

Result:

['1234', '33', '1423456756']
['42', '3235', '30']
['33', '42', '32', '30']
['3110', '23', '4444', '11', '2']
['12', '89']