Split a string containing some keywords into a list using Python

Time: 2012-01-18 12:30:33

Tags: python regex parsing list

I'm trying to parse the /etc/network/interfaces config file on Ubuntu, so I need to split a string into a list of strings, where each string starts with one of the given keywords.

According to the manual:

  

The file consists of zero or more "iface", "mapping", "auto", "allow-" and "source" stanzas.

So if the file contains:

auto lo eth0
allow-hotplug eth1

iface eth0-home inet static
    address 192.168.1.1
    netmask 255.255.255.0

I want to get the list:

  

['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n    address ...']

Right now my function looks like this:

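import re

def get_sections(text):
    start_indexes = [s.start() for s in re.finditer('auto|iface|source|mapping|allow-', text)]
    start_indexes.reverse()
    end_idx = len(text)  # slice to the end; -1 would cut off the last character
    res = []
    # Walk the section starts from last to first, slicing up to the previous start.
    for i in start_indexes:
        res.append(text[i: end_idx].strip())
        end_idx = i
    res.reverse()  # restore file order once, after the loop
    return res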

But this isn't nice...

2 answers:

Answer 0 (score: 3)

You can do this in a single regex:

>>> reobj = re.compile(r"(?:auto|allow-|iface)(?:(?!(?:auto|allow-|iface)).)*(?<!\s)", re.DOTALL)
>>> result = reobj.findall(subject)
>>> result
['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n    address 192.168.1.1\n    netmask 255.255.255.0']

Explanation:

(?:auto|allow-|iface)   # Match one of the search terms
(?:                     # Try to match...
 (?!                    #  (as long as we're not at the start of
  (?:auto|allow-|iface) #  the next search term):
 )                      #  
 .                      # any character.
)*                      # Do this any number of times.
(?<!\s)                 # Assert that the match doesn't end in whitespace
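The annotated layout above is already valid verbose-mode regex syntax, so it can be compiled directly. The following is a minimal sketch, assuming Python 3 and the sample file contents from the question; the names `SECTION` and `sections` are introduced here only for illustration. Like the answer's pattern, it covers just `auto`, `allow-` and `iface`, and it would also split wherever one of those words appears in the middle of a line.

```python
import re

# The annotated pattern, compiled as written. re.VERBOSE makes the regex
# engine ignore the layout whitespace and the "#" comments; re.DOTALL lets
# "." cross line breaks so a section can span several lines.
SECTION = re.compile(
    r"""
    (?:auto|allow-|iface)     # Match one of the search terms
    (?:                       # Try to match...
      (?!                     #   (as long as we're not at the start of
        (?:auto|allow-|iface) #   the next search term):
      )
      .                       # any character.
    )*                        # Do this any number of times.
    (?<!\s)                   # Assert that the match doesn't end in whitespace
    """,
    re.DOTALL | re.VERBOSE,
)

# Sample file contents from the question; `subject` is the name the answer uses.
subject = """\
auto lo eth0
allow-hotplug eth1

iface eth0-home inet static
    address 192.168.1.1
    netmask 255.255.255.0
"""

sections = SECTION.findall(subject)
for section in sections:
    print(repr(section))
```

Keeping the comments inside the compiled pattern means the explanation and the regex cannot drift apart.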

Of course, you can also map the results into a list of tuples, as requested in the comments:

>>> reobj = re.compile(r"(auto|allow-|iface)\s*((?:(?!(?:auto|allow-|iface)).)*)(?<!\s)", re.DOTALL)
>>> result = [tuple(match.groups()) for match in reobj.finditer(subject)]
>>> result
[('auto', 'lo eth0'), ('allow-', 'hotplug eth1'), ('iface', 'eth0-home inet static\n    address 192.168.1.1\n    netmask 255.255.255.0')]

Answer 1 (score: 2)

You were very close to a clean solution when you computed the start indices. With those, you can add a single line to extract the desired slices:

indices = [s.start() for s in re.finditer(
            'auto|iface|source|mapping|allow-', text)]
# text.__getslice__ exists only in Python 2; this comprehension works in Python 2 and 3.
answer = [text[i:j] for i, j in zip(indices, indices[1:] + [len(text)])]
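One caveat with the slicing approach above is that a bare alternation also matches a keyword in the middle of a line, for example `auto` inside an option value. The sketch below is one possible variant, not part of the original answer: it assumes that stanza keywords always start at column 0, as in the question's sample, and the names `STANZA_START` and `split_stanzas` are hypothetical.

```python
import re

# Assumption: a stanza begins only where a keyword is the first thing on a
# line. "allow-" is a prefix (e.g. "allow-hotplug"); the other keywords must
# end at a word boundary.
STANZA_START = re.compile(
    r"^(?:(?:auto|iface|mapping|source)\b|allow-)",
    re.MULTILINE,  # let "^" match at the start of every line
)

def split_stanzas(text):
    """Return the stanzas of `text`, each stripped of surrounding whitespace."""
    starts = [m.start() for m in STANZA_START.finditer(text)]
    ends = starts[1:] + [len(text)]  # each stanza runs to the next start
    return [text[i:j].strip() for i, j in zip(starts, ends)]

sample = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "\n"
    "iface eth0-home inet static\n"
    "    address 192.168.1.1\n"
    "    netmask 255.255.255.0\n"
)
print(split_stanzas(sample))
```

Anything before the first keyword line, such as leading comments, is dropped by this sketch.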