Question

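I'm trying to parse the /etc/network/interfaces configuration file on Ubuntu, so I need to split a string into a list of strings, each of which starts with one of a given set of keywords.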

According to the manual:

The file consists of zero or more "iface", "mapping", "auto", "allow-" and "source" stanzas.

So if the file contains:

auto lo eth0
allow-hotplug eth1

iface eth0-home inet static
    address 192.168.1.1
    netmask 255.255.255.0

I'd like to get the list:

['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n    address ...']
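The split described above amounts to cutting the text just before every line that begins with a stanza keyword. A minimal sketch of that idea, assuming Python 3.7+ (where `re.split` accepts patterns that can match the empty string) and assuming that only stanza lines ever start with one of the keywords. `KEYWORDS`, `SPLIT_AT` and `split_stanzas` are illustrative names, not part of any library:

```python
import re

# Stanza keywords as listed in the manual excerpt quoted above.
KEYWORDS = ("iface", "mapping", "auto", "allow-", "source")

# Zero-width split point: the start of any line that begins with a keyword.
# Assumption: option lines (e.g. the indented "address" line) never start
# with one of these keywords.
SPLIT_AT = re.compile(
    r"^(?=(?:%s))" % "|".join(re.escape(k) for k in KEYWORDS),
    re.MULTILINE,
)

def split_stanzas(text):
    # Split before each stanza line, then drop empty pieces and strip whitespace.
    return [part.strip() for part in SPLIT_AT.split(text) if part.strip()]

sample = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "\n"
    "iface eth0-home inet static\n"
    "    address 192.168.1.1\n"
    "    netmask 255.255.255.0\n"
)
print(split_stanzas(sample))
```

Note that any text before the first stanza (such as leading comment lines) would come back as its own piece, and comments between stanzas stay attached to the preceding one.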

Right now my function looks like this:

import re

def get_sections(text):
    start_indexes = [s.start() for s in re.finditer('auto|iface|source|mapping|allow-', text)]
    start_indexes.reverse()
    end_idx = len(text)  # -1 would silently drop the last character of the text
    res = []
    for i in start_indexes:
        res.append(text[i:end_idx].strip())
        end_idx = i
    res.reverse()  # restore file order once, after the loop
    return res

But this doesn't feel like a good solution...

Answer 1

You can do this with a single regex:

>>> import re
>>> reobj = re.compile(r"(?:auto|allow-|iface)(?:(?!(?:auto|allow-|iface)).)*(?<!\s)", re.DOTALL)
>>> result = reobj.findall(subject)  # subject holds the file contents
>>> result
['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n    address 192.168.1.1\n    netmask 255.255.255.0']

Explanation:

(?:auto|allow-|iface)   # Match one of the search terms
(?:                     # Try to match...
 (?!                    #  (as long as we're not at the start of
  (?:auto|allow-|iface) #  the next search term):
 )                      #  
 .                      # any character.
)*                      # Do this any number of times.
(?<!\s)                 # Assert that the match doesn't end in whitespace
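Python's `re.VERBOSE` flag lets the annotated version above be compiled as-is, so the comments live inside the pattern itself. A sketch using the sample file from the question; `STANZA` and `subject` are illustrative names:

```python
import re

# Same pattern as the one-liner, written in verbose mode with inline comments.
STANZA = re.compile(r"""
    (?:auto|allow-|iface)          # match one of the search terms
    (?:                            # then repeatedly match...
        (?!(?:auto|allow-|iface))  # ...as long as the next search term doesn't start here...
        .                          # ...any character (DOTALL lets . match newlines)
    )*
    (?<!\s)                        # don't let the match end in whitespace
""", re.DOTALL | re.VERBOSE)

subject = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "\n"
    "iface eth0-home inet static\n"
    "    address 192.168.1.1\n"
    "    netmask 255.255.255.0\n"
)

print(STANZA.findall(subject))
```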

Of course, you can also map the results to a list of tuples, as requested in the comments:

>>> reobj = re.compile(r"(auto|allow-|iface)\s*((?:(?!(?:auto|allow-|iface)).)*)(?<!\s)", re.DOTALL)
>>> result = [match.groups() for match in reobj.finditer(subject)]
>>> result
[('auto', 'lo eth0'), ('allow-', 'hotplug eth1'), ('iface', 'eth0-home inet static\n    address 192.168.1.1\n    netmask 255.255.255.0')]
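Because this pattern has two capturing groups, `re.findall` already returns one `(keyword, rest)` tuple per match, so the `finditer` comprehension is optional. A short sketch on the same sample text (names are illustrative):

```python
import re

# Group 1: the keyword; group 2: the rest of the stanza, without trailing whitespace.
STANZA = re.compile(
    r"(auto|allow-|iface)\s*((?:(?!(?:auto|allow-|iface)).)*)(?<!\s)",
    re.DOTALL,
)

subject = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "\n"
    "iface eth0-home inet static\n"
    "    address 192.168.1.1\n"
    "    netmask 255.255.255.0\n"
)

# With more than one group, findall yields a tuple of all groups for each match.
pairs = STANZA.findall(subject)
print(pairs)
```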

Answer 2

You were very close to a clean solution once you had computed the start indices. With those in hand, one more line extracts the slices you want:

import re

indices = [s.start() for s in re.finditer(
           'auto|iface|source|mapping|allow-', text)]
# str.__getslice__ only exists in Python 2; slice each pair of indices instead
answer = [text[i:j] for i, j in zip(indices, indices[1:] + [len(text)])]
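One caveat applies to this slicing approach and to the regexes in Answer 1 alike: the keywords are matched anywhere in the text, not only at the start of a line. In an `iface ... inet6 auto` stanza, for example, the method name `auto` would start a new section. A sketch comparing the unanchored pattern with one anchored to line starts via `(?m)^`; `slice_sections` is a hypothetical helper wrapping the slicing idea above:

```python
import re

KEYWORDS = "auto|iface|source|mapping|allow-"

def slice_sections(text, pattern):
    # Start offset of every keyword match (the "indices" from the answer).
    starts = [m.start() for m in re.finditer(pattern, text)]
    # Each section runs up to the next start; the last one runs to the end.
    ends = starts[1:] + [len(text)]
    return [text[i:j].strip() for i, j in zip(starts, ends)]

text = (
    "auto eth0\n"
    "iface eth0 inet6 auto\n"
)

# Unanchored: the trailing "auto" on the iface line starts a bogus section.
print(slice_sections(text, KEYWORDS))
# -> ['auto eth0', 'iface eth0 inet6', 'auto']

# Anchored to line starts (re.MULTILINE via the inline (?m) flag).
print(slice_sections(text, r"(?m)^(?:%s)" % KEYWORDS))
# -> ['auto eth0', 'iface eth0 inet6 auto']
```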
