Removing tags from XML using Python etree

Time: 2018-10-16 17:25:30

Tags: python xml parsing recursion

I have an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<kw name="k1" library="k1">
    <kw name="k2" library="k2">
        <kw name="Keep This" library="Keep This">
            <c name="c4" library="c4">
            </c>
        </kw>
        <kw name="k3" library="k3">
            <c name="c4" library="c4">
            </c>
        </kw>
        <c name="c3" library="c3">
            <c name="c4" library="c4">
            </c>
        </c>
    </kw>
</kw>

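I want to remove tags, except for those that satisfy the following rules:

  1. the tag is kw and one of its attributes contains "Keep This"
  2. the tag is not kw

All other tags should be removed from the XML.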

So the output should look like:

<?xml version="1.0" encoding="UTF-8"?>
<kw name="k1" library="k1">
    <kw name="k2" library="k2">
        <kw name="Keep This" library="Keep This">
            <c name="c4" library="c4">
            </c>
        </kw>
        <c name="c3" library="c3">
            <c name="c4" library="c4">
            </c>
        </c>
    </kw>
</kw>
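Read together with the expected output, the rules imply one more condition: k1 and k2 match neither rule on their own (they are kw tags without "Keep This"), yet they survive because they contain a Keep This node, while k3 and the c4 inside it are dropped. A minimal sketch of that per-element test, assuming "contains "Keep This"" means some attribute value equals that string; `is_keep_kw` and `survives` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET

# The sample input from the question (XML declaration omitted).
SAMPLE = """<kw name="k1" library="k1">
    <kw name="k2" library="k2">
        <kw name="Keep This" library="Keep This">
            <c name="c4" library="c4">
            </c>
        </kw>
        <kw name="k3" library="k3">
            <c name="c4" library="c4">
            </c>
        </kw>
        <c name="c3" library="c3">
            <c name="c4" library="c4">
            </c>
        </c>
    </kw>
</kw>"""


def is_keep_kw(elem):
    # Rule 1 (assumed reading): tag is kw and some attribute value is "Keep This"
    return elem.tag == 'kw' and 'Keep This' in elem.attrib.values()


def survives(elem):
    # Rule 2: non-kw elements are kept (unless an ancestor gets removed)
    if elem.tag != 'kw':
        return True
    # elem.iter('kw') yields elem itself and every kw element below it
    return any(is_keep_kw(e) for e in elem.iter('kw'))


root = ET.fromstring(SAMPLE)
print([e.get('name') for e in root.iter('kw') if survives(e)])
# ['k1', 'k2', 'Keep This']
```

Calling `survives` on every kw element rescans its subtree each time, which is fine for small files; a single recursive pass avoids the repeated work.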

Tracing the recursive function is really difficult. Can someone help me, or recommend another way to meet my requirements?

import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()


def check(root):
    # if any child has the "kw" tag, recurse into the children
    if 'kw' in ([child.tag for child in root]):
        for child in root:
            flag = check(child)
            # remove the child when check() returns a falsy value
            if not flag:
                root.remove(child)
    # if no child has the "kw" tag
    else:
        if root.tag == 'kw':
            # keep the node if its tag is kw and an attribute value is "Keep This"
            if 'Keep This' in [root.attrib[child] for child in root.attrib]:
                return True
            # remove the node if its tag is kw but no attribute value is "Keep This"
            else:
                print('remove')
                return False
        else:
            return True

check(root)

ET.dump(root)
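Two details in the attempt above make it hard to trace. First, the branch that recurses into the children never returns a value, so that call returns None, and the caller's `if not flag` treats None as "remove"; with the sample input this removes k2 together with its Keep This child. Second, `root.remove(child)` runs while the same loop is iterating over `root`, which makes the loop skip the element right after each removed child (the Python docs warn against modifying a tree while iterating over it). The second effect can be seen in isolation; the tags `r`, `a` to `d` are just a made-up example:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<r><a/><b/><c/><d/></r>')

# Removing while iterating: after 'a' is removed, 'b' shifts into its slot
# and the loop moves on to the next position, so 'b' is never visited.
for child in root:
    root.remove(child)
skipped = [child.tag for child in root]
print(skipped)  # ['b', 'd']

# Iterating over a snapshot (list(...)) visits every original child.
root = ET.fromstring('<r><a/><b/><c/><d/></r>')
for child in list(root):
    root.remove(child)
print([child.tag for child in root])  # []
```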

1 Answer:

Answer 0 (score: 1)

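You can use the following recursive function instead. Note that it uses an exception to tell the parent to remove a child, because a node can only be removed through its parent, while the boolean return value only indicates whether a kw node with a "Keep This" attribute value was found in the subtree. A side benefit is that when no "keep" node is found under the root at all, the caller is still notified: by the rules the root should be removed, but it cannot be, because it is the root node: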

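import xml.etree.ElementTree as ET


# A dedicated exception, so that unrelated RuntimeErrors (e.g. RecursionError,
# which subclasses RuntimeError) are not mistaken for a removal signal.
class NoKeepNode(Exception):
    pass


def check(node):
    # a kw node with a "Keep This" attribute value is kept with its whole subtree
    if node.tag == 'kw' and any(value == 'Keep This' for value in node.attrib.values()):
        return True
    keep = False
    removals = []
    for child in node:
        try:
            if check(child):
                keep = True
        except NoKeepNode:
            # only the parent can remove a child; collect it and remove after the loop
            removals.append(child)
    for child in removals:
        node.remove(child)
    # a kw node with no "Keep This" node below it must be removed by its parent
    if node.tag == 'kw' and not keep:
        raise NoKeepNode('No "keep" node found under this node')
    return keep

tree = ET.parse('a.xml')
root = tree.getroot()
check(root)
ET.dump(root)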

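With the sample input, this outputs the following. The attributes appear in alphabetical order because ElementTree sorted them when serializing before Python 3.8; from 3.8 on, the original order (name, then library) is preserved: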

<kw library="k1" name="k1">
    <kw library="k2" name="k2">
        <kw library="Keep This" name="Keep This">
            <c library="c4" name="c4">
            </c>
        </kw>
        <c library="c3" name="c3">
            <c library="c4" name="c4">
            </c>
        </c>
    </kw>
</kw>
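For comparison, the same pruning can be sketched without exceptions: each call reports whether its subtree contains a Keep This node, and the parent removes kw children that report False. This is an alternative, not the answer's code; `prune` and `is_keep_kw` are hypothetical names, and it assumes (as the answer does) that "contains Keep This" means some attribute value equals that string. Iterating over `list(node)` is what makes it safe to remove children inside the loop. The usage below runs it on the question's sample, condensed onto fewer lines:

```python
import xml.etree.ElementTree as ET


def is_keep_kw(node):
    # Assumption: "contains Keep This" means some attribute value equals it
    return node.tag == 'kw' and 'Keep This' in node.attrib.values()


def prune(node):
    """Remove every kw descendant that has no Keep This node in its subtree.

    Returns True if node itself is, or contains, a Keep This node.
    """
    if is_keep_kw(node):
        return True  # keep this node and everything under it
    found = False
    # Iterate over a copy so that removing children does not skip any.
    for child in list(node):
        if prune(child):
            found = True
        elif child.tag == 'kw':
            node.remove(child)  # a kw child with nothing worth keeping
    return found


root = ET.fromstring("""<kw name="k1" library="k1">
    <kw name="k2" library="k2">
        <kw name="Keep This" library="Keep This"><c name="c4" library="c4"/></kw>
        <kw name="k3" library="k3"><c name="c4" library="c4"/></kw>
        <c name="c3" library="c3"><c name="c4" library="c4"/></c>
    </kw>
</kw>""")
print(prune(root))  # True
print([e.get('name') for e in root.iter()])
# ['k1', 'k2', 'Keep This', 'c4', 'c3', 'c4']
```

As with the answer, the root itself is never removed: if `prune(root)` returns False, no Keep This node was found anywhere, and the caller decides what to do with the now-stripped root.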