Keeping track of parent elements with ElementTree

Posted: 2014-02-05 00:19:23

Tags: python xml elementtree

Here is my XML:

<beans>
    <property name = "type1">
        <list>
            <bean class = "bean1">
                <property name = "typeb">
                    <value>foo</value>
                </property>
            </bean>
            <bean class = "bean2">
                <property name ="typeb">
                    <value>bar</value>
                </property>
            </bean>
        </list>
    </property>

    <property name = "type2">
        <list>
            <bean class = "bean3">
                <list>
                    <property name= "typec">
                        <sometags/>
                    </property>
                    <property name= "typed">
                        <list>
                            <value>foo</value>
                            <value>bar</value>
                        </list>
                    </property>
                </list>
            </bean>
        </list>
    </property>
</beans>

Now what I want to do is scan the tree and delete these elements:

            <bean class = "bean1">
                <property name = "typeb">
                    <value>foo</value>
                </property>
            </bean>

            <value>foo</value>

(that last one being the value inside the property name = "typed" element).

To do this, what I would like to write is something like:

for element in root.iter('value'):
    if element.text == 'foo':
        p1 = element.getParent()  # wished-for API: ElementTree elements have no getParent()
        if p1.tag == 'list':  # second case: remove just the value tag
            p1.remove(element)
        else:  # first case: remove the entire bean
            p2 = p1.getParent()  # the <bean>
            p3 = p2.getParent()  # the <list> that holds the bean
            p3.remove(p2)

But ElementTree doesn't give a child element any way to see its parent.

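What is an efficient way to achieve this? Given that this is a deeply nested XML structure, I'm not keen on the idea of a recursive function that checks the tag type at every level.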
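A common ElementTree workaround, sketched here as an illustration rather than as any answer's method, is to build a child-to-parent dictionary in one pass over the tree and then do the question's lookups against it. The `parent_map` name and the trimmed inline copy of the XML are assumptions for the example; the map is only valid for the tree as it was when the map was built.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (illustrative).
XML = """<beans>
  <property name="type1"><list>
    <bean class="bean1"><property name="typeb"><value>foo</value></property></bean>
    <bean class="bean2"><property name="typeb"><value>bar</value></property></bean>
  </list></property>
  <property name="type2"><list>
    <bean class="bean3"><list>
      <property name="typec"><sometags/></property>
      <property name="typed"><list><value>foo</value><value>bar</value></list></property>
    </list></bean>
  </list></property>
</beans>"""

root = ET.fromstring(XML)

# One pass over the tree: map every element to its parent (the root has none).
parent_map = {child: parent for parent in root.iter() for child in parent}

# Materialize the matches first so the tree is not mutated mid-iteration.
for value in [v for v in root.iter('value') if v.text == 'foo']:
    p1 = parent_map[value]
    if p1.tag == 'list':          # second case: drop just the <value>
        p1.remove(value)
    else:                         # first case: drop the whole enclosing <bean>
        bean = parent_map[p1]
        parent_map[bean].remove(bean)

print([v.text for v in root.iter('value')])  # → ['bar', 'bar']
```

Building the map is a single linear pass, so each parent lookup afterwards is a dictionary hit instead of a tree search.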

3 answers:

Answer 0 (score: 1)

With ElementTree, locate the parent first and then find the relevant child from it:

>>> parent = root.find('.//bean[@class="bean1"]')
>>> parent
<Element 'bean' at 0x10eb31550>
>>> parent.find('.//value').text
'foo'
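This top-down idea can be extended into the full deletion. The sketch below iterates over the containers rather than the values, so the loop variable is already the parent. It assumes the beans to delete are exactly those whose direct property/value path reads foo; a looser `.//value` search would also match the foo nested deep inside bean3 and delete that whole bean. The inline XML is a trimmed, illustrative copy of the question's.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (illustrative).
XML = """<beans>
  <property name="type1"><list>
    <bean class="bean1"><property name="typeb"><value>foo</value></property></bean>
    <bean class="bean2"><property name="typeb"><value>bar</value></property></bean>
  </list></property>
  <property name="type2"><list>
    <bean class="bean3"><list>
      <property name="typec"><sometags/></property>
      <property name="typed"><list><value>foo</value><value>bar</value></list></property>
    </list></bean>
  </list></property>
</beans>"""

root = ET.fromstring(XML)

# Walk the containers instead of the values: `lst` is already the parent,
# so no upward lookup is needed.
for lst in list(root.iter('list')):   # snapshot, because the tree is modified below
    for child in list(lst):           # copy, because we remove from lst while looping
        if child.tag == 'bean' and child.findtext('property/value') == 'foo':
            lst.remove(child)         # first case: drop the whole bean
        elif child.tag == 'value' and child.text == 'foo':
            lst.remove(child)         # second case: drop just the <value>

print([v.text for v in root.iter('value')])  # → ['bar', 'bar']
```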

Answer 1 (score: 1)

Here's how I solved it:

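import xml.etree.ElementTree as ET

root = ET.parse('data.xml').getroot()  # file name assumed
valuesToDelete = {'foo'}               # example: text values whose elements should go

# Yields every (parent, child) pair in the tree.
def iterparent(tree):
    for parent in tree.iter():  # getiterator() was removed in Python 3.9
        for child in parent:
            yield parent, child

# Recursive function. Deletes the ancestor n levels above childToRemove.
# If n = 0 it deletes just the child.
def removeParent(root, childToRemove, n):
    for parent, child in iterparent(root):
        if child is childToRemove:
            if n > 0:
                removeParent(root, parent, n - 1)
            else:
                parent.remove(child)
            return  # found it; stop iterating a tree we just modified

# Collect matches first so the tree isn't modified while it is being iterated.
matches = [(parent, child) for parent, child in iterparent(root)
           if child.tag == 'value' and child.text in valuesToDelete]
for parent, child in matches:
    if parent.tag == 'list':
        removeParent(root, child, 0)
    else:
        removeParent(root, child, 2)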

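It's actually fairly elegant. I like it.

For my purposes this approach works well, but others may run into all sorts of problems with different element structures and nesting depths, since the number of levels to climb (0 or 2) is hard-coded.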
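One cost of the approach above is that removeParent rescans the whole tree on every call, including each recursive step, so it can degrade quadratically on large documents. A sketch of the same "remove n levels up" idea using a child-to-parent dict built once is below. `remove_ancestor` is a hypothetical name, the inline XML is illustrative, and it assumes each bean contains at most one value to delete (a second match climbing to an already-removed bean would raise ValueError).

```python
import xml.etree.ElementTree as ET

def remove_ancestor(parent_map, element, n):
    """Remove the ancestor n levels above element (n = 0 removes element itself).

    Hypothetical helper mirroring removeParent, but using a prebuilt
    child -> parent dict instead of rescanning the tree on each call.
    """
    target = element
    for _ in range(n):
        target = parent_map[target]
    parent_map[target].remove(target)

root = ET.fromstring(
    '<beans><list>'
    '<bean class="bean1"><property name="typeb"><value>foo</value></property></bean>'
    '<bean class="bean2"><property name="typeb"><value>bar</value></property></bean>'
    '</list><list><value>foo</value><value>bar</value></list></beans>'
)
valuesToDelete = {'foo'}

parent_map = {c: p for p in root.iter() for c in p}  # built once, in one pass
matches = [(parent_map[v], v) for v in root.iter('value') if v.text in valuesToDelete]
for parent, child in matches:
    remove_ancestor(parent_map, child, 0 if parent.tag == 'list' else 2)

print([v.text for v in root.iter('value')])  # → ['bar', 'bar']
```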

Answer 2 (score: 0)

The lxml.etree module has a getparent() method. Given your sample XML (well, after fixing the mismatched </bar> end tag it was posted with), I can do this:

>>> from lxml import etree
>>>
>>> with open('data.xml', 'rb') as fd:
...     doc = etree.parse(fd)
...
>>> matches = doc.xpath('//value[text()="foo"]')
>>> element = matches[0]
>>> etree.tostring(element)
b'<value>foo</value>\n        '
>>> print(etree.tostring(element).decode())
<value>foo</value>

>>> parent = element.getparent()
>>> print(etree.tostring(parent).decode())
<property name="typeb">
          <value>foo</value>
        </property>
>>> parent = parent.getparent()
>>> print(etree.tostring(parent).decode())
<bean class="bean1">
        <property name="typeb">
          <value>foo</value>
        </property>
      </bean>

...and so on.
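Carried through to the actual deletion, the getparent() walk might look like the sketch below. It assumes lxml is installed (it is a third-party package, not part of the standard library) and uses a small illustrative XML string rather than data.xml. Because xpath() returns a plain Python list, removing nodes while looping over it is safe.

```python
from lxml import etree

doc = etree.fromstring(
    '<beans><list>'
    '<bean class="bean1"><property name="typeb"><value>foo</value></property></bean>'
    '<bean class="bean2"><property name="typeb"><value>bar</value></property></bean>'
    '</list><list><value>foo</value><value>bar</value></list></beans>'
)

# xpath() returns a list snapshot, so mutating the tree inside the loop is fine.
for value in doc.xpath('//value[text()="foo"]'):
    parent = value.getparent()
    if parent.tag == 'list':
        parent.remove(value)              # drop just the <value>
    else:
        bean = parent.getparent()         # <value> -> <property> -> <bean>
        bean.getparent().remove(bean)     # drop the whole bean

print([v.text for v in doc.iter('value')])  # → ['bar', 'bar']
```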