Merging specific child nodes in XML using Python

Time: 2017-10-02 05:57:33

Tags: python xml

I want to merge certain child elements of an XML file together. Here is my format:

 <?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>

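In the XML above, the box coordinates for image 99.jpg are specified twice, and I want to merge them into one. I want to remove the <image> tags that are repeated for the same image and collect all box coordinates for each image inside that image's own <image> tag. I have never worked with XML before, so I am not sure the terminology I am using is correct. The desired output is: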

<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
    <image file='/home/user126043/Documents/testimages/9935.jpg'>
    <box top='329' left='510' width='385' height='534'>
    <label>Pirelli
    </label></box>
    <box top='360' left='113' width='440' height='147'>
    <label>Pirelli
    </label></box></image>
    <image file='/home/user126043/Documents/testimages/9921.jpg'>
    <box top='329' left='510' width='385' height='534'>
    <label>Pirelli
    </label></box></image>
    </images></dataset>

1 Answer:

Answer 0: (score: 2)

You can try the xml.etree.ElementTree module:

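import xml.etree.ElementTree as ET

tree = ET.parse('dataset.xml')
root = tree.getroot()
images = root.find('images')
first_seen = {}  # maps each file path to the first <image> element seen for it
# findall() returns a list, so removing elements inside the loop is safe
# (removing while iterating over root.iter('image') can skip elements).
for image in images.findall('image'):
    file_str = image.get('file')
    if file_str in first_seen:
        # Duplicate: move all of its <box> children into the first element, then drop it
        first_seen[file_str].extend(image.findall('box'))
        images.remove(image)
    else:
        first_seen[file_str] = image
print(ET.tostring(root, encoding='unicode'))  # Python 3; returns str instead of bytes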

The new root will be:

<dataset><images>
<image file="/home/user126043/Documents/testimages/9941.jpg">
<box height="147" left="113" top="360" width="440">
<label>Pirelli
</label></box></image>
<image file="/home/user126043/Documents/testimages/99.jpg">
<box height="276" left="247" top="160" width="228">
<label>Pirelli
</label></box><box height="276" left="247" top="439" width="506">
<label>Pirelli
</label></box></image>
</images></dataset>
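
The output above no longer has the XML declaration or the xml-stylesheet line that the question's desired output keeps. As far as I know, ElementTree's default parser does not retain them, so one option is to write them back by hand. The following is a minimal, self-contained sketch of the same merge idea run on an inline string instead of dataset.xml. The sample data (a.jpg, b.jpg), the function name merge_duplicate_images, and the HEADER constant are made up for illustration; only the header text itself is copied from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the XML in the question.
SAMPLE = """<dataset><name>imglab dataset</name><images>
<image file='a.jpg'><box top='1' left='2' width='3' height='4'><label>Pirelli</label></box></image>
<image file='a.jpg'><box top='5' left='6' width='7' height='8'><label>Pirelli</label></box></image>
<image file='b.jpg'><box top='9' left='10' width='11' height='12'><label>Pirelli</label></box></image>
</images></dataset>"""

# Header copied from the question; re-added manually because the parsed
# tree does not carry the declaration or the stylesheet instruction.
HEADER = ("<?xml version='1.0' encoding='ISO-8859-1'?>"
          "<?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?>")


def merge_duplicate_images(root):
    """Merge <image> elements that share a 'file' attribute, in place."""
    images = root.find('images')
    first_seen = {}  # file path -> first <image> element with that path
    # findall() returns a list, so the parent can be modified in the loop.
    for image in images.findall('image'):
        key = image.get('file')
        if key in first_seen:
            # Move every <box> of the duplicate into the first element.
            first_seen[key].extend(image.findall('box'))
            images.remove(image)
        else:
            first_seen[key] = image
    return root


root = merge_duplicate_images(ET.fromstring(SAMPLE))
merged_xml = HEADER + ET.tostring(root, encoding='unicode')

# Count the boxes per file to check the merge.
box_counts = {img.get('file'): len(img.findall('box')) for img in root.iter('image')}
print(box_counts)
```

Looking up the first element in a dictionary avoids building an XPath expression from the file name, which would break if a path contained a quote character.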