Question

I have to parse XML files that contain entries like

<error code="UnknownDevice">
    <description />
</error>

which are defined elsewhere as

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

Given

import xml.etree.ElementTree as ET

parser = ET.XMLParser()
parser.parser.UseForeignDTD(True)

tree = ET.parse(inputFileName, parser=parser)
root = tree.getroot()

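how can I get those values for the errordef? I mean the values of id and description.

How can I search for and extract those values using "unknownDevice"?

[Update] The error groups have different names, but always follow the format "XXX error definitions", "YYY error definitions", etc.

In addition, they seem to be nested at different depths in different documents.

Given an error's title, e.g. "unknownDevice", how can I search everything under the root to get the corresponding id and description values?

Can I go straight to them, using e.g. "unknownDevice", or do I have to search for the error groups first?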

Answer 1

First, parse the error definitions into a dictionary:

errors = {
    errordef.attrib["name"]: {"id": errordef.attrib.get("id"), "description": errordef.findtext("description")}
    for errordef in root.xpath(".//group[@name='error definitions']/errordef[@name]")
}

Then, whenever you need an error's id and description, look them up by code:

error_code = root.find("error").attrib["code"]
print(errors.get(error_code, "Unknown Error"))

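Note that the xpath() method comes from lxml.etree. If you are using xml.etree.ElementTree, replace xpath() with findall(); the limited XPath support that xml.etree.ElementTree provides is sufficient for the expression shown.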
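As a minimal sketch of that substitution, the same two steps can be written with only the standard library. The `SAMPLE` document and the `<root>` wrapper element are assumptions assembled from the snippets in the question, not the asker's real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document built from the snippets in the question.
SAMPLE = """
<root>
  <group name="error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
  <error code="UnknownDevice">
    <description />
  </error>
</root>
"""

root = ET.fromstring(SAMPLE)

# findall() accepts the same expression that xpath() was given above.
errors = {
    errordef.get("name"): {
        "id": errordef.get("id"),
        "description": errordef.findtext("description"),
    }
    for errordef in root.findall(
        ".//group[@name='error definitions']/errordef[@name]"
    )
}

# Look the definition up through the code attribute of an <error> entry.
error_code = root.find("error").get("code")
print(errors.get(error_code, "Unknown Error"))
# → {'id': '0x11', 'description': 'Indicated device is unknown'}
```

This version matches the group name exactly, so it does not yet cover groups named "XXX error definitions" as described in the update.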

Answer 2

If you have this:

<group name="error definitions">
     <errordef id="0x11" name="UnknownDevice">
        <description>Indicated device is unknown</description>
     </errordef>
     ...
</group>

and you want to get the values of id and description for each errordef element, you can do this:

# xpath() is provided by lxml.etree, not by xml.etree.ElementTree
for err in tree.xpath('//errordef'):
    print(err.get('id'), err.find('description').text)

This gives you something like:

0x11 Indicated device is unknown
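Because `//errordef` matches at any depth, this approach also fits the update in the question, where groups are nested differently from file to file. A rough standard-library equivalent is sketched below. It additionally keeps only groups whose name ends with "error definitions". The sample document, the `<section>` wrapper, the second error (`DeviceBusy`, `0x12`), and the helper name `collect_errors` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two differently named groups at different depths.
SAMPLE = """
<root>
  <group name="XXX error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
  <section>
    <group name="YYY error definitions">
      <errordef id="0x12" name="DeviceBusy">
        <description>Indicated device is busy</description>
      </errordef>
    </group>
  </section>
</root>
"""


def collect_errors(root):
    """Map each errordef name to its id and description, at any depth."""
    errors = {}
    # iter() walks the whole subtree, so nesting depth does not matter.
    for group in root.iter("group"):
        # Keep only groups whose name ends with "error definitions".
        if not group.get("name", "").endswith("error definitions"):
            continue
        for errordef in group.findall("errordef"):
            errors[errordef.get("name")] = {
                "id": errordef.get("id"),
                "description": errordef.findtext("description"),
            }
    return errors


errors = collect_errors(ET.fromstring(SAMPLE))
print(errors["UnknownDevice"])
print(errors["DeviceBusy"]["id"])
```

The suffix check is done in Python because the XPath subset in xml.etree.ElementTree has no string functions for partial attribute matches.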

Answer 3

If you want to get the description and id values of each errordef element, you can do it like this:

import xml.etree.ElementTree as ET

tree = ET.parse('grpError.xml')
root = tree.getroot()
print(root)
docExe = root.findall('errordef')  # errordef elements directly under the root
dict01 = docExe[0].attrib  # store the attributes in a dictionary
print(dict01)
print(dict01['id'])  # attribute of the element
print(dict01['name'])  # attribute of the element
print(docExe[0].find('description').text)  # child element inside the parent element

The output is:

<Element 'group' at 0x000001A582EDB4A8>
{'id': '0x11', 'name': 'UnknownDevice'}
0x11
UnknownDevice
Indicated device is unknown
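One caveat with this snippet: `findall('errordef')` only returns direct children, so it works here because the root element is the `group` itself. When the group sits deeper in the document, as the question's update describes, a `.//` prefix is needed. The sketch below illustrates the difference; the `<root>` wrapper is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which the group is nested below the root.
SAMPLE = """
<root>
  <group name="error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
</root>
"""

root = ET.fromstring(SAMPLE)

direct = root.findall("errordef")     # direct children of <root> only
nested = root.findall(".//errordef")  # descendants at any depth

print(len(direct))  # → 0
print(len(nested))  # → 1

first = nested[0]
print(first.attrib["id"], first.findtext("description"))
# → 0x11 Indicated device is unknown
```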

Answer 4

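You need a selector, though I'm not sure you can do this with lxml. It has CSS selectors, but I didn't find anything in the docs for selecting by "id"... I have only used lxml to remove things from or add things to HTML. Maybe take a look at scrapy? With scrapy, once your HTML is loaded, it would look like this: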

response.xpath('//div[@id="0x11"]/text()').extract()
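For comparison, this kind of attribute selection also seems to be possible without scrapy, since xml.etree.ElementTree supports `[@attrib='value']` predicates. The sketch below goes straight to an `errordef` by name, which is what the question asks for. The helper `lookup` and the sample document are hypothetical, and the name is assumed to contain no quote characters.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document built from the snippet in the question.
SAMPLE = """
<root>
  <group name="error definitions">
    <errordef id="0x11" name="UnknownDevice">
      <description>Indicated device is unknown</description>
    </errordef>
  </group>
</root>
"""

root = ET.fromstring(SAMPLE)


def lookup(root, name):
    """Return (id, description) for the errordef called `name`, or None."""
    # Assumes `name` has no quote characters that would break the predicate.
    errordef = root.find(f".//errordef[@name='{name}']")
    if errordef is None:
        return None
    return errordef.get("id"), errordef.findtext("description")


print(lookup(root, "UnknownDevice"))  # → ('0x11', 'Indicated device is unknown')
print(lookup(root, "unknownDevice"))  # → None (the comparison is case-sensitive)
```

The second call shows that the spelling must match the `name` attribute exactly, including case.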

Problem parsing an XML file with xml.etree.ElementTree
