Question

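My XML file is available here. I can get the root node and its children from the file, but I cannot get the element I need: I want the content of <ce:section-title>Methods</ce:section-title>. I have tried both the xml and lxml packages.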

When I use the following,

import lxml.etree

tree = lxml.etree.parse(fname)  # fname is the XML filename
root = tree.getroot()

print(root[5].findall("ce:section-title", root.nsmap))

it just gives me empty brackets, []. The following gives the same empty result:

for item in tree.iter('{http://www.elsevier.com/xml/ja/dtd}ce:section-title'):
    print(item)

I also tried the solution provided here, but that code raises the following error:

ns = {"ce":"http://www.elsevier.com/xml/common/dtd"}
print(root.findall("ce:title", ns).text)

AttributeError: 'NoneType' object has no attribute 'text'

Any pointers would be helpful.

Answer 1

It should work with findall(".//ce:section-title", root.nsmap).

With the .// prefix, you search for section-title descendants at all levels below the context node. With findall("ce:section-title", root.nsmap), only direct child elements are found.
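To make the difference concrete, here is a minimal sketch on a made-up miniature document. It uses the standard-library xml.etree.ElementTree and an explicit prefix mapping so that it runs without the original file; the same findall calls should also work with lxml.etree. The XML content is hypothetical, and the ce namespace URI is simply the one quoted in the question, assumed to match the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; the ce URI is the one quoted in the question.
XML = """<article xmlns:ce="http://www.elsevier.com/xml/common/dtd">
  <ce:section-title>Top</ce:section-title>
  <ce:sections>
    <ce:section>
      <ce:section-title>Methods</ce:section-title>
    </ce:section>
  </ce:sections>
</article>"""

# Explicit prefix-to-URI mapping (plays the role of root.nsmap in lxml).
ns = {"ce": "http://www.elsevier.com/xml/common/dtd"}
root = ET.fromstring(XML)

# Without ".//": only direct children of the context node are matched.
direct = [e.text for e in root.findall("ce:section-title", ns)]

# With ".//": descendants at every level below the context node are matched.
nested = [e.text for e in root.findall(".//ce:section-title", ns)]

print(direct)  # ['Top']
print(nested)  # ['Top', 'Methods']
```

If the titles in the real file are all nested inside section elements, the direct-child search returns an empty list, which would explain the [] seen in the question.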

Example:

from lxml import etree

tree = etree.parse("data.xml")  # Your XML
root = tree.getroot()

for e in root.findall(".//ce:section-title", root.nsmap):
    print(e.text)

Output:

Abstract
Keywords
Introduction
Materials and methods
Results
The appearing species by taxon
List of regional appearing species
Discussion
Acknowledgments
References
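The iter() attempt in the question probably returns nothing because the tag string keeps the ce: prefix inside the braces and uses a different URI from the one in the ns dictionary. Fully qualified names are written as {namespace-uri}local-name, with no prefix. The sketch below shows that form and then picks out a single title by its text. It is only an illustration: the sample XML is made up, the URI is assumed from the question, and the standard-library ElementTree is used so the snippet runs on its own (lxml accepts the same tag syntax in iter()).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI, copied from the ns dictionary in the question.
CE = "http://www.elsevier.com/xml/common/dtd"

# Hypothetical sample standing in for the real file.
XML = f"""<article xmlns:ce="{CE}">
  <ce:section>
    <ce:section-title>Introduction</ce:section-title>
  </ce:section>
  <ce:section>
    <ce:section-title>Materials and methods</ce:section-title>
  </ce:section>
</article>"""

root = ET.fromstring(XML)

# Qualified tag name: "{uri}local-name", without the "ce:" prefix.
tag = f"{{{CE}}}section-title"
titles = [e.text for e in root.iter(tag)]

# Keep only the titles that mention "methods" (case-insensitive).
methods = [t for t in titles if t and "methods" in t.lower()]

print(titles)   # ['Introduction', 'Materials and methods']
print(methods)  # ['Materials and methods']
```

Note that in the output above the section is titled "Materials and methods" rather than "Methods", so a substring match may be more forgiving than an exact comparison.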
