Accessing elements with and without namespaces using lxml

Time: 2013-01-30 14:50:07

Tags: python lxml

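Is there a way, using lxml, to search a single document for the same element both with and without a namespace? For example, I want to get every occurrence of the element identifier, regardless of whether it is associated with a particular namespace. At the moment I can only access them separately, as shown below.

Code: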

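from lxml import etree

xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()

# Elements named "identifier" that are in no namespace
for l in root.iter('identifier'):
    print(l.text)

# Elements named "identifier" in the provenance namespace
for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
    print(l.text)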

File: xmlfile.xml

<?xml version="1.0"?>
<record>
  <header>
    <identifier>identifier1</identifier>
    <datestamp>datastamp1</datestamp>
    <setSpec>setspec1</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>title1</dc:title>
      <dc:title>title2</dc:title>
      <dc:creator>creator1</dc:creator>
      <dc:subject>subject1</dc:subject>
      <dc:subject>subject2</dc:subject>
    </oai_dc:dc>
  </metadata>
  <about>
    <provenance  xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
      <originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
        <baseURL>baseURL1</baseURL>
        <identifier>identifier3</identifier>
        <datestamp>datestamp2</datestamp>
        <metadataNamespace>xxxxx</metadataNamespace>
        <originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
          <baseURL>xxxxx</baseURL>
          <identifier>identifier4</identifier>
          <datestamp>2010-04-27T01:10:31Z</datestamp>
          <metadataNamespace>xxxxx</metadataNamespace>
        </originDescription>
      </originDescription>
    </provenance>
  </about>
</record>

1 answer:

Answer 0 (score: 1)

You can use XPath to solve this kind of problem:

from lxml import etree

xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")
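
To make the effect of local-name() easier to see, here is a minimal self-contained sketch. It assumes a trimmed-down copy of xmlfile.xml inlined as a byte string, and the helper name find_by_local_name is hypothetical (it is not part of lxml). The local name is passed as an XPath variable, and etree.QName is used to show which namespace, if any, each match belongs to.

```python
from lxml import etree

# Trimmed-down copy of xmlfile.xml, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0"?>
<record>
  <header>
    <identifier>identifier1</identifier>
  </header>
  <about>
    <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
      <originDescription>
        <identifier>identifier3</identifier>
        <originDescription>
          <identifier>identifier4</identifier>
        </originDescription>
      </originDescription>
    </provenance>
  </about>
</record>
"""


def find_by_local_name(root, name):
    """Hypothetical helper: elements whose local name is `name`, in any namespace or none."""
    # The name is bound as an XPath variable rather than formatted into the expression.
    return root.xpath("//*[local-name() = $name]", name=name)


root = etree.fromstring(XML)
matches = find_by_local_name(root, "identifier")

for node in matches:
    # QName splits the tag into its namespace (None when unqualified) and local name.
    print(etree.QName(node).namespace, node.text)
```

The matches come back in document order, so the unqualified identifier in the header is listed before the two inside the provenance element.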