Extracting each namespace declaration with lxml in Python

Date: 2016-06-01 18:00:21

Tags: python xml lxml

Below is a snippet extracted from a FODT file:

<office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text">

I want to separate out each namespace declaration. For example, I want to extract xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0", xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0", and so on, including the namespace prefix itself.

How can I do this with lxml?

1 answer:

Answer 0 (score: 2)

The nsmap attribute of the root element is a dictionary that maps each namespace prefix declared there to its URI. For example:

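from lxml import etree

# Pass bytes rather than str if the document starts with an XML encoding declaration.
XML = "your XML document here..."

root = etree.fromstring(XML)
# nsmap maps each prefix to its namespace URI; sort the pairs by prefix.
for ns in sorted(root.nsmap.items()):
    print(ns)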

Output:

('calcext', 'urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0')
('chart', 'urn:oasis:names:tc:opendocument:xmlns:chart:1.0')
('config', 'urn:oasis:names:tc:opendocument:xmlns:config:1.0')
('css3t', 'http://www.w3.org/TR/css3-text/')
('dc', 'http://purl.org/dc/elements/1.1/')
('dom', 'http://www.w3.org/2001/xml-events')
('dr3d', 'urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0')
('draw', 'urn:oasis:names:tc:opendocument:xmlns:drawing:1.0')
('drawooo', 'http://openoffice.org/2010/draw')
('field', 'urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0')
('fo', 'urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0')
('form', 'urn:oasis:names:tc:opendocument:xmlns:form:1.0')
('formx', 'urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0')
('grddl', 'http://www.w3.org/2003/g/data-view#')
('loext', 'urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0')
('math', 'http://www.w3.org/1998/Math/MathML')
('meta', 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0')
('number', 'urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0')
('of', 'urn:oasis:names:tc:opendocument:xmlns:of:1.2')
('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0')
('officeooo', 'http://openoffice.org/2009/office')
('ooo', 'http://openoffice.org/2004/office')
('oooc', 'http://openoffice.org/2004/calc')
('ooow', 'http://openoffice.org/2004/writer')
('rpt', 'http://openoffice.org/2005/report')
('script', 'urn:oasis:names:tc:opendocument:xmlns:script:1.0')
('style', 'urn:oasis:names:tc:opendocument:xmlns:style:1.0')
('svg', 'urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0')
('table', 'urn:oasis:names:tc:opendocument:xmlns:table:1.0')
('tableooo', 'http://openoffice.org/2009/table')
('text', 'urn:oasis:names:tc:opendocument:xmlns:text:1.0')
('xforms', 'http://www.w3.org/2002/xforms')
('xhtml', 'http://www.w3.org/1999/xhtml')
('xlink', 'http://www.w3.org/1999/xlink')
('xsd', 'http://www.w3.org/2001/XMLSchema')
('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
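
Since the question asks for the declarations in their `xmlns:prefix="uri"` form, the pairs from nsmap can be joined back into that shape. The following is only a minimal sketch: the helper name `namespace_declarations` is hypothetical, and the XML is a shortened stand-in for the FODT root element rather than the full document. A default namespace, if present, appears in nsmap under the key None, so the sketch handles that case separately.

```python
from lxml import etree

# Hypothetical, shortened stand-in for the FODT root element in the question.
XML = b"""<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2"/>"""


def namespace_declarations(element):
    """Return the namespaces in scope on `element` as xmlns attribute strings."""
    declarations = []
    # Sort by prefix, treating the default namespace (prefix None) as "".
    pairs = sorted(element.nsmap.items(), key=lambda item: item[0] or "")
    for prefix, uri in pairs:
        name = "xmlns" if prefix is None else "xmlns:" + prefix
        declarations.append('{}="{}"'.format(name, uri))
    return declarations


root = etree.fromstring(XML)
declarations = namespace_declarations(root)
for declaration in declarations:
    print(declaration)
```

Note that nsmap on an element covers the prefixes in scope at that element, so namespaces declared only on descendant elements would not show up when it is read from the root.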