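Parsing CDATA in XML with Python

Date: 2012-12-04 00:21:03

Tags: python xml parsing lxml

I need to parse an XML file that contains many CDATA blocks, and I need to keep the contents of those blocks for plotting later: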

<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>

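I need to do this repeatedly and quickly, so I am looking for the best approach. I have read that ElementTree is one of the faster options, but I am open to other suggestions.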

1 answer:

Answer 0: (score: 12)

Here are two examples of how to do it:

from lxml import etree
import xml.etree.ElementTree as ElementTree

CONTENT = """
<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""

def parse_with_lxml():
    # lxml: select every <log> element with an XPath query
    root = etree.fromstring(CONTENT)
    for log in root.xpath("//log"):
        print(log.text)

def parse_with_stdlib():
    # Standard library: iterate over every <log> element
    root = ElementTree.fromstring(CONTENT)
    for log in root.iter('log'):
        print(log.text)

if __name__ == '__main__':
    parse_with_lxml()
    parse_with_stdlib()

Output:

timestamp value
timestamp value, timestamp value, timestamp
timestamp value
timestamp value, timestamp value, timestamp

In both cases the parser exposes the CDATA content as the element's text attribute.
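Since the question mentions keeping the blocks for later plotting, here is a minimal sketch of one way to hold on to them using only the standard library. The helper name `collect_logs` and the choice to key each entry by its `(name, device)` attributes are assumptions made for illustration. The text is stored unparsed, because the exact layout of the timestamp/value pairs is not specified.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the example above.
CONTENT = """
<process id="process1">
 <log name="name1" device="device1"><![CDATA[timestamp value]]></log>
 <log name="name2" device="device2"><![CDATA[timestamp value, timestamp value, timestamp]]></log>
</process>
"""


def collect_logs(xml_text):
    """Hypothetical helper: map (name, device) to the raw text of each <log>."""
    root = ET.fromstring(xml_text)
    logs = {}
    for log in root.iter("log"):
        key = (log.get("name"), log.get("device"))
        # The parser has already unwrapped the CDATA section into .text;
        # an empty element yields None, so fall back to an empty string.
        logs[key] = log.text or ""
    return logs


if __name__ == "__main__":
    for key, text in collect_logs(CONTENT).items():
        print(key, text)
```

The resulting dictionary can be passed to whatever plotting step comes next, and each string can be split further once the data format is known.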
