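xml.etree.ElementTree with Chinese characters

Time: 2017-08-16 15:59:34

Tags: python xml

Given XML that contains Chinese characters, I want to use xml.etree to parse it so I can do some processing. The English version works. For example: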

>el.xml printf '%s\n' $'<?xml version=\'1.0\' encoding=\'utf8\'?><Color>Grey</Color>'
>cl.xml printf '%s\n' $'<?xml version=\'1.0\' encoding=\'utf8\'?><Color>灰色</Color>'

tryParse() {
  python -c 'import xml.etree.ElementTree as ET; import sys; ET.parse(sys.argv[1])' "$@"
}

tryParse el.xml && printf '%s\n\n' "English works"
tryParse cl.xml && printf '%s\n\n' "Chinese works"

...emits as output:

English works

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 44
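
The column in the error message can be checked against the file contents. The sketch below assumes the reported column is a 0-based offset into the line (all preceding characters are ASCII, so byte and character offsets agree there); first_non_ascii_offset is a hypothetical helper name, not part of any library. It suggests the parser stops at the first Chinese character rather than at the declaration itself.

```python
# -*- coding: utf-8 -*-
# The line written to cl.xml in the question, as a Unicode string.
line = u"<?xml version='1.0' encoding='utf8'?><Color>\u7070\u8272</Color>"


def first_non_ascii_offset(text):
    """Return the 0-based offset of the first non-ASCII character, or -1."""
    for offset, char in enumerate(text):
        if ord(char) > 127:
            return offset
    return -1


offset = first_non_ascii_offset(line)
print(offset)  # 44, the same column the ParseError reports
```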

1 answer:

Answer 0: (score: 1)

Use lxml instead, which parses the same file:

>>> import lxml.etree as ET
>>> doc = ET.parse('cl.xml')
>>> print(doc.getroot().text)
灰色
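
If installing lxml is not an option, a standard-library workaround may be possible. The sketch below assumes the file really is UTF-8 encoded and that the parser is tripping over the encoding name utf8 in the declaration (an assumption; nothing in this thread confirms the cause). It passes an explicit XMLParser(encoding=...) to ET.parse, which the ElementTree documentation describes as overriding the encoding specified in the file. parse_with_forced_utf8 is a hypothetical helper name, and the temporary file only recreates cl.xml from the question.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_with_forced_utf8(path):
    """Parse `path` as UTF-8, ignoring the encoding named in its XML declaration."""
    parser = ET.XMLParser(encoding="utf-8")
    return ET.parse(path, parser=parser)


# Recreate cl.xml from the question, including its encoding='utf8' declaration.
xml_bytes = (
    u"<?xml version='1.0' encoding='utf8'?><Color>\u7070\u8272</Color>\n"
).encode("utf-8")
tmp_dir = tempfile.mkdtemp()
cl_path = os.path.join(tmp_dir, "cl.xml")
with io.open(cl_path, "wb") as f:
    f.write(xml_bytes)

root = parse_with_forced_utf8(cl_path).getroot()
color_text = root.text  # the element text as a Unicode string
```

Another thing to try under the same assumption is writing the declaration as encoding='UTF-8' when you control how the file is produced.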