UnicodeDecodeError when parsing XML on a Mac, but it works on a PC

Asked: 2017-09-25 14:16:21

Tags: python xml ascii lxml parsexml

When I parse an XML file with the following code:

from lxml import etree

with open('cortex_full.xml', 'r') as infile:
    root = etree.parse(infile)

I get the UnicodeDecodeError below. This only happens on my Mac; if I parse the same file with the same script on my work PC, everything works fine.

File "/Users/Desktop/CPET/xml_test2.py", line 5, in <module>
    root = etree.parse(infile)
  File "src/lxml/lxml.etree.pyx", line 3442, in lxml.etree.parse (src/lxml/lxml.etree.c:81701)
  File "src/lxml/parser.pxi", line 1832, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:118888)
  File "src/lxml/parser.pxi", line 1852, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:119171)
  File "src/lxml/parser.pxi", line 1747, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:117959)
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:112686)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105881)
  File "src/lxml/parser.pxi", line 702, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:107548)
  File "src/lxml/lxml.etree.pyx", line 324, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:12152)
  File "src/lxml/parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src/lxml/lxml.etree.c:103210)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 783: ordinal not in range(128)

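Judging by the number of threads about this error here, it seems to be very common, but none of the suggested fixes seem to apply to this case. Any ideas on how to get it working? The full XML file is here.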

1 Answer:

Answer 0: (score: 1)

Posting the answer that worked for me, for future reference. Credit goes to @Burhan Khalid's answer.

When opening the XML file, you need to set the encoding to utf-8:

with open('cortex_full.xml', 'r', encoding='utf-8') as infile:
    root = etree.parse(infile)
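
As for why the two machines behave differently: when open() is called in text mode without encoding=, Python falls back to locale.getpreferredencoding(False). The traceback shows that this resolved to ASCII on the Mac. That the work PC defaults to a more permissive codec is an assumption, since the question does not show it. Below is a minimal sketch using only the standard library. The XML snippet is made up for illustration and is not the real cortex_full.xml.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to in text mode when encoding= is omitted.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical sample document (not the real cortex_full.xml). The degree sign
# is stored in UTF-8 as the bytes 0xC2 0xB0, so it starts with the same byte
# that the traceback complains about.
sample = '<?xml version="1.0" encoding="UTF-8"?>\n<temp unit="\u00b0C">21</temp>\n'

with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    # Forcing ASCII reproduces the kind of failure seen in the question.
    try:
        with open(path, "r", encoding="ascii") as infile:
            infile.read()
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = exc

    # Naming the file's real encoding makes the read independent of the locale.
    with open(path, "r", encoding="utf-8") as infile:
        decoded = infile.read()
finally:
    os.remove(path)

print("ascii read failed:", ascii_error is not None)
print("utf-8 read matches:", decoded == sample)
```

If the first line printed on the Mac is an ASCII variant (for example US-ASCII or ANSI_X3.4-1968), that would explain the error. Another option that should also work is to avoid text mode altogether, either by passing the filename straight to etree.parse or by opening the file with 'rb', so that the parser picks up the encoding from the XML declaration itself.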