Fix invalid XML characters

Date: 2017-01-05 19:33:31

Tags: python xml lxml

I have the following entry in my dictionary:

d = {'Name': 'La vie r\xc3\xaav\xc3\xa9e de Gaspard'}

Printing the name, or inserting it directly into my database, works fine:

>>> print d['Name']
La vie rêvée de Gaspard

However, if I add it to my XML, I get the following error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

How can I fix this?

1 answer:

Answer 0 (score: 3)

'La vie r\xc3\xaav\xc3\xa9e de Gaspard' is a byte string (a plain Python 2 str holding UTF-8 bytes), so, as the exception says, you need to decode it to unicode first.

>>> from lxml import etree
>>> d = {'Name': 'La vie r\xc3\xaav\xc3\xa9e de Gaspard'}
>>> e = etree.Element('root')
>>> e.set('name', d['Name'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 746, in lxml.etree._Element.set (src/lxml/lxml.etree.c:42970)
  File "apihelpers.pxi", line 547, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:19025)
  File "apihelpers.pxi", line 1395, in lxml.etree._utf8 (src/lxml/lxml.etree.c:26485)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

>>> e.set('name', d['Name'].decode('utf-8'))
>>> etree.tostring(e)
'<root name="La vie r&#234;v&#233;e de Gaspard"/>'
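For readers on Python 3: there, the value from the question would be a bytes object, and the fix is the same decode step. A minimal sketch, using the standard library's xml.etree.ElementTree in place of lxml (a substitution so it runs without third-party packages; the exact error lxml raises for un-decoded bytes is not reproduced here):

```python
import xml.etree.ElementTree as ET

# On Python 3 the value from the question would be UTF-8 encoded bytes.
raw = b'La vie r\xc3\xaav\xc3\xa9e de Gaspard'

# Decode the bytes to str before handing the value to the XML library.
name = raw.decode('utf-8')

root = ET.Element('root')
root.set('name', name)

# The default serialization (US-ASCII) writes non-ASCII characters
# as numeric character references, like the lxml output above.
print(ET.tostring(root))
# → b'<root name="La vie r&#234;v&#233;e de Gaspard" />'
```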

The same applies when setting an element's text property:

>>> e = etree.Element('root')
>>> e.text = d['Name'].decode('utf-8')
>>> etree.tostring(e)
'<root>La vie r&#234;v&#233;e de Gaspard</root>'
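The error message also rejects NULL bytes and control characters, which XML 1.0 does not allow at all, so decoding alone will not help if the data contains them. A minimal sketch of a hypothetical helper, xml_safe (not part of lxml), that decodes bytes (assuming UTF-8) and drops every character outside the XML 1.0 Char production:

```python
import re

# XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This negated class matches everything else: NULL, most C0 control
# characters, lone surrogates, U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r'[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)


def xml_safe(value, encoding='utf-8'):
    """Hypothetical helper: return value as str with XML-invalid characters removed.

    Byte strings are decoded first; the UTF-8 default is an assumption
    about where the data came from.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return _INVALID_XML_CHARS.sub('', value)


print(xml_safe(b'La vie r\xc3\xaav\xc3\xa9e de Gaspard'))  # → La vie rêvée de Gaspard
print(repr(xml_safe('a\x00b\x07c')))                       # → 'abc'
```

Silently dropping characters loses data; if you need to know when that happens, compare the result with the input or raise instead of substituting.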