Question

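I am trying to write a unicode string to a file in Python, but when I read the file with the Linux "cat" or "less" commands, the correct characters are not there; garbage is displayed instead.

I am reading objects from an Oracle database. When I print the type of a field (where a is a row from the database result):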

logger.debug(type(a[index]))

Output:

<type 'unicode'>

I open the file for writing:

ff = codecs.open(filename, mode='w', encoding='utf-8')

I write the line to the file like this:

ff.write(a[index])

But when I read the output file, it does not show the correct accented characters; it shows garbage instead:

$Buï¿½ï¿½rger, Udo, -1985. Way to perfect horsemanship

How can I write a unicode string object to a file correctly in Python?

Answer 1

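I can guess how you arrived at this mojibake. It is quite involved, and I'm impressed by how badly the text was mangled.

Someone decoded the text from bytes to Unicode using errors='replace', which masked the fact that the wrong codec was being used, because every byte the codec does not recognize is replaced with a replacement character.

The resulting Unicode text, now containing U+FFFD REPLACEMENT CHARACTER code points, was then encoded to UTF-8 but decoded again as Latin-1, most likely by your terminal, since cat and less simply output the raw bytes.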

Reversing that last step shows the text that was actually encoded:

>>> print u'$Buï¿½ï¿½rger, Udo, -1985. Way to perfect horsemanship'.encode('latin1').decode('utf8')
$Bu��rger, Udo, -1985. Way to perfect horsemanship

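Presumably that should have read Bürger, Udo, - 1985. Way to perfect horsemanship, with the ü made up of the character u followed by the U+0308 COMBINING DIAERESIS code point. That code point is CC 88 in UTF-8, and those two bytes cannot be decoded as ASCII: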

>>> text = u'Bu\u0308rger, Udo, - 1985. Way to perfect horsemanship'
>>> print text
Bürger, Udo, - 1985. Way to perfect horsemanship
>>> text.encode('utf8')
'Bu\xcc\x88rger, Udo, - 1985. Way to perfect horsemanship'
>>> text.encode('utf8').decode('ascii', errors='replace')
u'Bu\ufffd\ufffdrger, Udo, - 1985. Way to perfect horsemanship'
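To tie the steps together, here is a rough Python 3 sketch of the whole chain as I am guessing it happened. The function name garble is made up for illustration, and the choice of ASCII as the wrong codec is an assumption; any codec that rejects the bytes CC 88 would give the same result.

```python
# Python 3 sketch; garble() is a hypothetical name, not part of any library.
def garble(text):
    # Step 0: the correct UTF-8 bytes (u followed by CC 88 for U+0308).
    raw = text.encode("utf-8")
    # Step 1: decoded with the wrong codec; errors='replace' hides the failure
    # by turning each undecodable byte into U+FFFD.
    lossy = raw.decode("ascii", errors="replace")
    # Step 2: the damaged text is written out as UTF-8 (U+FFFD becomes EF BF BD).
    rewritten = lossy.encode("utf-8")
    # Step 3: a viewer that assumes Latin-1 shows each of those bytes as its own character.
    return rewritten.decode("latin-1")

original = "Bu\u0308rger, Udo"
shown = garble(original)

# ascii() keeps the printed result independent of the terminal's own encoding.
print(ascii(shown))
```

Each of the two bytes of the combining diaeresis ends up as three Latin-1 characters, which matches the two ï¿½ groups in the file from the question.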

Moral of the story: don't use errors='replace' unless you are absolutely certain of what you are doing.
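As a hedged illustration of the alternative, the sketch below decodes strictly (the default), so a wrong codec raises an error instead of silently producing U+FFFD, and then writes the text as UTF-8. It assumes the raw bytes really are UTF-8; the sample bytes, the helper name write_utf8 and the temporary file path are all invented for the example. It is written for Python 3, though io.open also exists on Python 2.6+.

```python
import io
import os
import tempfile

def write_utf8(path, text):
    # Hypothetical helper: write already-decoded text to disk as UTF-8.
    with io.open(path, mode="w", encoding="utf-8") as handle:
        handle.write(text)

# Assumed sample input: UTF-8 bytes for "Bu" + U+0308 + "rger, Udo".
raw = b"Bu\xcc\x88rger, Udo"

# Strict decoding with the wrong codec fails loudly instead of hiding the problem.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Decoding with the right codec keeps the combining diaeresis intact.
text = raw.decode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_utf8(path, text)

# Read the bytes back to confirm the file holds the original UTF-8 sequence.
with io.open(path, "rb") as handle:
    on_disk = handle.read()
```

If the file still looks wrong in cat or less after this, the bytes on disk are fine and the terminal's encoding setting is the more likely culprit.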
