Question

I did some research and looked at existing solutions, but none of them worked for me.

Python - 'ascii' codec can't decode byte

This did not work for me. I know that 0xe9 is the é character, but I still can't figure out how to make this work. Here is my code:

output_lines = ['<menu>', '<day name="monday">', '<meal name="BREAKFAST">', '<counter name="Entreé">', '<dish>', '<name icon1="Vegan" icon2="Mindful Item">', 'Cream of Wheat (Farina)','</name>', '</dish>', '</counter >', '</meal >', '</day >', '</menu >']
output_string = '\n'.join([line.encode("utf-8") for line in output_lines])

This gives me the error: 'ascii' codec can't decode byte 0xe9

I have tried decoding, and I have tried replacing the "é", but neither seems to solve the problem.

Answer 1

You are trying to encode a byte string:

>>> '<counter name="Entreé">'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 20: ordinal not in range(128)

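Python is trying to be helpful here. You can only encode a Unicode string to bytes, so when you call encode() on a byte string, Python 2 first decodes it implicitly using the default encoding (ASCII), and that implicit decode is what fails.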
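The implicit step can be reproduced by hand. The following is a minimal sketch, assuming Python 3 (where bytes and str are separate types, so the decode has to be written out explicitly); `py2_style_encode` is a hypothetical helper that only mimics the Python 2 behaviour described above.

```python
# Python 3 sketch of what Python 2 does when encode() is called on a byte string.
raw = '<counter name="Entreé">'.encode("utf-8")  # bytes, like a Python 2 str


def py2_style_encode(byte_string, target="utf-8", default="ascii"):
    """Hypothetical helper: decode with the default codec, then encode."""
    return byte_string.decode(default).encode(target)


try:
    py2_style_encode(raw)
    error = None
except UnicodeDecodeError as exc:
    # The failure comes from the decode step, not from the encode step.
    error = exc

print(error)
```

The exception is raised at the first non-ASCII byte, which is why the traceback mentions decoding even though the code asked for encoding.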

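The solution is to not encode data that is already encoded. If the data is encoded with a different codec than the one you need, decode it with the appropriate codec first and then encode it again.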

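If you have a mix of unicode and byte string values, decode only the byte strings or encode only the unicode values; try to avoid mixing the types. The following decodes byte strings to unicode: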

def ensure_unicode(v):
    if isinstance(v, str):
        v = v.decode('utf8')
    return unicode(v)  # convert anything not a string to unicode too

output_string = u'\n'.join([ensure_unicode(line) for line in output_lines])
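For readers on Python 3, the same idea might look like the sketch below. It assumes the byte strings are UTF-8; `ensure_text` and `mixed_lines` are illustrative names, not part of the original answer.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the given (assumed) encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)  # convert anything that is not bytes to str as well


# A deliberately mixed list: str, UTF-8 bytes, and a non-string value.
mixed_lines = ['<menu>', '<counter name="Entreé">'.encode("utf-8"), 42]
output_string = "\n".join(ensure_text(line) for line in mixed_lines)
print(output_string)
```

Normalising every value to text before joining keeps the bytes/text boundary in one place.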

Answer 2

A simple example of the problem is:

>>> '\xe9'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

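\xe9 is not an ASCII character, which means your string is already encoded. You need to decode it into Python's unicode type and then encode it again in the serialization format you want.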

Since I don't know where your string came from, I just peeked at the Python codecs, picked a Western European one, and gave it a go:

>>> '\xe9'.decode('cp1252')
u'\xe9'
>>> u'\xe9'.encode('utf-8')
'\xc3\xa9'
>>>

You will have the best luck if you know exactly which encoding the file came from.
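Why the source encoding matters can be seen with the same single byte. This is a small sketch, assuming Python 3 and assuming cp1252 is the right guess for the input; the variable names are illustrative.

```python
raw = b"\xe9"                      # a single byte, as in the example above
text = raw.decode("cp1252")        # bytes -> str, assuming cp1252 is correct
utf8_bytes = text.encode("utf-8")  # str -> bytes in the target encoding

# The same byte is not valid UTF-8 on its own, so a strict decode fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(text), utf8_bytes, valid_utf8)
```

One byte in cp1252 becomes two bytes in UTF-8, so the two encodings cannot be swapped freely.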

Answer 3

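encode = turns a unicode string into a bytestring

decode = turns a bytestring into unicode

Since you already have a bytestring, you need to decode it to make it a unicode instance (assuming that is actually what you want to do):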

output_string = '\n'.join(output_lines)
print output_string.decode("latin1")  #now this returns unicode
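One caveat with latin1 is that it maps every possible byte to a character, so decoding with it never raises even when the guess is wrong. The sketch below assumes Python 3 and assumes, for illustration only, that the input was actually UTF-8.

```python
utf8_bytes = "Entreé".encode("utf-8")    # the é becomes two bytes, 0xc3 0xa9

as_latin1 = utf8_bytes.decode("latin1")  # no error, but é turns into two characters
as_utf8 = utf8_bytes.decode("utf-8")     # the intended text

# latin1 accepts all 256 byte values and round-trips them unchanged.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin1").encode("latin1")

print(as_latin1, as_utf8)
```

If the decoded output shows pairs such as "Ã©" where a single accented letter was expected, the data was probably UTF-8 rather than latin1.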

Answer 4

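Depending on what you want to do with your lines, there are different ways to go. If you just want to print them to the console, you don't need to encode anything yourself: your strings are not unicode (they are already byte strings), and the console usually uses UTF-8: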

>>> output_string = '\n'.join(output_lines)
>>> print output_string
<menu>
<day name="monday">
<meal name="BREAKFAST">
<counter name="Entreé">
<dish>
<name icon1="Vegan" icon2="Mindful Item">
Cream of Wheat (Farina)
</name>
</dish>
</counter >
</meal >
</day >
</menu >

But if you want to write to a file, you can use the codecs module:

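import codecs
f = codecs.open('out_file', 'w', encoding='utf8')
f.write(output_string.decode('utf8'))  # the writer expects unicode, so decode the byte string first
f.close()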
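On Python 3 the built-in open() takes an encoding argument directly, so the codecs module is not needed for this. A minimal sketch, assuming Python 3 and a shortened version of the question's list; the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

output_lines = ['<menu>', '<counter name="Entreé">', '</menu>']
output_string = "\n".join(output_lines)  # already text (str) in Python 3

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out_file")

# The file object encodes str to UTF-8 bytes on write.
with open(path, "w", encoding="utf-8") as f:
    f.write(output_string)

# Read the raw bytes back to confirm what was written.
with open(path, "rb") as f:
    written = f.read()

print(written)
```

Passing the encoding explicitly avoids depending on the platform's default encoding.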
