Question

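I have an n with a tilde (ñ) in my database, and my Django app runs into problems when it tries to use that value as a string.

When I access the value in a REPL, it displays as follows:

>>> person.last_name
u'xxxxxxa\xf1oxxxx'
>>> str(person.last_name)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 15: ordinal not in range(128)

Correct me if I'm wrong: I think the problem is that the \xf1 escape is contained within a unicode string, and that it should have been handled differently before this value made its way into a unicode string. But I don't know whether that is a symptom or the actual disease.

So I'm not sure what to do. Might I be storing this value incorrectly in the first place? Or do I just need someone to show me how to decode it properly? My goal is to write this value to a CSV, which ultimately involves running it through str(). Thanks very much!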

Answer 1

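The character ñ is the Unicode character LATIN SMALL LETTER N WITH TILDE (U+00F1), so the unicode string you are seeing is correct. Python displays the escape \xf1, which in the context of a unicode string simply means the character U+00F1.

There is nothing to decode. Rather, if you want to write the unicode string to a byte stream (such as a file), you need to encode it.

The problem comes from str(foo), where foo is a unicode string. In Python 2 this is equivalent to foo.encode('ascii'), because ASCII is the default encoding. The character ñ does not exist in ASCII, hence the error.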
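The failure described above can be reproduced without Django. The following is a minimal Python 3 sketch (not the asker's code); try_ascii is a hypothetical helper introduced only for illustration, and the sample value is the one from the question.

```python
# Python 3 sketch: encoding to ASCII fails for any character outside the ASCII range.
text = 'xxxxxxa\xf1oxxxx'  # same sample value as in the question


def try_ascii(value):
    """Hypothetical helper: return (encoded_bytes, None) or (None, error)."""
    try:
        return value.encode('ascii'), None
    except UnicodeEncodeError as exc:
        return None, exc


encoded, error = try_ascii(text)
if error is not None:
    # error.start is the index of the first character that could not be encoded
    print('cannot encode %r at index %d' % (text[error.start], error.start))
```

The exception object carries the offending position, which can help locate the character that the chosen codec cannot represent.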

If you want the binary encoded representation of a unicode string, you have to know the desired encoding and encode explicitly:

>>> foo = u'xxxxxxa\xf1oxxxx'
>>> foo.encode('utf8')
'xxxxxxa\xc3\xb1oxxxx'
>>> foo.encode('latin1')
'xxxxxxa\xf1oxxxx'

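Just make sure to use the encoding the CSV file is expected to be in; otherwise the file will contain invalid characters.

The same holds for Python 3, by the way, except that the unicode string is of type str and the encoded string is of type bytes: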

>>> foo = u'xxxxxxa\xf1oxxxx'  # note the u prefix is accepted for compatibility but has no effect
>>> foo.encode('utf8')
b'xxxxxxa\xc3\xb1oxxxx'
>>> foo.encode('latin1')
b'xxxxxxa\xf1oxxxx'
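For the CSV goal in the question, one possible approach in Python 3 is to let the csv module work on text and encode once at the end. This is only a sketch under that assumption: rows_to_csv_bytes and the 'last_name' header are made up for illustration, and the right encoding depends on whatever will read the file.

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding='utf-8'):
    """Hypothetical helper: serialize rows of text to CSV, then encode once."""
    buffer = io.StringIO(newline='')  # newline='' leaves csv's line endings untouched
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode(encoding)


rows = [['last_name'], ['xxxxxxa\xf1oxxxx']]
utf8_data = rows_to_csv_bytes(rows)
latin1_data = rows_to_csv_bytes(rows, encoding='latin1')
```

When writing straight to disk, passing the encoding to open(), for example open(path, 'w', newline='', encoding='utf-8'), has the same effect without the intermediate buffer.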

Answer 2

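You can use Python's encode method to convert unicode to str. The second argument, 'ignore', tells Python to drop any characters it cannot encode in the given encoding. Note that with 'ascii' this silently removes the ñ, whereas UTF-8 can represent it, so nothing is dropped in that case.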

In [1]: foo = u'xxxxxxa\xf1oxxxx'

In [2]: foo.encode('ascii', 'ignore')
Out[2]: 'xxxxxxaoxxxx'

In [3]: foo.encode('utf-8', 'ignore')
Out[3]: 'xxxxxxa\xc3\xb1oxxxx'
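Besides 'ignore', the encode method accepts other standard error handlers that keep some trace of the character instead of dropping it. The comparison below is a Python 3 sketch using the sample value from the question; which handler is appropriate, if any, depends on how the output will be consumed.

```python
text = 'xxxxxxa\xf1oxxxx'

# Standard error handler names accepted by str.encode
handlers = ['ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace']

# Encode the same text to ASCII under each handler
results = {name: text.encode('ascii', name) for name in handlers}

for name in handlers:
    print(name, results[name])
```

All four of these lose or alter the original character in the byte output, so encoding to UTF-8 or Latin-1 remains the option that preserves ñ.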

Python 2.7 UnicodeEncodeError
