Python encoding problem that seems impossible to solve

Time: 2015-12-08 02:08:42

Tags: python encoding python-unicode

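Hey, I'm running into a major encoding problem in Python. I'm not very familiar with Python, and I've been stuck on this bug for a few weeks. I feel like I've tried everything possible, but I just can't seem to get it working.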

I'm reading files to process them, and on some files that contain Chinese characters I get the following error:

 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 154, in reviewrequest_recent_cc
    prev_reviewrequest_ccdata = _reviewrequest_recent_cc(request, review_request_id, False, revision_offset=1)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 140, in _reviewrequest_recent_cc
    filename, comparison_data = _download_comparison_data(request, review_request_id, revision, filediff_id, modified)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 89, in _download_comparison_data
    revision, filediff_id, local_site, modified)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 68, in _download_analysis
    temp_file.write(working_file)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)

My code for this part looks like this:

working_file = get_original_file(filediff, request, encoding_list)

if modified:
    working_file = get_patched_file(working_file, filediff, request)

working_file = convert_to_unicode(working_file, encoding_list)[1]
logging.debug("Encoding List: %s", encoding_list)
logging.debug("Source File: " + filediff.source_file)

temp_file_name = "cctempfile_" + filediff.source_file.replace("/","_")
logging.debug("temp_file_name: " + temp_file_name)
source_file = os.path.join(HOMEFOLDER, temp_file_name)


logging.debug("File contents" + working_file)
#temp_file = codecs.open(source_file, encoding='utf-8')
#temp_file.write(working_file.encode('utf-8'))

temp_file = open(source_file, 'w')
temp_file.write(working_file)
temp_file.close()

Note the commented-out lines. working_file is never empty. The encoding list, as logged, is:

Encoding List: [u'iso-8859-15']

Any help would be greatly appreciated. After 8 straight hours of debugging today, on top of the previous two weeks, I have to take a break.

2 answers:

Answer 0 (score: 1)

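The error indicates that working_file is a Unicode string, but it is being written to a file that was opened to expect byte strings. Python 2 implicitly converts the Unicode string to a byte string using the default ascii codec, and the non-ASCII characters trigger the UnicodeEncodeError.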
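A minimal reproduction of the implicit conversion described above, assuming a short hypothetical string stands in for the real file contents: encoding it with the ascii codec (which is what Python 2 does implicitly when a unicode object is written to a file opened with plain open) raises UnicodeEncodeError, while an explicit UTF-8 encode succeeds. The helper try_encode and the sample text are illustrative only.

```python
# -*- coding: utf-8 -*-


def try_encode(text, codec):
    """Hypothetical helper: return the encoded bytes, or None if the codec fails."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.start / exc.end bound the run of unencodable characters, the same
        # information as "position 10314-10316" in the traceback (end is exclusive)
        print("%s failed at positions %d-%d" % (codec, exc.start, exc.end - 1))
        return None


# Hypothetical sample: 4 ASCII characters followed by 3 Chinese characters
sample = u"abc 中文字"

ascii_bytes = try_encode(sample, "ascii")  # prints: ascii failed at positions 4-6
utf8_bytes = try_encode(sample, "utf-8")   # succeeds; each Chinese character becomes 3 bytes
```

The same run of characters that the ascii codec rejects encodes without loss as UTF-8, which is why choosing the output encoding explicitly avoids the error.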

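The commented-out lines are close to correct, but a file returned by codecs.open expects Unicode strings in write, so no explicit encoding is needed. The file also has to be opened for writing (the commented-out call used the default read mode):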

import codecs

with codecs.open(source_file, 'w', encoding='utf-8') as temp_file:  # closed automatically
    temp_file.write(working_file)
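A self-contained sketch of the codecs.open fix, assuming a small Unicode string stands in for working_file and a throwaway file from tempfile stands in for the path the question builds from HOMEFOLDER. It writes through codecs.open, then reads the file back both as raw bytes and through the codec, to confirm the text round-trips as UTF-8.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical stand-in for the decoded source file in the question
working_file = u"# 中文注释 (Chinese comment)\nx = 1\n"

# Hypothetical temp location instead of os.path.join(HOMEFOLDER, temp_file_name)
fd, source_file = tempfile.mkstemp(prefix="cctempfile_")
os.close(fd)  # only the path is needed; codecs.open reopens the file

# codecs.open wraps the file so that write() encodes unicode text to UTF-8 bytes
with codecs.open(source_file, "w", encoding="utf-8") as temp_file:
    temp_file.write(working_file)

# Read the raw bytes to see what actually landed on disk
with open(source_file, "rb") as raw:
    on_disk = raw.read()

# Read back through the same codec to recover the original text
with codecs.open(source_file, "r", encoding="utf-8") as temp_file:
    round_trip = temp_file.read()

os.remove(source_file)
```

The built-in io.open(source_file, 'w', encoding='utf-8') also accepts an encoding argument and can be used in the same way; codecs.open is shown here only because it is what the answer uses.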

Answer 1 (score: 0)

What is the return type of the convert_to_unicode function?

If it returns bytes, you should probably change temp_file = open(source_file, 'w') to temp_file = open(source_file, 'wb'), which opens the file for writing bytes.
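The type check this answer suggests can be sketched as a small helper. write_working_file is hypothetical (it is not part of the question's code), and UTF-8 is assumed as the desired output encoding: bytes are written unchanged to a file opened with 'wb', while text is encoded explicitly first, so no implicit ascii conversion can happen. The isinstance(contents, bytes) check works on Python 2, where bytes is an alias of str, as well as on Python 3.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def write_working_file(path, contents, encoding="utf-8"):
    """Hypothetical helper: write bytes or text to path in binary mode.

    Text (unicode) is encoded explicitly with `encoding`; bytes are written as-is.
    Returns the number of bytes written.
    """
    if not isinstance(contents, bytes):
        contents = contents.encode(encoding)
    with open(path, "wb") as f:
        f.write(contents)
    return len(contents)


# Usage with a throwaway file
fd, path = tempfile.mkstemp()
os.close(fd)

n_text = write_working_file(path, u"中文")  # text: encoded to UTF-8 first
with open(path, "rb") as f:
    from_text = f.read()

n_bytes = write_working_file(path, b"\xe4\xb8\xad")  # bytes: written unchanged
with open(path, "rb") as f:
    from_bytes = f.read()

os.remove(path)
```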