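Hey, I'm running into a major encoding problem in Python. I'm not very familiar with Python and have been stuck on this bug for a few weeks. I feel like I've tried everything possible, but I just can't get it to work.
I'm reading in files to process, and on some files that contain Chinese characters I get the following error: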
'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 154, in reviewrequest_recent_cc
    prev_reviewrequest_ccdata = _reviewrequest_recent_cc(request, review_request_id, False, revision_offset=1)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 140, in _reviewrequest_recent_cc
    filename, comparison_data = _download_comparison_data(request, review_request_id, revision, filediff_id, modified)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 89, in _download_comparison_data
    revision, filediff_id, local_site, modified)
  File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 68, in _download_analysis
    temp_file.write(working_file)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
The code in that area looks like this:
working_file = get_original_file(filediff, request, encoding_list)
if modified:
    working_file = get_patched_file(working_file, filediff, request)
working_file = convert_to_unicode(working_file, encoding_list)[1]
logging.debug("Encoding List: %s", encoding_list)
logging.debug("Source File: " + filediff.source_file)
temp_file_name = "cctempfile_" + filediff.source_file.replace("/","_")
logging.debug("temp_file_name: " + temp_file_name)
source_file = os.path.join(HOMEFOLDER, temp_file_name)
logging.debug("File contents" + working_file)
#temp_file = codecs.open(source_file, encoding='utf-8')
#temp_file.write(working_file.encode('utf-8'))
temp_file = open(source_file, 'w')
temp_file.write(working_file)
temp_file.close()
Note the commented-out lines. working_file is never empty. The encoding list that gets logged is:
Encoding List: [u'iso-8859-15']
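I'd really appreciate help from anyone who can offer it. After 8 straight hours of debugging, on top of the previous two weeks, I had to take a break.
Answer 0 (score: 1)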
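The error indicates that working_file is a Unicode string, but it is being written to a file that was opened to expect byte strings. Python 2 implicitly converts the Unicode string to a byte string using the default ascii codec, and the non-ASCII characters trigger the UnicodeEncodeError.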
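The implicit conversion can be reproduced by hand. This is only a minimal sketch: the sample string is made up (the real file contents are not shown in the question), and the explicit encode("ascii") call stands in for what Python 2 does behind the scenes inside write().

```python
# Hypothetical sample text; the real working_file contents are not shown.
sample = u"caf\u00e9 \u4e2d\u6587"

try:
    # Python 2 performs this step implicitly when a unicode string is
    # written to a file opened with plain open(path, 'w').
    sample.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Encoding explicitly with UTF-8 succeeds for any Unicode text.
encoded = sample.encode("utf-8")
```

Here error ends up holding a UnicodeEncodeError of the same kind as the one in the traceback, while the UTF-8 encode goes through.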
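The commented-out lines are close to correct, but write expects a Unicode string when the file has been opened with codecs.open, so no explicit encode is needed. The file also needs to be opened for writing: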
temp_file = codecs.open(source_file, 'w', encoding='utf-8')
temp_file.write(working_file)
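A self-contained variant of the same idea, as a sketch only. It uses io.open rather than codecs.open; io.open is available in Python 2.7 and is the built-in open in Python 3, and it likewise accepts Unicode text and encodes it on write. The sample text and the temporary path below are placeholders, not values from the question.

```python
import io
import os
import tempfile

# Hypothetical stand-ins for the question's variables.
working_file = u"header \u4e2d\u6587 content"
source_file = os.path.join(tempfile.mkdtemp(), "cctempfile_example.txt")

# Opened in text mode with an explicit encoding, the file accepts
# Unicode text and encodes it to UTF-8 on the way to disk.
with io.open(source_file, "w", encoding="utf-8") as temp_file:
    temp_file.write(working_file)

# Read the file back with the same encoding to confirm the round trip.
with io.open(source_file, "r", encoding="utf-8") as check:
    round_tripped = check.read()
```

The with block also closes the file, which stands in for the explicit temp_file.close() call in the question.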
Answer 1 (score: 0)
What is the return type of the convert_to_unicode function?
If it is bytes, you should probably change temp_file = open(source_file, 'w') to temp_file = open(source_file, 'wb'), which writes bytes to the file.
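Since the question does not show what convert_to_unicode returns, one defensive option is to pick the file mode from the type of the data. The helper below is a hypothetical sketch (write_text_or_bytes is not part of the question's code), and the UTF-8 output encoding is an assumption.

```python
import io
import os
import tempfile


def write_text_or_bytes(path, data):
    """Write data to path, choosing the file mode from the type of data."""
    if isinstance(data, bytes):
        # Already a byte string: write it unchanged in binary mode.
        with open(path, "wb") as handle:
            handle.write(data)
    else:
        # Unicode text: let the file object encode it (UTF-8 assumed).
        with io.open(path, "w", encoding="utf-8") as handle:
            handle.write(data)


# Usage with made-up data and temporary paths.
tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "text.txt")
bytes_path = os.path.join(tmp_dir, "bytes.txt")
write_text_or_bytes(text_path, u"\u4e2d\u6587")
write_text_or_bytes(bytes_path, u"\u4e2d\u6587".encode("utf-8"))
```

Both calls produce the same bytes on disk, so the caller no longer needs to know which type it was handed.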