If the error persists, use errors='ignore' and check with a different encoding

Question

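I'm new to Python and Stack Overflow.

I have a folder of CSV files, and I'm trying to read the field names from each file and write them to a new CSV file.
Thanks to Stack Overflow, I was able to write and adjust my code until a Unicode error came up.
I did my best to fix this error and did some research.
I found that files created on Mac or Linux are encoded as UTF-8, while files created on Windows are encoded as cp949.
So I have to open those files with utf8.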

At first my code looked like this:

import csv
import glob
lst=[]
files=glob.glob('C:/dataset/*.csv')
with open('test.csv','w',encoding='cp949',newline='') as testfile:
    csv_writer=csv.writer(testfile)
    for file in files:
        with open(file,'r') as infile:
            file=file[file.rfind('\\')+1:]
            reader=csv.reader(infile)
            headers=next(reader) 
            headers=[str for str in headers if str] 
            while len(headers) < 3 :
                headers=next(reader) 
                headers=[str for str in headers if str]
            lst=[file]+headers
            csv_writer.writerow(lst)

Then this error appeared:

Traceback (most recent call last):
  File "C:\Python35\2.py", line 12, in <module>
    headers=next(reader)
UnicodeDecodeError: 'cp949' codec can't decode byte 0xec in position 6: illegal multibyte sequence
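This traceback stems from `open(file,'r')` being called without an `encoding` argument. In that case Python 3 (including the 3.5 used here) decodes text with `locale.getpreferredencoding(False)`, which on Korean-locale Windows is typically cp949. UTF-8-encoded Hangul begins with bytes such as 0xec that cp949 rejects in that position. A minimal reproduction, assuming an illustrative header '연도' (not taken from the asker's files):

```python
import locale

# Encoding open() falls back to in text mode when encoding= is omitted
# (typically cp949 on Korean-locale Windows; it varies by system).
print(locale.getpreferredencoding(False))

# Illustrative header text saved as UTF-8, as a Mac/Linux tool might write it.
raw = "연도".encode("utf-8")
print(raw[:3])  # b'\xec\x97\xb0': the UTF-8 bytes of '연', starting with 0xec

# Decoding with the encoding the text was written in works.
text = raw.decode("utf-8")

# Decoding the same bytes as cp949 raises the same kind of error as the traceback.
try:
    raw.decode("cp949")
except UnicodeDecodeError as exc:
    print(exc)  # the message names byte 0xec and "illegal multibyte sequence"
```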

Here is how I tried to fix the Unicode error:

import csv
import glob
lst=[]
files=glob.glob('C:/dataset/*.csv')
with open('test.csv','w',encoding='cp949',newline='') as testfile:
    csv_writer=csv.writer(testfile)
    for file in files:
        try:
            with open(file,'r') as infile:
                file=file[file.rfind('\\')+1:]
                reader=csv.reader(infile)
                headers=next(reader) 
                headers=[str for str in headers if str] 
                while len(headers) < 3 :
                    headers=next(reader) 
                    headers=[str for str in headers if str]
                lst=[file]+headers
                csv_writer.writerow(lst)
        except:
            with open(file,'r',encoding='utf8') as infile:
                file=file[file.rfind('\\')+1:]
                reader=csv.reader(infile)
                headers=next(reader)
                headers=[str for str in headers if str]
                while len(headers) < 3 :
                    headers=next(reader) 
                    headers=[str for str in headers if str]
                lst=[file]+headers
                csv_writer.writerow(lst)

This error appeared:

Traceback (most recent call last):
  File "C:\Python35\2.py", line 12, in <module>
    headers=next(reader)
UnicodeDecodeError: 'cp949' codec can't decode byte 0xec in position 6: illegal multibyte sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python35\2.py", line 20, in <module>
    with open(file,'r',encoding='utf8') as infile:
FileNotFoundError: [Errno 2] No such file or directory: '2010_1_1.csv'

The file '2010_1_1.csv' definitely exists in my directory ('C:/dataset/*.csv').

When I open this file on its own with open('C:/dataset/2010_1_1.csv','r',encoding='utf8'), it works fine, but '\ufeff' appears next to the file name.

I'm not sure, but my guess is that the file is opened in the try: block and hasn't been closed yet, so Python can't open it again in the except: block.
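The `with` statement closes the file before the `except` block starts, so an unclosed handle is unlikely to be the problem. The traceback points elsewhere: `file=file[file.rfind('\\')+1:]` runs before `next(reader)` raises, so inside `except` the variable `file` holds only the bare name '2010_1_1.csv', and `open()` looks for it in the current working directory rather than in C:/dataset. Below is a minimal sketch that keeps the full path and the display name in separate variables and retries the whole read once per encoding. The helper names `read_headers` and `collect_headers`, the candidate list ('utf-8-sig' first, then cp949) and the utf-8-sig output file are assumptions, not the asker's code:

```python
import csv
import glob
import os

# Assumed candidates, tried in order. 'utf-8-sig' also strips a leading BOM (U+FEFF).
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp949")


def read_headers(path, encodings=CANDIDATE_ENCODINGS):
    """Hypothetical helper: return the first row with at least 3 non-empty fields."""
    last_error = None
    for enc in encodings:
        try:
            # Always reopen the *full* path; it is never overwritten with the bare name.
            with open(path, "r", encoding=enc, newline="") as infile:
                for row in csv.reader(infile):
                    headers = [field for field in row if field]
                    if len(headers) >= 3:
                        return headers
                return []  # no qualifying row in this file
        except UnicodeDecodeError as exc:
            last_error = exc  # wrong guess; try the next encoding
    raise last_error


def collect_headers(pattern, out_path):
    """Hypothetical helper: write one row per CSV file: [file name] + headers."""
    # utf-8-sig output can hold any character and is recognised by Excel/Notepad.
    with open(out_path, "w", encoding="utf-8-sig", newline="") as testfile:
        writer = csv.writer(testfile)
        for path in glob.glob(pattern):
            name = os.path.basename(path)  # display name kept in its own variable
            writer.writerow([name] + read_headers(path))
```

Trying utf-8-sig first is deliberate: strict UTF-8 decoding almost never accepts cp949 Hangul bytes, whereas cp949 can occasionally decode UTF-8 bytes into the wrong characters without raising.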

How can I change my code to fix this Unicode problem?

import glob
from chardet.universaldetector import UniversalDetector
detector = UniversalDetector()
files=glob.glob('C:/example/*.csv')
for filename in files:
    print(filename.ljust(60)),
    detector.reset()
    for line in file(filename, 'rb'):
        detector.feed(line)
        if detector.done: break
    detector.close()
    print(detector.result)

Error:

Traceback (most recent call last):
  File "<pyshell#20>", line 4, in <module>
    for line in file(filename, 'rb'):
TypeError: 'str' object is not callable
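`file()` was a Python 2 built-in and does not exist in Python 3. Here the name `file` evidently refers to a string, probably left over from the earlier `for file in files` loop, so calling it raises this TypeError. In Python 3 the file should be opened with `open(filename, 'rb')` instead. If installing chardet is not an option, a rough stdlib-only heuristic can stand in when the only candidates are UTF-8 and cp949. The function name `guess_encoding` and the two-encoding assumption are illustrative; this is not chardet's detection algorithm:

```python
import codecs
import glob


def guess_encoding(path):
    """Rough guess, assuming the file is either UTF-8 (maybe with BOM) or cp949."""
    with open(path, "rb") as f:  # binary mode via open(), not Python 2's file()
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("utf-8")  # strict check: any invalid byte raises
        return "utf-8"
    except UnicodeDecodeError:
        return "cp949"  # assumed fallback; not verified against the data


for filename in glob.glob("C:/example/*.csv"):
    print(filename.ljust(60), guess_encoding(filename))
```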

Answer 1

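I'm not very experienced with Python, so correct me if this isn't possible, but you could try ignoring the file's encoding when opening it. I'm a Java programmer, and in my experience the encoding only needs to be specified when creating a new file, not when opening one.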

Answer 2

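If the files can't be decoded correctly, they apparently weren't written in cp949. You have to figure out the correct encoding. A module like chardet can help.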

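On Windows, when reading a file, open it with the encoding it was written in. If it is UTF-8, use utf-8-sig, which automatically handles and removes the byte order mark (BOM) character U+FEFF if present. When writing, your best bet is utf-8-sig, because it handles every possible Unicode character and adds the BOM, so Windows tools such as Notepad and Excel will recognize the file as UTF-8. Without it, most Windows tools assume the ANSI encoding, which depends on the localized version of Windows.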
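A small round trip illustrating this behaviour, and the '\ufeff' the asker saw: writing with utf-8-sig prepends the BOM bytes EF BB BF, reading the file back as plain utf-8 keeps U+FEFF at the start of the first field, and reading it as utf-8-sig strips it. The file name demo.csv and the headers are only examples:

```python
import csv
import os
import tempfile

# Example file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Writing with utf-8-sig prepends the BOM so Notepad/Excel detect UTF-8.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerow(["year", "month"])

with open(path, "rb") as f:
    print(f.read(3))  # b'\xef\xbb\xbf' (U+FEFF encoded as UTF-8)

# Plain utf-8 keeps U+FEFF glued to the first header...
with open(path, encoding="utf-8", newline="") as f:
    plain = next(csv.reader(f))
print(plain)  # ['\ufeffyear', 'month']

# ...while utf-8-sig strips it.
with open(path, encoding="utf-8-sig", newline="") as f:
    stripped = next(csv.reader(f))
print(stripped)  # ['year', 'month']
```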

Answer 3

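A UnicodeDecodeError only means that decoding failed somewhere: either the codec being used doesn't match the format the particular file was written in, or the decoding itself is fine but the file was written with errors, whatever its format (JSON, XML, CSV, ...).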

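The only way to avoid getting stuck on this problem is to ignore the decoding errors in the first part of your code, using the errors='ignore' argument of open().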

# change
with open('test.csv','w',encoding='cp949',newline='') as testfile:

# to
with open(r'test.csv','w',encoding='cp949',newline='',errors='ignore') as testfile:

# or read the whole file as data
data = open(r'test.csv',errors='ignore').read()

If the error persists, use errors='ignore' and check with a different encoding.
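errors= only affects decoding or encoding done through the file object it is passed to, and errors='ignore' does not repair a wrong encoding: it silently drops every byte that cannot be decoded, so headers read with the wrong codec can come back as wrong characters or with characters missing. errors='replace' keeps a visible U+FFFD marker instead. A small comparison on illustrative bytes (UTF-8 for '연도' plus one invalid byte):

```python
# Illustrative input: 'year,' + UTF-8 for '연도' + ',' + one invalid byte (0xff).
data = b"year,\xec\x97\xb0\xeb\x8f\x84,\xff"

# errors='ignore' drops undecodable bytes without any trace.
ignored = data.decode("utf-8", errors="ignore")

# errors='replace' substitutes U+FFFD so the damage stays visible.
replaced = data.decode("utf-8", errors="replace")

print(ascii(ignored))   # 'year,\uc5f0\ub3c4,'
print(ascii(replaced))  # 'year,\uc5f0\ub3c4,\ufffd'

# The same argument works when reading a file (path is hypothetical):
# with open(path, encoding="utf-8", errors="replace", newline="") as infile: ...
```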

Unicode decode error when trying to read a CSV file in Python
