Question

import sys
dataset = open('file-00.csv','r')
dataset_l = dataset.readlines()

Opening the file as shown above raises the following error:

**UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 156: invalid start byte**

So I changed the code to the following:

import sys
dataset = open('file-00.csv','r', errors='replace')
dataset_l = dataset.readlines()

I also tried errors='ignore'. With either option the initial error goes away, but later in my code I run into another error:

def find_class_1(row):
    global file_l_sp
    for line in file_l_sp:
        if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:
            return line[3].strip()
    return 'other'

  File "Label_Classify_Dataset.py", line 56, in <module>
    dataset_w_label += dataset_l[it].strip() + ',' + find_class_1(l) + ',' + find_class_2(l) + '\n'
  File "Label_Classify_Dataset.py", line 40, in find_class_1
    if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:
IndexError: list index out of range

How can I fix either the first or the second error?
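On the second error: an IndexError in that comparison usually means that `row` has fewer than six fields or that some `line` in `file_l_sp` has fewer than four, for example a blank or truncated line. Below is a minimal sketch of a guarded version, assuming both are lists of strings produced by something like `str.split(',')`. The `table` parameter is hypothetical and stands in for the global `file_l_sp`; the sample data is made up.

```python
def find_class_1(row, table):
    """Return the class label matching row, or 'other' if nothing matches.

    Assumes row and every entry of table are lists of strings.
    """
    if len(row) < 6:          # row[5] would raise IndexError
        return 'other'
    for line in table:
        if len(line) < 4:     # line[3] would raise IndexError
            continue
        if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:
            return line[3].strip()
    return 'other'


# Made-up sample data: one complete entry and one truncated entry.
table = [['a', 'b', 'c', 'web\n'], ['short']]
print(find_class_1(['0', '1', 'a', '3', 'b', 'c'], table))  # web
print(find_class_1(['too', 'short'], table))                # other
```

Short rows fall through to 'other' here; whether that is the right label for malformed input depends on the dataset.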

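Update:

I enumerated and printed every line using readline and managed to work out which line is causing the error. It does contain some random characters, which tshark must have substituted. Deleting that line removes the error, but obviously I would rather skip such lines than delete them.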

with open('file.csv') as f:
    for i, line in enumerate(f):
        print('{} = {}'.format(i+1, line.strip()))

I'm sure there is a better way to enumerate, haha.
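One way to skip such lines instead of deleting them is to read the file as bytes and decode each line separately, recording the numbers of the lines that fail. The sketch below assumes the file is meant to be UTF-8 with `\n` line endings. The helper name `read_decodable_lines` and the temporary demo file are illustrative only. `enumerate(f, start=1)` also removes the need for `i+1`.

```python
import os
import tempfile


def read_decodable_lines(path, encoding='utf-8'):
    """Return (good, bad): decoded lines, and 1-based numbers of skipped lines."""
    good, bad = [], []
    with open(path, 'rb') as f:                  # bytes, so no implicit decoding
        for number, raw in enumerate(f, start=1):
            try:
                good.append(raw.decode(encoding))
            except UnicodeDecodeError:
                bad.append(number)               # remember the line, then skip it
    return good, bad


# Demo on a throwaway file whose second line contains the invalid byte 0xfe.
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write(b'a,b\n\xfe,c\nd,e\n')
good, bad = read_decodable_lines(tmp.name)
os.remove(tmp.name)

print(good)  # ['a,b\n', 'd,e\n']
print(bad)   # [2]
```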

Answer 1

Try the following:

dataset = open('file-00.csv','rb')

The b in the mode specifier of open() means the file is treated as binary, so its contents stay as bytes. No decoding is attempted this way.
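One consequence worth keeping in mind: in binary mode `readlines()` returns `bytes` objects, and in Python 3 a `bytes` value never compares equal to a `str`. Comparisons like those in `find_class_1` would therefore quietly fail unless each line is decoded first. A small illustration with made-up data:

```python
raw = b'a,b,\xfe\n'                  # one line as it would come from an 'rb' file

print(raw.split(b',')[0] == 'a')     # False: bytes never compare equal to str

# Decode explicitly; errors='replace' swaps each bad byte for U+FFFD.
text = raw.decode('utf-8', errors='replace')
fields = text.strip().split(',')

print(fields[0] == 'a')              # True
print(fields[2] == '\ufffd')         # True: the invalid byte was replaced
```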

'utf-8' codec can't decode byte and IndexError: list index out of range errors
