Problem setting dtypes when reading a large CSV file

Time: 2019-05-21 08:15:09

Tags: python

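My CSV file is large, ~14 GB (124 columns), and reading it with df = pd.read_csv(r'C:\Users\AdamPer\Desktop\Python\Magisterka\test2.csv', encoding= "utf_8_sig") raises a memory error. I tried setting the options low_memory=False and error_bad_lines=False, but that did not help, so I decided to set the dtypes explicitly and ran into a problem with that. Here is what I did.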
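As a side note on the memory error itself, one commonly used option is to read the file in pieces with the chunksize parameter of pd.read_csv, so that only one piece is in memory at a time. The following is a minimal sketch, not a tested solution for the 14 GB file: the helper name read_in_chunks, the chunk size, and the tiny in-memory CSV are assumptions made for illustration, and only the column names Hand_ID and Pot come from the question.

```python
import io
import pandas as pd

def read_in_chunks(source, chunksize=100_000, **read_csv_kwargs):
    """Yield DataFrames of at most `chunksize` rows instead of one huge frame."""
    # With chunksize set, read_csv returns an iterator of DataFrames.
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_csv_kwargs):
        yield chunk

# Hypothetical in-memory stand-in for the real file.
sample = io.StringIO("Hand_ID,Pot\n1,0.5\n2,1.25\n3,2.0\n")

total_rows = 0
pot_sum = 0.0
for chunk in read_in_chunks(sample, chunksize=2):
    # Aggregate per chunk so the full table never has to fit in memory.
    total_rows += len(chunk)
    pot_sum += chunk["Pot"].sum()
```

This only helps if the later processing can also work chunk by chunk; concatenating all chunks back into one DataFrame would need the same memory as a single read. Newer pandas releases also replace error_bad_lines with on_bad_lines.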

I made a smaller CSV file, ~16 MB, read it into a DataFrame, and checked the column types with df.info(max_cols=200):

Soft              39347 non-null object
Hand_ID           39347 non-null int64
Table_Name        39345 non-null object
SmallBlind        39347 non-null float64
BigBlind          39347 non-null float64
Currency          39347 non-null object
Day               39347 non-null object
Hour              39347 non-null object
Seat_1            39347 non-null object
Seat_2            39347 non-null object
Seat_3            39347 non-null object
Seat_4            39347 non-null object
Seat_5            39347 non-null object
Seat_6            39347 non-null object
Stack_1           39347 non-null float64
Stack_2           39347 non-null float64
Stack_3           39347 non-null float64
Stack_4           39347 non-null float64
Stack_5           39347 non-null float64
Stack_6           39347 non-null float64
Raise_Pre_S1      39347 non-null object
Raise_Pre_S2      39347 non-null object
Raise_Pre_S3      39347 non-null object
Raise_Pre_S4      39347 non-null object
Raise_Pre_S5      39347 non-null object
Raise_Pre_S6      39347 non-null object
Call_Pre_S1       39347 non-null object
Call_Pre_S2       39347 non-null object
Call_Pre_S3       39347 non-null object
Call_Pre_S4       39347 non-null object
Call_Pre_S5       39347 non-null object
Call_Pre_S6       39347 non-null object
Flop_Bet_S1       39347 non-null float64
Flop_Bet_S2       39347 non-null float64
Flop_Bet_S3       39347 non-null float64
Flop_Bet_S4       39347 non-null float64
Flop_Bet_S5       39347 non-null float64
Flop_Bet_S6       39347 non-null float64
Flop_Raise_S1     39347 non-null object
Flop_Raise_S2     39347 non-null object
Flop_Raise_S3     39347 non-null object
Flop_Raise_S4     39347 non-null object
Flop_Raise_S5     39347 non-null object
Flop_Raise_S6     39347 non-null object
Flop_Call_S1      39347 non-null object
Flop_Call_S2      39347 non-null object
Flop_Call_S3      39347 non-null object
Flop_Call_S4      39347 non-null object
Flop_Call_S5      39347 non-null object
Flop_Call_S6      39347 non-null object
Saw_Flop_S1       39347 non-null int64
Saw_Flop_S2       39347 non-null int64
Saw_Flop_S3       39347 non-null int64
Saw_Flop_S4       39347 non-null int64
Saw_Flop_S5       39347 non-null int64
Saw_Flop_S6       39347 non-null int64
Turn_Bet_S1       39347 non-null float64
Turn_Bet_S2       39347 non-null float64
Turn_Bet_S3       39347 non-null float64
Turn_Bet_S4       39347 non-null float64
Turn_Bet_S5       39347 non-null float64
Turn_Bet_S6       39347 non-null float64
Turn_Raise_S1     39347 non-null object
Turn_Raise_S2     39347 non-null object
Turn_Raise_S3     39347 non-null object
Turn_Raise_S4     39347 non-null object
Turn_Raise_S5     39347 non-null object
Turn_Raise_S6     39347 non-null object
Turn_Call_S1      39347 non-null object
Turn_Call_S2      39347 non-null object
Turn_Call_S3      39347 non-null object
Turn_Call_S4      39347 non-null object
Turn_Call_S5      39347 non-null object
Turn_Call_S6      39347 non-null object
Saw_Turn_S1       39347 non-null int64
Saw_Turn_S2       39347 non-null int64
Saw_Turn_S3       39347 non-null int64
Saw_Turn_S4       39347 non-null int64
Saw_Turn_S5       39347 non-null int64
Saw_Turn_S6       39347 non-null int64
River_Bet_S1      39347 non-null float64
River_Bet_S2      39347 non-null float64
River_Bet_S3      39347 non-null float64
River_Bet_S4      39347 non-null float64
River_Bet_S5      39347 non-null float64
River_Bet_S6      39347 non-null float64
River_Raise_S1    39347 non-null object
River_Raise_S2    39347 non-null object
River_Raise_S3    39347 non-null object
River_Raise_S4    39347 non-null object
River_Raise_S5    39347 non-null object
River_Raise_S6    39347 non-null object
River_Call_S1     39347 non-null object
River_Call_S2     39347 non-null object
River_Call_S3     39347 non-null object
River_Call_S4     39347 non-null object
River_Call_S5     39347 non-null object
River_Call_S6     39347 non-null object
Saw_River_S1      39347 non-null int64
Saw_River_S2      39347 non-null int64
Saw_River_S3      39347 non-null int64
Saw_River_S4      39347 non-null int64
Saw_River_S5      39347 non-null int64
Saw_River_S6      39347 non-null int64
S1_shows?         39347 non-null int64
S2_shows?         39347 non-null int64
S3_shows?         39347 non-null int64
S4_shows?         39347 non-null int64
S5_shows?         39347 non-null int64
S6_shows?         39347 non-null int64
Winner?_S1        39347 non-null int64
Winner?_S2        39347 non-null int64
Winner?_S3        39347 non-null int64
Winner?_S4        39347 non-null int64
Winner?_S5        39347 non-null int64
Winner?_S6        39347 non-null int64
W/L_amount_S1     39347 non-null float64
W/L_amount_S2     39347 non-null float64
W/L_amount_S3     39347 non-null float64
W/L_amount_S4     39347 non-null float64
W/L_amount_S5     39347 non-null float64
W/L_amount_S6     39347 non-null float64
Pot               39347 non-null float64
Rake              39347 non-null float64

Based on that output, I set the dtypes:

dtypes = {'Soft': np.object,
          'Hand_ID': np.int64,
          'Table_Name': np.object,
          'SmallBlind': np.float64,
          'BigBlind': np.float64,
          'Currency': np.object,
          'Day': np.object,
          'Hour': np.object,
          'Seat_1': np.object, 'Seat_2': np.object, 'Seat_3': np.object, 'Seat_4': np.object, 'Seat_5': np.object, 'Seat_6': np.object,
          'Stack_1': np.float64, 'Stack_2': np.float64, 'Stack_3': np.float64, 'Stack_4': np.float64, 'Stack_5': np.float64, 'Stack_6': np.float64,
'Raise_Pre_S1': np.object, 'Raise_Pre_S2': np.object, 'Raise_Pre_S3': np.object, 'Raise_Pre_S4': np.object, 'Raise_Pre_S5': np.object, 'Raise_Pre_S6': np.object,
'Call_Pre_S1': np.object, 'Call_Pre_S2': np.object, 'Call_Pre_S3': np.object, 'Call_Pre_S4': np.object, 'Call_Pre_S5': np.object, 'Call_Pre_S6': np.object,
'Flop_Bet_S1': np.float64, 'Flop_Bet_S2': np.float64, 'Flop_Bet_S3': np.float64, 'Flop_Bet_S4': np.float64, 'Flop_Bet_S5': np.float64, 'Flop_Bet_S6': np.float64,
'Flop_Raise_S1': np.object, 'Flop_Raise_S2': np.object, 'Flop_Raise_S3': np.object, 'Flop_Raise_S4': np.object, 'Flop_Raise_S5': np.object, 'Flop_Raise_S6': np.object,
'Flop_Call_S1': np.object, 'Flop_Call_S2': np.object, 'Flop_Call_S3': np.object, 'Flop_Call_S4': np.object, 'Flop_Call_S5': np.object, 'Flop_Call_S6': np.object, 
'Saw_Flop_S1': np.int64, 'Saw_Flop_S2': np.int64, 'Saw_Flop_S3': np.int64, 'Saw_Flop_S4': np.int64, 'Saw_Flop_S5': np.int64, 'Saw_Flop_S6': np.int64,
'Turn_Bet_S1': np.float64, 'Turn_Bet_S2': np.float64, 'Turn_Bet_S3': np.float64, 'Turn_Bet_S4': np.float64, 'Turn_Bet_S5': np.float64, 'Turn_Bet_S6': np.float64,
'Turn_Raise_S1': np.object, 'Turn_Raise_S2': np.object, 'Turn_Raise_S3': np.object, 'Turn_Raise_S4': np.object, 'Turn_Raise_S5': np.object, 'Turn_Raise_S6': np.object,
'Turn_Call_S1': np.object, 'Turn_Call_S2': np.object, 'Turn_Call_S3': np.object, 'Turn_Call_S4': np.object, 'Turn_Call_S5': np.object, 'Turn_Call_S6': np.float64,
'Saw_Turn_S1': np.int64, 'Saw_Turn_S2': np.int64, 'Saw_Turn_S3': np.int64, 'Saw_Turn_S4': np.int64, 'Saw_Turn_S5': np.int64, 'Saw_Turn_S6': np.int64,
'River_Bet_S1': np.float64,'River_Bet_S2': np.float64,'River_Bet_S3': np.float64,'River_Bet_S4': np.float64,'River_Bet_S5': np.float64,'River_Bet_S6': np.float64,
'River_Raise_S1': np.object, 'River_Raise_S2': np.object,'River_Raise_S3': np.object, 'River_Raise_S4': np.object, 'River_Raise_S5': np.object, 'River_Raise_S6': np.object,
'River_Call_S1': np.object, 'River_Call_S2': np.object, 'River_Call_S3': np.object, 'River_Call_S4': np.object, 'River_Call_S5': np.object, 'River_Call_S6': np.object,
'Saw_River_S1': np.int64,'Saw_River_S2': np.int64,'Saw_River_S3': np.int64,'Saw_River_S4': np.int64,'Saw_River_S5': np.int64, 'Saw_River_S6': np.int64,
'S1_shows?': np.int64, 'S2_shows?': np.int64, 'S3_shows?': np.int64, 'S4_shows?': np.int64, 'S5_shows?': np.int64, 'S6_shows?': np.int64,
'Winner?_S1': np.int64, 'Winner?_S2': np.int64, 'Winner?_S3': np.int64, 'Winner?_S4': np.int64, 'Winner?_S5': np.int64, 'Winner?_S6': np.int64,
'W/L_amount_S1': np.float64, 'W/L_amount_S2': np.float64, 'W/L_amount_S3': np.float64, 'W/L_amount_S4': np.float64, 'W/L_amount_S5': np.float64, 'W/L_amount_S6': np.float64,
'Pot': np.float64,
'Rake': np.float64}
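A hand-typed mapping of 124 columns can drift from the df.info() listing. For instance, 'Turn_Call_S6' is listed as object above but is mapped to np.float64 in the dictionary. One way to avoid this kind of mismatch might be to derive the mapping from the small sample DataFrame instead of typing it. The sketch below assumes a hypothetical three-column miniature of the sample file, in which the '[]' strings mimic the value seen in the error message.

```python
import io
import pandas as pd

# Hypothetical miniature of the 16 MB sample file.
sample_csv = "Hand_ID,Pot,Turn_Call_S6\n1,0.5,[]\n2,1.25,[]\n"

sample_df = pd.read_csv(io.StringIO(sample_csv))

# Map each column name to the dtype pandas inferred for it.
inferred_dtypes = sample_df.dtypes.to_dict()

# Reading again with the derived mapping reproduces the same column types.
df = pd.read_csv(io.StringIO(sample_csv), dtype=inferred_dtypes)
```

Types inferred from a sample are not guaranteed to hold for the full file. For example, a column that is int64 in the sample fails to load as int64 if the full file has missing values in it. Also, the np.object alias has been removed from recent NumPy releases; the built-in object can be used in its place.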

I then tried to read the same CSV file with the following code:

df = pd.read_csv(r'C:\Users\AdamPer\Desktop\Python\Magisterka\test2.csv', encoding= "utf_8_sig", dtype=dtypes)

This gives me an error:

ValueError: could not convert string to float: '[]'
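The message suggests that some column mapped to a float type contains the literal text '[]'. One way to narrow this down might be to read the sample with every cell kept as text and then check which of the intended float columns hold values that do not parse as numbers. This is a sketch under assumptions: the helper name columns_failing_float and the small in-memory CSV are hypothetical, and only the column names come from the question.

```python
import io
import pandas as pd

def columns_failing_float(df_as_text, columns):
    """Return {column: example bad values} for columns that do not parse as numbers."""
    problems = {}
    for col in columns:
        values = df_as_text[col]
        parsed = pd.to_numeric(values, errors="coerce")
        # Cells that held text but turned into NaN when coerced to numbers.
        bad = values[parsed.isna() & values.notna()]
        if not bad.empty:
            problems[col] = bad.drop_duplicates().head(5).tolist()
    return problems

# Hypothetical miniature of the sample file.
sample_csv = "Hand_ID,Pot,Turn_Call_S6\n1,0.5,[]\n2,1.25,3.0\n"

# dtype=str keeps every cell as text, so nothing fails during the read.
raw = pd.read_csv(io.StringIO(sample_csv), dtype=str)
report = columns_failing_float(raw, ["Pot", "Turn_Call_S6"])
```

Passing the keys of the dtypes dictionary whose value is a float type as columns would check every intended float column in one pass.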

Any ideas on how to solve this problem? Link to smaller csv file

0 answers:

No answers