OneHotEncoder TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

Time: 2014-12-28 13:56:44

Tags: python numpy pandas

I am new to scikit-learn and pandas. I have the following code:

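import pandas as pd
from pandas import read_csv, concat
from sklearn.preprocessing import OneHotEncoder

cols = ['delay', 'month', 'day', 'dow', 'hour', 'distance', 'carrier', 'dest','year','origin','arr']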

tp = read_csv('D:/CCP DS Nov 2014/smartfly/smartfly_historic_train.csv', iterator=True,  chunksize=1000) 
data_2007 = concat(tp, ignore_index=True) # df is DataFrame. If error do list(tp)
data_2007.columns = ['delay', 'month', 'day', 'dow', 'hour', 'distance', 'carrier',  'dest','year','origin','arr']
data_2007 = data_2007.dropna(subset=['delay'])
tp = read_csv('D:/CCP DS Nov 2014/smartfly/smartfly_historic_test.csv', iterator=True, chunksize=1000) 
categ = [cols.index(x) for x in ['month','day','dow','hour','distance','carrier','dest']]
enc = OneHotEncoder(categorical_features = categ,sparse=True)
df = data_2007.drop('delay', axis=1)
df['carrier'] = pd.factorize(df['carrier'])[0]
df['dest'] = pd.factorize(df['dest'])[0]
train_x = enc.fit_transform(df)

A sample record from smartfly_historic_train.csv looks like this:

-5,8,11,7,10,361,US,CLT,2013,BWI,1132

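I am trying to convert categorical variables such as US and CLT to integers so that I can feed the data to a RandomForest, but I get the following error: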

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

Stack trace:

  File "D:\Meerkat\meerkat\ml_new1.py", line 58, in run_from_command_line
    train_x = enc.fit_transform(df)
  File "C:\Python33\lib\site-packages\sklearn\preprocessing\data.py", line 1054, in fit_transform
    self.categorical_features, copy=True)
  File "C:\Python33\lib\site-packages\sklearn\preprocessing\data.py", line 897, in _transform_selected
    return sparse.hstack((X_sel, X_not_sel))
  File "C:\Python33\lib\site-packages\scipy\sparse\construct.py", line 453, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Python33\lib\site-packages\scipy\sparse\construct.py", line 583, in bmat
    dtype = upcast(*tuple([A.dtype for A in blocks[block_mask]]))
  File "C:\Python33\lib\site-packages\scipy\sparse\sputils.py", line 62, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
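
Two things in the code above look like plausible causes, going by the traceback. Only carrier and dest are factorized, so origin (for example BWI) still holds strings. The columns that are not one-hot encoded would then come through as an object-dtype block, which scipy cannot stack with the float64 encoded block. Separately, categ is computed from cols, which still contains delay, while df has delay dropped, so every index is shifted by one. The sketch below checks for both. It assumes made-up rows modeled on the sample record (only the first row is from the question), the variable names are illustrative, and it stops before the encoder call itself.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

cols = ['delay', 'month', 'day', 'dow', 'hour', 'distance',
        'carrier', 'dest', 'year', 'origin', 'arr']

# Illustration data: the first row is the sample record from the question,
# the other two rows are invented so that factorize has something to do.
csv_text = (
    "-5,8,11,7,10,361,US,CLT,2013,BWI,1132\n"
    "12,8,12,1,9,361,AA,DFW,2013,BWI,1015\n"
    "3,9,1,7,18,950,US,CLT,2013,PHL,2040\n"
)
data = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)
df = data.drop('delay', axis=1)

# 1) Find every column that is still non-numeric (strings).
non_numeric = [c for c in df.columns if not is_numeric_dtype(df[c])]
print(non_numeric)  # ['carrier', 'dest', 'origin']

# With string columns present, the frame converts to an object array.
dtype_before = df.to_numpy().dtype

# Map each string column to integer codes, including 'origin'.
for c in non_numeric:
    df[c] = pd.factorize(df[c])[0]

# Now every column is an integer, so the array dtype is numeric.
dtype_after = df.to_numpy().dtype
print(dtype_before, dtype_after)

# 2) Compute positional indices against the frame that is actually encoded.
categorical = ['month', 'day', 'dow', 'hour', 'distance', 'carrier', 'dest']
categ_from_cols = [cols.index(x) for x in categorical]        # off by one
categ_from_df = [df.columns.get_loc(x) for x in categorical]  # matches df
print(categ_from_cols)  # [1, 2, 3, 4, 5, 6, 7]
print(categ_from_df)    # [0, 1, 2, 3, 4, 5, 6]
```

If the non_numeric list is empty after the loop and the indices come from df.columns, the input handed to the encoder no longer contains an object-dtype block. Whether origin should be one-hot encoded or merely integer-coded is a modeling choice this sketch does not make.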

0 answers:

No answers yet