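I am trying to build a neural network for categorical data in Python (3.5).
I have a table with 47 independent variables (X) and a table with a single column holding the dependent variable (y). This variable is categorical: it takes one of three possible values.
So I encoded it with LabelEncoder(), so that the variable is now 0, 1 or 2.
Then I spread these numbers over three columns using OneHotEncoder and dropped the last column. The reason: two columns of 1s and 0s are enough to represent the 3 possible outcomes.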
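The encoding steps described above can be sketched in plain Python, without scikit-learn, to make the resulting shapes visible. The label values and variable names here are hypothetical, chosen only to mirror the three-class target in the question.

```python
# Hypothetical labels standing in for the three-class target (not the real data).
labels = ['first', 'second', 'third', 'second']

# Step 1: map each distinct label to an integer, in sorted order,
# which is the same ordering LabelEncoder uses.
classes = sorted(set(labels))
index_of = {name: i for i, name in enumerate(classes)}
encoded = [index_of[name] for name in labels]            # [0, 1, 2, 1]

# Step 2: expand each integer into a one-hot row with one column per class.
one_hot = [[1 if i == code else 0 for i in range(len(classes))]
           for code in encoded]

# Step 3: drop the last column; the third class becomes the all-zero row.
reduced = [row[:-1] for row in one_hot]

print(encoded)
print(one_hot)
print(reduced)
```

The reduced form still distinguishes the three classes, but it has only two columns per sample, whereas a softmax layer with three units produces three outputs per sample.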
For the neural network, I use softmax in the output layer and categorical_crossentropy as the loss function (which is supposed to be the one for categorical data).
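As a rough illustration of what that pairing computes for a single sample, here is a minimal sketch in plain Python. The logits are made-up numbers and the helper functions are hypothetical stand-ins written for this example; they are not the Keras implementations.

```python
import math

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_prob):
    # y_true is a one-hot row, so only the true class contributes to the sum.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_prob) if t > 0)

probs = softmax([2.0, 1.0, 0.1])                     # three outputs, one per class
loss = categorical_crossentropy([1, 0, 0], probs)    # target is a 3-column one-hot row

print(probs, loss)
```

The target row and the softmax output both have three entries here, one per class, which is the shape this loss is defined for.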
When I run my code, I get this error:
classification.py in _check_targets(y_true=array([[ 1., 0., 0.],
[ 0., 1., 0.],
...
[ 0., 0., 1.],
[ 0., 1., 0.]]), y_pred=array([2, 2, 2, 2, 2]))
77 if y_type == set(["binary", "multiclass"]):
78 y_type = set(["multiclass"])
79
80 if len(y_type) > 1:
81 raise ValueError("Can't handle mix of {0} and {1}"
---> 82 "".format(type_true, type_pred))
type_true = 'multilabel-indicator'
type_pred = 'binary'
83
84 # We can't have more than one value on y_type => The set is no more needed
85 y_type = y_type.pop()
86
ValueError: Can't handle mix of multilabel-indicator and binary
I do not understand the error: type_true is presumably the type of the true data (the real data I have), and I can see that those values are binary.
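For orientation, the two arrays shown in the traceback are in different representations: y_true is a 2D array of one-hot rows, while y_pred is a 1D array of class indices. The sketch below uses hypothetical values in the same two shapes and converts the one-hot rows to class indices by hand, so that both sides can be compared element by element. It is an illustration of the mismatch, not the scikit-learn code path.

```python
# Hypothetical values in the same shapes as the arrays in the traceback.
y_true_one_hot = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]]
y_pred_labels = [2, 2, 2, 2]

def to_class_indices(rows):
    # For each one-hot row, return the position of its largest entry (argmax).
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

y_true_labels = to_class_indices(y_true_one_hot)

# Once both sides are 1D class indices, accuracy is a plain comparison.
matches = sum(t == p for t, p in zip(y_true_labels, y_pred_labels))
accuracy = matches / len(y_true_labels)

print(y_true_labels, accuracy)
```

In scikit-learn itself, sklearn.utils.multiclass.type_of_target can be called on each array separately to see which target type it is classified as.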
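If I drop two columns from y instead of one (so that only one column is left) and use the sigmoid function with the binary_crossentropy loss function, I do not get any error. So is the data prepared correctly?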
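One way to check what the single-column variant still encodes is to count the distinct targets before and after dropping columns. This is a small plain-Python sketch; which column was kept is an assumption made here for illustration (the first one).

```python
# One one-hot row per class, for three classes.
one_hot = [[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]]

# Assumption: only the first column is kept after dropping the other two.
single_column = [row[0] for row in one_hot]

distinct_before = len({tuple(row) for row in one_hot})
distinct_after = len(set(single_column))

print(single_column, distinct_before, distinct_after)
```

With one column left, the second and third classes share the same target value, so the network is trained on a two-class problem (first class versus the rest) rather than on the original three classes.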
My code looks like this:
# Encoding the dependent variable
# y is like [['first'], ['second'], ['third'],...]
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_y_1 = LabelEncoder()
y[:, 0] = labelencoder_y_1.fit_transform(y[:, 0])
onehotencoder_y = OneHotEncoder(categorical_features = [0])
y = onehotencoder_y.fit_transform(y).toarray()
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
                                                    random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Tuning the ANN
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
def build_classifier(optimizer, units, layers):
    classifier = Sequential()
    # Input layer plus first hidden layer
    classifier.add(Dense(units = units, kernel_initializer = 'uniform', activation = 'relu', input_dim = 47))
    # Additional hidden layers
    for i in range(layers):
        classifier.add(Dense(units = units, kernel_initializer = 'uniform', activation = 'relu'))
    # Output layer: one unit per class
    classifier.add(Dense(units = 3, kernel_initializer = 'uniform', activation = 'softmax'))
    classifier.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier)
parameters = {'batch_size': [32],
              'epochs': [64],
              'optimizer': ['rmsprop'],
              'units': [16],
              'layers': [2]}
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10,
                           n_jobs = 3)
grid_search = grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_
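To see how much work the grid search above schedules, the parameter grid can be expanded by hand. The following is a standard-library sketch of the Cartesian product that a grid search enumerates; it does not call scikit-learn, and the helper variable names are hypothetical.

```python
from itertools import product

# Same grid as in the question: one candidate value per parameter.
parameters = {'batch_size': [32],
              'epochs': [64],
              'optimizer': ['rmsprop'],
              'units': [16],
              'layers': [2]}
cv = 10

# Every combination of parameter values is one candidate model.
names = sorted(parameters)
combinations = [dict(zip(names, values))
                for values in product(*(parameters[name] for name in names))]

# Each candidate is fitted once per cross-validation fold.
fits_during_search = len(combinations) * cv

print(combinations)
print(fits_during_search)
```

With a single value per parameter there is only one candidate, so the search amounts to a 10-fold cross-validation of that one configuration. The accuracy scoring is computed on each of those folds, plus one final refit on the full training set under GridSearchCV's default refit=True.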
Edit: Following a comment from @djk47463, I edited my question to keep all three columns.