How to get the accuracy of all predicted class labels

Time: 2018-05-08 09:31:40

Tags: python machine-learning scikit-learn

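How can I find the overall accuracy of the output obtained by running a decision tree algorithm? I am able to get the top five class labels for the active user's input, but so far I have only obtained the accuracy for the X_train and Y_train datasets using accuracy_score(). Suppose I get the top five recommendations. I would like to get the accuracy of each class label and, from those values, the overall accuracy of the output. Please suggest some ideas.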

My Python script is below (here, event holds the different class labels):

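from itertools import chain
from sklearn.tree import DecisionTreeClassifier

DTC = DecisionTreeClassifier()
DTC.fit(X_train_one_hot, y_train)
print("output from DTC:")
res = DTC.predict_proba(X_test_one_hot)
# Flatten the probabilities; the flat index matches a class index only
# when X_test_one_hot contains a single row
new = list(chain.from_iterable(res))
# Indices of the five highest probabilities
index = sorted(range(len(new)), key=lambda i: new[i], reverse=True)[:5]
for i in index:
    print(event[i])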

Here is the sample code I tried in order to get the accuracy for the predicted class labels.
Here, index holds the indices of the top five class-label probabilities and event holds the different class labels.
for i in index: 
    DTC.fit(X_train_one_hot,y_train) 
    y_pred=event[i]  
    AC=accuracy_score((event,y_pred)*100) 
    print(AC) 

1 answer:

Answer 0 (score: 0)

Since you have a multi-class classification problem, you can compute the classifier's accuracy with scikit-learn's confusion_matrix function.

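To get the overall accuracy, sum the values on the diagonal (the correctly classified samples) and divide that sum by the total number of samples.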
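As a small illustration of that rule, the sketch below applies it to a handful of made-up labels. The y_true and y_pred values are hypothetical and serve only to show the arithmetic; the result should agree with accuracy_score on the same labels.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for a 3-class problem (illustration only)
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 1]

# Rows are true classes, columns are predicted classes
conf_mat = confusion_matrix(y_true, y_pred)

# Diagonal entries count the correctly classified samples
overall = np.trace(conf_mat) / conf_mat.sum()

print(conf_mat)
print('Overall accuracy: {} %'.format(overall * 100))
print('accuracy_score:   {} %'.format(accuracy_score(y_true, y_pred) * 100))
```

Six of the eight hypothetical predictions match, so both lines report 75 %.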

Consider the following simple multi-class classification example using the Iris dataset:

import itertools
import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01)
y_pred = classifier.fit(X_train, y_train).predict(X_test)

Now, to compute the overall accuracy, use the confusion matrix:

conf_mat = confusion_matrix(y_test, y_pred)  # true labels first, then predictions
acc = np.sum(conf_mat.diagonal()) / np.sum(conf_mat)
print('Overall accuracy: {} %'.format(acc*100))
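The question also asks for an accuracy per class label and mentions a top-five recommendation list. The sketch below is one possible way to approach both, not necessarily what the asker's pipeline needs. It assumes that "accuracy per class" means the fraction of samples of each true class that were predicted correctly (per-class recall), and that a top-k prediction counts as correct when the true label is among the k most probable classes. It uses a DecisionTreeClassifier on Iris with k = 2 because Iris has only three classes; with more classes, k = 5 would work the same way.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
conf_mat = confusion_matrix(y_test, y_pred, labels=clf.classes_)

# Per-class accuracy (assumed here to mean recall):
# correct predictions for a class / number of samples truly in that class
per_class_acc = conf_mat.diagonal() / conf_mat.sum(axis=1)
for name, acc in zip(iris.target_names, per_class_acc):
    print('{}: {:.1f} %'.format(name, acc * 100))

# Overall accuracy from the same matrix
overall_acc = conf_mat.diagonal().sum() / conf_mat.sum()
print('Overall accuracy: {:.1f} %'.format(overall_acc * 100))

# Top-k accuracy: a sample counts as correct if its true label is among
# the k classes with the highest predicted probability
k = 2  # illustrative value; Iris has only three classes
proba = clf.predict_proba(X_test)
top_k = clf.classes_[np.argsort(proba, axis=1)[:, -k:]]
top_k_acc = np.mean([t in row for t, row in zip(y_test, top_k)])
print('Top-{} accuracy: {:.1f} %'.format(k, top_k_acc * 100))
```

Note that the probabilities are ranked per test row here, rather than after flattening predict_proba across all rows, so each sample gets its own top-k list.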