Plotting GridSearchCV grid parameter results on a single figure

Asked: 2019-06-18 01:43:24

Tags: python-3.x dictionary machine-learning scikit-learn gridsearchcv

In sklearn 0.17.1 there was grid_scores_: a list of named tuples (https://scikit-learn.org/0.17/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV)

In sklearn 0.21.2 it has been replaced by cv_results_: a dict of numpy (masked) ndarrays (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)
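To make the shape of cv_results_ concrete, here is a minimal sketch of inspecting it after a fit. The toy data from make_classification, the variable names, and the three max_depth values are assumptions for illustration only; they are not taken from the code in this question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for this sketch, not the data from the question)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 3, 5]},
    cv=3,
)
search.fit(X, y)

results = search.cv_results_          # dict: key -> array with one entry per candidate
params = results["params"]            # list of dicts, one per candidate
mean_scores = results["mean_test_score"]
std_scores = results["std_test_score"]

# One line per candidate: its parameters, mean CV score and standard deviation
for p, mean, std in zip(params, mean_scores, std_scores):
    print(p, round(float(mean), 3), round(float(std), 3))
```

Each array in the dict is aligned by candidate index, so results["params"][i] and results["mean_test_score"][i] describe the same parameter setting.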

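Previously, with sklearn 0.17.1, I could use grid_scores_ to plot all grid parameters on a single figure. Now I cannot aggregate the values I get from cv_results_, because the new version has no "mean_validation_score".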
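For code that still expects the old tuples, one possible bridge is to rebuild them from cv_results_. The sketch below assumes single-metric scoring, so that the relevant keys are "mean_test_score" and "split<k>_test_score". The helper name legacy_grid_scores and the hand-written fake_results dict are hypothetical and exist only for this illustration.

```python
import re
from collections import namedtuple

import numpy as np

# Same field names as the entries of the old grid_scores_ list
GridScore = namedtuple(
    "GridScore", ["parameters", "mean_validation_score", "cv_validation_scores"]
)

_SPLIT_KEY = re.compile(r"split(\d+)_test_score$")


def legacy_grid_scores(cv_results):
    """Hypothetical helper: rebuild grid_scores_-style tuples from cv_results_."""
    # Sort split keys numerically so that split10 comes after split9
    split_keys = sorted(
        (k for k in cv_results if _SPLIT_KEY.match(k)),
        key=lambda k: int(_SPLIT_KEY.match(k).group(1)),
    )
    # Rows are candidates, columns are CV folds
    per_split = np.column_stack([cv_results[k] for k in split_keys])
    return [
        GridScore(params, float(mean), per_split[i])
        for i, (params, mean) in enumerate(
            zip(cv_results["params"], cv_results["mean_test_score"])
        )
    ]


# Hand-written stand-in for a real cv_results_ dict (assumption for the sketch)
fake_results = {
    "params": [{"max_depth": 1}, {"max_depth": 8}],
    "mean_test_score": np.array([0.6, 0.8]),
    "split0_test_score": np.array([0.5, 0.7]),
    "split1_test_score": np.array([0.7, 0.9]),
}
scores = legacy_grid_scores(fake_results)
print(scores[1].parameters, scores[1].mean_validation_score)
```

With a fitted search object, legacy_grid_scores(grid_search.cv_results_) could then be iterated in place of grid_search.grid_scores_.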

I have existing code that plotted the scores for all parameters under sklearn 0.17.1 (https://scikit-learn.org/0.17/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV). It used grid_scores_ and drew everything correctly on one figure.

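In newer versions of sklearn, grid_scores_ has been replaced by cv_results_. I am trying to collect all the values into one figure that covers every parameter, but so far I cannot extract the right values to plot.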

{k: mean(itemgetter(*g)(dic)) for k, g in groupby(dic, key=lambda i: type(i))}
# {int: 20, str: 20}

The image below has empty subplots, because there is no longer a "mean_validation_score" to fill each of them: https://ibb.co/Z6jwnMr

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import roc_curve, auc, roc_auc_score
from sklearn.metrics import precision_recall_curve  # public import path; sklearn.metrics.ranking is a private module
from sklearn.metrics import confusion_matrix
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import tree
from sklearn.metrics import accuracy_score
import sklearn
import itertools
from pandas.plotting import scatter_matrix  # pandas.tools.plotting was removed from pandas
import os
import datetime as dt
from operator import itemgetter
from itertools import chain
import graphviz 

from sklearn.metrics import precision_recall_fscore_support
import scikitplot as skplt

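# randint excludes the upper bound, so (0, 1) would yield only zeros; use (0, 2) for binary values
X_train = np.random.randint(0, 2, size=[500, 5000])
y_train = np.random.randint(0, 2, size=500)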

print(X_train.shape, y_train.shape)
# (500, 5000) (500,)

#grid_search = GridSearchCV(clf, param_grid, cv=3) # 10 fold cross validation

### hyperparameter estimator
param_grid = {"criterion": ["gini", "entropy"], 
              "splitter": ["best", "random"],
              "max_depth": np.arange(1,9,7), 
              "min_samples_split": np.arange(2,150,90),
              "min_samples_leaf": np.arange(1,60,45), 
              "min_weight_fraction_leaf": np.arange(0.1,0.4, 0.3), 
              "max_features": [1000, 500, 5000],  
              "max_leaf_nodes": np.arange(2,60,45),
              "min_impurity_decrease": [0.0, 0.5], 
              }  


def evaluate_param(parameter, param_range, index):
    grid_search = GridSearchCV(clf, param_grid = {parameter: param_range}, cv=3) # 3 fold cross validation
    grid_search.fit(X_train, y_train) ### grid_search.fit(X_train[features], y_train)

    df = {}
    #for i, score in enumerate(grid_search.grid_scores_): # previously used methods
    for i, score in enumerate(grid_search.cv_results_["params"]):
        ## How do we save the correct values here for plotting
        df[parameter] = grid_search.cv_results_["params"][i][parameter]
        #df[parameter].update(grid_search.cv_results_["params"][i][parameter])
        #print("df : ", df)
        #df[parameter].append(grid_search.cv_results_["params"][i][parameter])

    #print("df : ", df) # the values are not appended to the keys
    df = pd.DataFrame.from_dict(df, orient='index')
    df.reset_index(level=0, inplace=True)
    df = df.sort_values(by='index')

    plt.subplot(5,2,index) # Change here according to the number of parameters
    plt.xlabel(parameter, color = "red")
    plt.ylabel("GridSearchCV Score", color= "blue")
    plot = plt.plot(df['index'], df[0])
    plt.title(parameter.capitalize(), color = "red")
    plt.savefig('DT_GridSearchCV_Score_Hyperparameter.png')
    return plot, df

clf = tree.DecisionTreeClassifier(random_state=99) # verbose=True, n_jobs=-1 :: Dt does not support it

### hyperparameter estimator
index = 1
plt.figure(figsize=(30,30))
for parameter, param_range in dict.items(param_grid):   
    evaluate_param(parameter, param_range, index)  ## 120 features
    index += 1

Expected result (with the subplots filled in): https://ibb.co/Z6jwnMr

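Each subplot in the figure should contain a curve showing the best value for that parameter. The keys do not include "mean_validation_score" for plotting the actual test scores; it existed in sklearn 0.17.1 but is absent in sklearn 0.20.2.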

Please let me know whether it is still possible to plot all test scores on subplots of a single figure. Thanks in advance!
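One way this might be approached is to read cv_results_["mean_test_score"], which appears to play the role of the old mean_validation_score, and draw one subplot per parameter. The sketch below is not a verified fix for the code above. It assumes synthetic data from make_classification, a reduced param_grid, and a rewritten evaluate_param that takes a matplotlib Axes. Values are plotted at evenly spaced positions so that string parameters such as criterion also work.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; replace with the real X_train, y_train)
X_train, y_train = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Reduced grid for illustration only
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2, 4, 8],
    "min_samples_split": [2, 20, 80],
    "min_samples_leaf": [1, 10, 40],
}

clf = DecisionTreeClassifier(random_state=99)


def evaluate_param(ax, parameter, param_range):
    """Grid-search one parameter and plot its mean CV score on the given axes."""
    search = GridSearchCV(clf, param_grid={parameter: list(param_range)}, cv=3)
    search.fit(X_train, y_train)
    results = search.cv_results_
    values = [p[parameter] for p in results["params"]]
    means = results["mean_test_score"]   # mean score across the CV folds
    stds = results["std_test_score"]     # spread across the CV folds
    positions = np.arange(len(values))   # categorical-safe x positions
    ax.errorbar(positions, means, yerr=stds, marker="o")
    ax.set_xticks(positions)
    ax.set_xticklabels([str(v) for v in values])
    ax.set_xlabel(parameter)
    ax.set_ylabel("Mean CV score")
    ax.set_title(parameter.capitalize())
    return values, means


n_cols = 2
n_rows = int(np.ceil(len(param_grid) / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 4 * n_rows), squeeze=False)

curves = {}
for ax, (parameter, param_range) in zip(axes.ravel(), param_grid.items()):
    curves[parameter] = evaluate_param(ax, parameter, param_range)

# Hide any axes left over when the grid has more cells than parameters
for ax in axes.ravel()[len(param_grid):]:
    ax.set_visible(False)

fig.tight_layout()
```

Calling fig.savefig('DT_GridSearchCV_Score_Hyperparameter.png') once after the loop writes the whole figure, instead of saving inside every call as the original function does.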

0 answers:

No answers