Grid search cross-validation in sklearn

Time: 2015-07-01 12:41:17

Tags: scikit-learn

Can grid search with cross-validation be used to find the best parameters for a decision tree classifier? http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html

2 answers:

Answer 0 (score: 8)

Why not?

I invite you to check the GridSearchCV documentation.

Example

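import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# xtrain, ytrain, xtest, ytest: your own train/test split with a binary target
param_grid = {'max_depth': np.arange(3, 10)}  # try depths 3 to 9

tree = GridSearchCV(DecisionTreeClassifier(), param_grid)

tree.fit(xtrain, ytrain)
# Column 1 of predict_proba is the positive-class probability, as roc_auc_score expects for binary labels
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)

print('DecisionTree: Area under the ROC curve = {}'.format(tree_performance))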
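The snippet above assumes xtrain, ytrain, xtest and ytest already exist. A self-contained sketch of the same approach follows; the bundled breast cancer dataset (binary target), the 75/25 stratified split, random_state=0 and cv=5 are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the breast cancer dataset stands in for the asker's data (binary labels)
X, y = load_breast_cancer(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Try max_depth 3..9, scoring each candidate with 5-fold cross-validation
param_grid = {'max_depth': np.arange(3, 10)}
tree = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
tree.fit(xtrain, ytrain)

# The search refits the best tree on all of xtrain, so predict_proba uses that model
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)
print('DecisionTree: Area under the ROC curve = {}'.format(tree_performance))
print('Best parameters:', tree.best_params_)
```

The held-out test set is used only once, after the search, so the ROC AUC is not inflated by the parameter selection.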

And to extract the best parameters:

tree.best_params_
Out[1]: {'max_depth': 5}
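Besides best_params_, a fitted GridSearchCV also exposes best_score_ (the mean cross-validated score of the winning candidate), best_estimator_ (the model refit with those parameters, since refit=True by default) and cv_results_ (scores for every candidate). A minimal sketch, assuming the iris dataset and a small max_depth grid purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris data and this grid are illustrative only
X, y = load_iris(return_X_y=True)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {'max_depth': [2, 3, 4, 5]}, cv=5)
search.fit(X, y)

print(search.best_params_)          # parameter combination with the highest mean CV score
print(search.best_score_)           # that mean CV score
best_tree = search.best_estimator_  # refit on all of X, y with best_params_

# Mean cross-validated score for every candidate in the grid
for params, score in zip(search.cv_results_['params'],
                         search.cv_results_['mean_test_score']):
    print(params, round(score, 3))
```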

Answer 1 (score: 0)

Here is code for a grid search over a decision tree:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

def dtree_grid_search(X,y,nfolds):
    #create a dictionary of all values we want to test
    param_grid = { 'criterion':['gini','entropy'],'max_depth': np.arange(3, 15)}
    # decision tree model
    dtree_model=DecisionTreeClassifier()
    #use gridsearch to test all values
    dtree_gscv = GridSearchCV(dtree_model, param_grid, cv=nfolds)
    #fit model to data
    dtree_gscv.fit(X, y)
    return dtree_gscv.best_params_
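To show how the function might be called, here is a sketch that repeats it and runs it on the iris dataset with nfolds=5; the dataset and fold count are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def dtree_grid_search(X, y, nfolds):
    # Same grid as the answer: both split criteria, depths 3..14
    param_grid = {'criterion': ['gini', 'entropy'], 'max_depth': np.arange(3, 15)}
    dtree_gscv = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=nfolds)
    dtree_gscv.fit(X, y)
    return dtree_gscv.best_params_

# Assumption: iris data and 5 folds are illustrative choices
X, y = load_iris(return_X_y=True)
best = dtree_grid_search(X, y, nfolds=5)
print(best)
```

Because the function returns only best_params_, the refit model is discarded; returning dtree_gscv itself would also give access to best_estimator_ and cv_results_.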