Question

I am trying to use SciKit-Learn's grid search to find the best parameters for a random forest. I am doing it as follows:

from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

pipeline = Pipeline([('clf', RandomForestRegressor(random_state=50))])
parameters = {
'clf__n_estimators': (50, 100, 200),
'clf__max_depth': (50, 150, 250),
'clf__min_samples_split': (1, 2, 3, 4, 5),
'clf__min_samples_leaf': (1, 2, 3, 4, 5)
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1,verbose=1, scoring='neg_mean_squared_error')
grid_search.fit(X, Y)
print 'Best score: %0.3f' % grid_search.best_score_
print 'Best parameters set:'

best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print '\t%s: %r' % (param_name, best_parameters[param_name])

predictions = grid_search.predict(X)
print classification_report(Y, predictions)

Unfortunately, I get a JobLibValueError pointing to:

---> 14 grid_search.fit(X, Y)

For reference, my X looks like this:

0   1   2   3   4   5   6   7   8   9   ... 76613   76614   76615   76616   76617   76618   76619   76620   _engaged_time   _title
0   0.0 0.000000    0.000000    0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20000.0 54
1   0.0 0.000000    0.000000    0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 55000.0 40

My Y values are just a bunch of engaged times (integers).

Thanks for your help!

Answer 1

Try

1) Replace:

from sklearn.grid_search import GridSearchCV

with

from sklearn.model_selection import GridSearchCV

2) Update the sklearn module

pip install -U scikit-learn or conda install scikit-learn
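To confirm which version is actually installed, before or after upgrading, a quick check along these lines may help. This is only a sketch. The relevant background is that sklearn.grid_search was deprecated in scikit-learn 0.18 and removed in 0.20.

```python
import sklearn

# Print the installed scikit-learn version string.
print(sklearn.__version__)
```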

Solution 1) resolved a similar problem I ran into.
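For illustration, here is a minimal sketch of the question's grid search with the import changed as suggested in 1). Several things in it are assumptions that go beyond this answer: X and Y are replaced by small synthetic data from make_regression, the grid is shrunk so it runs quickly, and the code uses Python 3 print calls. It also drops the value 1 from min_samples_split, because recent scikit-learn releases require an integer min_samples_split of at least 2, and that value may have contributed to the original error. Finally, classification_report is swapped for mean_squared_error, since the estimator is a regressor with continuous targets.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV  # replaces sklearn.grid_search
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the asker's X and Y (assumption: the real data is not available).
X, Y = make_regression(n_samples=60, n_features=8, noise=0.1, random_state=50)

pipeline = Pipeline([('clf', RandomForestRegressor(random_state=50))])

# Deliberately small grid so the sketch runs quickly.
parameters = {
    'clf__n_estimators': (10, 20),
    'clf__max_depth': (3, 5),
    'clf__min_samples_split': (2, 4),  # integer values must be >= 2
    'clf__min_samples_leaf': (1, 2),
}

# n_jobs=1 keeps any error in the main process, which makes tracebacks easier to read.
grid_search = GridSearchCV(pipeline, parameters, cv=3, n_jobs=1,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, Y)

print('Best score: %0.3f' % grid_search.best_score_)
print('Best parameters set:')
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
    print('\t%s: %r' % (param_name, best_parameters[param_name]))

# Regression metric instead of classification_report.
predictions = grid_search.predict(X)
print('Training MSE: %0.3f' % mean_squared_error(Y, predictions))
```

Once this runs cleanly, n_jobs=-1 and the larger grid from the question can be restored.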

JobLibValueError when using GridSearchCV from SKLearn
