Question

I'm trying to get class_weight working. I know the rest of the code is valid; only class_weight gives an error:

    parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
                                             ^
SyntaxError: invalid syntax

Here is my code:

clf1 = tree.DecisionTreeClassifier()
 parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
 'splitter' : ('best','random'),'max_features':[None,2,4,6,8,10,12,14],'class_weight':{1:10}]
clf=grid_search.GridSearchCV(clf1,parameters_to_tune)
clf.fit(features,labels)
print clf.best_params_

Can anyone spot the mistake I'm making?

Answer 1

~~I assume you want to grid search over different class_weight values for the 'salary' class.~~

The value of class_weight should be a list:

'class_weight':[{'salary':1}, {'salary':2}, {'salary':4}, {'salary':6}, {'salary':10}]

You can simplify it with a list comprehension:

'class_weight':[{'salary': w} for w in [1, 2, 4, 6, 10]]

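The first problem is that each parameter value in the dict parameters_to_tune should be a list of candidates, but you passed a dict. You can fix it by passing a list of dicts as the value of class_weight, where each dict holds one class_weight setting for DecisionTreeClassifier.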
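To see how such a grid expands into individual candidates, here is a minimal sketch using `ParameterGrid` from `sklearn.model_selection`. It assumes a reasonably recent scikit-learn; the tiny two-key grid and the class labels 0 and 1 are made up for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Each key maps to a list of candidate values. For class_weight,
# every candidate is itself a dict of {class_label: weight}.
param_grid = {
    "max_depth": [None, 4],
    "class_weight": [{0: 1, 1: w} for w in [1, 10]],
}

# Expand the grid into one dict of settings per combination.
candidates = list(ParameterGrid(param_grid))
for params in candidates:
    print(params)
```

Two values for each of the two keys give four combinations, and each one carries a whole class_weight dict as a single setting.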

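But the more serious problem is that class_weight holds the weights associated with classes, whereas in your case 'salary' is the name of a feature. You cannot assign weights to features. I misunderstood your intent at first, and now I'm not sure what you want.

class_weight has the form {class_label: weight}. If you really want to set class_weight in your case, class_label should be a value such as 0.0 or 1.0, and the syntax would be: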

'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]

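If a class has a large weight, the classifier is more likely to predict that class. A typical use case for class_weight is imbalanced data.

Here is an example, although the classifier there is an SVM.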
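The effect can be illustrated with a deliberately degenerate toy case (hypothetical data, not from the question): every sample has the same feature value, so the tree cannot split and its single leaf simply predicts the class with the largest weighted count.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one constant feature, 8 samples of class 0
# and 2 samples of class 1, so no split is possible.
X = np.zeros((10, 1))
y = np.array([0] * 8 + [1] * 2)

plain = DecisionTreeClassifier(random_state=0).fit(X, y)
weighted = DecisionTreeClassifier(
    class_weight={0: 1, 1: 10}, random_state=0
).fit(X, y)

# Unweighted counts are 8 vs 2; weighted counts are 8*1 vs 2*10.
pred_plain = plain.predict([[0.0]])[0]
pred_weighted = weighted.predict([[0.0]])[0]
print(pred_plain, pred_weighted)
```

Without weights the majority class 0 wins; with weight 10 on class 1 the weighted count of the minority class (20) exceeds that of class 0 (8), so the prediction flips to class 1.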

Update

The complete parameters_to_tune should be:

parameters_to_tune = {'min_samples_split': [2, 4, 6, 10, 15, 25],
                      'min_samples_leaf': [1, 2, 4, 10],
                      'max_depth': [None, 4, 10, 15],
                      'splitter' : ('best', 'random'),
                      'max_features':[None, 2, 4, 6, 8, 10, 12, 14],
                      'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]}
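Note the outer curly braces: the original SyntaxError comes from writing key: value pairs inside square brackets, which is not valid in a list literal. For readers on a current setup, here is a hedged end-to-end sketch. It assumes Python 3 and a recent scikit-learn, where `GridSearchCV` is imported from `sklearn.model_selection` (the old `sklearn.grid_search` module has been removed). The data is a synthetic stand-in for the asker's features and labels, and the grid is shortened so it runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's features/labels (assumption):
# 300 samples, 14 features, roughly 80/20 class imbalance.
features, labels = make_classification(
    n_samples=300, n_features=14, weights=[0.8, 0.2], random_state=0
)

# Shortened grid; every value is a list of candidates, and each
# class_weight candidate is a {class_label: weight} dict.
parameters_to_tune = {
    "min_samples_split": [2, 10],
    "max_depth": [None, 4],
    "class_weight": [{0: 1, 1: w} for w in [1, 4, 10]],
}

clf = GridSearchCV(
    DecisionTreeClassifier(random_state=0), parameters_to_tune, cv=3
)
clf.fit(features, labels)
print(clf.best_params_)
```

The full grid from the answer has 6 * 4 * 4 * 2 * 8 * 5 = 7680 combinations, so expect it to take noticeably longer than this reduced version.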

Answer 2

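The link below covers the use of different class_weight values. Just Ctrl+F "class_weight" to jump to the relevant section. It uses GridSearchCV to find the best class_weight for different optimization objectives.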

Optimizing a classifier using different evaluation metrics
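As a rough idea of what searching class_weight under different objectives can look like, here is a small sketch. It is not taken from the linked page; the synthetic data, the candidate weights, and the fixed max_depth are assumptions for illustration. The `scoring` argument of `GridSearchCV` selects the metric to maximize.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (assumption; not from the linked article).
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

grid = {"class_weight": [{0: 1, 1: w} for w in [1, 4, 10]]}

best = {}
for metric in ["precision", "recall"]:
    search = GridSearchCV(
        DecisionTreeClassifier(max_depth=4, random_state=0),
        grid,
        scoring=metric,  # the objective GridSearchCV maximizes
        cv=3,
    )
    search.fit(X, y)
    best[metric] = search.best_params_["class_weight"]
    print(metric, best[metric])
```

The chosen class_weight may differ between the two metrics, since weighting the minority class more heavily tends to trade precision for recall.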
