Question

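I want to use cross-validation (CV) to tune the hyperparameters of a random forest classifier. At this point, I would be happy to simply tune the ideal number of trees (n_estimators).

My input is a text string (the independent variable) and a label (the dependent variable). What confuses me is where TfidfVectorizer comes into play.

I searched Google for sample code but haven't found anything useful.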

The code below raises an error. I'm not sure what triggers the KeyError.

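from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfTransformer, TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline

pipeline_cv = Pipeline([
    ('bow', TfidfVectorizer(analyzer=text_process)),
    ('tfidf', TfidfTransformer()),
    ('classifier', RandomForestClassifier()),
])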

parameters = {
    'n_estimators'      : [10,50,200],
    'random_state'      : [108],
    'min_samples_leaf'  : [2,3,5],
    'min_samples_split' : [2,3,5] 
}

clf = cross_validate(pipeline_cv, parameters)
clf.fit(text_train, value_train)
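
For comparison, here is a minimal sketch of how a search like this is commonly set up; it is not a confirmed fix for the exact traceback. It assumes the goal is a grid search, so it uses GridSearchCV rather than cross_validate, which expects the data (X, y) after the estimator and does not accept a parameter grid. Passing the dict in that position is probably where the error comes from. Parameters of a pipeline step are addressed as `<step name>__<parameter>`, hence the `classifier__` prefix. TfidfVectorizer sits inside the pipeline so it is refit on the training part of every fold, and since it already applies TF-IDF weighting, the separate TfidfTransformer step is left out. The toy texts and labels are invented for illustration, and the default analyzer stands in for text_process, which is not shown in the question.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, invented purely for illustration (6 samples per class).
texts = [
    "great movie loved it", "wonderful acting and plot", "really enjoyed this film",
    "fantastic and fun", "best film this year", "loved the soundtrack",
    "terrible movie hated it", "awful acting and plot", "really boring film",
    "dull and slow", "worst film this year", "hated the soundtrack",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The vectorizer is a pipeline step, so it is fit only on each training fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", RandomForestClassifier(random_state=108)),
])

# Keys use the "<step name>__<parameter>" convention for nested parameters.
param_grid = {
    "classifier__n_estimators": [10, 50],
    "classifier__min_samples_leaf": [1, 2],
}

# 3-fold cross-validated grid search over every parameter combination.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` holds the whole pipeline refit on all the data with the best parameters, so it can be called directly on raw text with `predict`.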

How to tune the hyperparameters of a random forest classifier with text data

0 Answers: