Custom feval function in lgb.cv gives unexpected output

Asked: 2018-06-30 09:14:06

Tags: python machine-learning cross-validation multiclass-classification lightgbm

Environment info

Operating system: Ubuntu 16.04 LTS
CPU: running on the Google Colab GPU runtime
C++/Python/R version: Python 3


Reproducible example

I am using lgb.cv as follows:

cvresult = lgb.cv(alg.get_params(), lgbtrain, num_boost_round=3000, nfold=5,
                 early_stopping_rounds=80, verbose_eval=True, seed=42, metrics="multi_logloss",
                 feval=f1_scorer)

I have defined my f1_scorer function (passed as feval to lgb.cv) as:

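from sklearn import metrics

def f1_scorer(y_pred, y):
    # y is the LightGBM Dataset for the fold; get_label() returns the true labels
    y = y.get_label().astype("int")
    # y_pred arrives as a flat array; reshape to 5 columns and take the argmax
    y_pred = y_pred.reshape((-1, 5)).argmax(axis=1)
    return "F1_scorer", metrics.f1_score(y, y_pred, average="weighted"), True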

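I reshape y_pred and take the argmax because I assume y_pred holds the predicted probabilities for the cross-validation fold.
My multiclass model has 5 classes, and y_pred is 5 times the length of y (the true labels).
For example, if I have 10 examples, then
y.shape is (10,). I expected y_pred.shape to be (10, 5), but it is (50,). (I know this because metrics.f1_score raised an error reporting the mismatched shapes.)
So I assumed that reshaping it to (-1, 5) would solve the problem.
However, something seems to be wrong with how I defined f1_scorer, because the F1 score does not improve while multi_logloss decreases substantially. Here is my output: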

[1] cv_agg's multi_logloss: 1.48182 + 0.000107528   cv_agg's F1_scorer: 0.208046 + 0.00171022
[2] cv_agg's multi_logloss: 1.37942 + 0.000209271   cv_agg's F1_scorer: 0.20873 + 0.0017343
[3] cv_agg's multi_logloss: 1.30399 + 0.000401368   cv_agg's F1_scorer: 0.209169 + 0.00158037
[4] cv_agg's multi_logloss: 1.23172 + 0.00047433    cv_agg's F1_scorer: 0.209576 + 0.00178056
[5] cv_agg's multi_logloss: 1.16928 + 0.000577606   cv_agg's F1_scorer: 0.209329 + 0.00187392
[6] cv_agg's multi_logloss: 1.11477 + 0.000623601   cv_agg's F1_scorer: 0.209316 + 0.001725
[7] cv_agg's multi_logloss: 1.06698 + 0.000639912   cv_agg's F1_scorer: 0.209314 + 0.00166868
[8] cv_agg's multi_logloss: 1.0246 + 0.000678841    cv_agg's F1_scorer: 0.209319 + 0.0018861
.
. # skipped some outputs
.
[150]   cv_agg's multi_logloss: 0.615405 + 0.00142301   cv_agg's F1_scorer: 0.209562 + 0.00165159
[151]   cv_agg's multi_logloss: 0.615341 + 0.00142724   cv_agg's F1_scorer: 0.209498 + 0.0015317
[152]   cv_agg's multi_logloss: 0.615274 + 0.0014286    cv_agg's F1_scorer: 0.209505 + 0.00161461
[153]   cv_agg's multi_logloss: 0.615205 + 0.00143131   cv_agg's F1_scorer: 0.209524 + 0.0016036
[154]   cv_agg's multi_logloss: 0.61514 + 0.00143731    cv_agg's F1_scorer: 0.20951 + 0.00160288
[155]   cv_agg's multi_logloss: 0.615072 + 0.00143254   cv_agg's F1_scorer: 0.209491 + 0.00158067

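As you can see, multi_logloss drops from about 1.5 to about 0.6, but the F1 score stays essentially constant at about 0.209.
The cv run is eventually stopped by early stopping on f1_scorer.
What am I doing wrong here? (I suspect it is the part of f1_scorer where I reshape y_pred, but I am not sure.)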
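
One way to test the reshape suspicion is the minimal sketch below. It rests on an assumption that should be checked against the LightGBM documentation for the installed version: that for multiclass tasks the flat prediction array passed to feval is grouped by class first, so that the score of row i for class j sits at index j * num_data + i. The sizes and probabilities are made up for illustration and are not from my data.

```python
import numpy as np

num_data, num_class = 4, 3  # hypothetical sizes, for illustration only

# Made-up per-row probabilities, shape (num_data, num_class); each row sums to 1.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
])

# ASSUMPTION: the flat array is grouped by class first, i.e.
# flat[j * num_data + i] is the score of row i for class j.
flat = probs.T.ravel()

# Reshape used in the question: every run of num_class consecutive values becomes one row.
row_major = flat.reshape((-1, num_class))

# Reshape that matches the assumed class-first layout.
class_major = flat.reshape((num_class, -1)).T

true_labels = probs.argmax(axis=1)
labels_row_major = row_major.argmax(axis=1)
labels_class_major = class_major.argmax(axis=1)

# Diagnostic: with the correct layout, each row of probabilities sums to 1.
row_sums_row_major = row_major.sum(axis=1)
row_sums_class_major = class_major.sum(axis=1)

print("true labels:       ", true_labels)
print("reshape((-1, C)):  ", labels_row_major, row_sums_row_major)
print("reshape((C, -1)).T:", labels_class_major, row_sums_class_major)
```

If the assumption holds for lgb.cv, running the same row-sum check inside f1_scorer would show whether reshape((-1, 5)) mixes predictions from different rows, which would explain an F1 score stuck near chance level while multi_logloss keeps falling.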

0 answers:

No answers