Setting the scalePosWeight parameter in a CV grid for a Spark xgBoost model

时间:2018-06-29 00:27:50

标签: scala apache-spark cross-validation xgboost apache-spark-ml

I am trying to tune an xgBoost model on Spark using Scala. My XGB parameter grid is as follows:

val xgbParamGrid = (new ParamGridBuilder()
                .addGrid(xgb.maxDepth, Array(8, 16))
                .addGrid(xgb.minChildWeight, Array(0.5, 1, 2))
                .addGrid(xgb.alpha, Array(0.8, 0.9, 1))
                .addGrid(xgb.lambda, Array(0.8, 1, 2))
                .addGrid(xgb.scalePosWeight, Array(1, 5, 9))
                .addGrid(xgb.subSample, Array(0.5, 0.8, 1))
                .addGrid(xgb.eta, Array(0.01, 0.1, 0.3, 0.5))
                .build())

The call to the cross-validator looks like this:

val evaluator = (new BinaryClassificationEvaluator()
                      .setLabelCol("label")
                      .setRawPredictionCol("prediction")
                      .setMetricName("areaUnderPR"))

val cv = (new CrossValidator()
          .setEstimator(pipeline_model_xgb)
          .setEvaluator(evaluator)
          .setEstimatorParamMaps(xgbParamGrid)
          .setNumFolds(10))

val xgb_model = cv.fit(train)

I get the following error for the scalePosWeight parameter only:

error: type mismatch;
found   : org.apache.spark.ml.param.DoubleParam
required: org.apache.spark.ml.param.Param[AnyVal]
Note: Double <: AnyVal (and org.apache.spark.ml.param.DoubleParam <: org.apache.spark.ml.param.Param[Double]), but class Param is invariant in type T.
You may wish to define T as +T instead. (SLS 4.5)
                              .addGrid(xgb.scalePosWeight, Array(1, 5, 9))
                                           ^

From my searching, the message "You may wish to define T as +T" is common, but I am not sure how to resolve it in this case. Thanks for any help!

1 Answer:

Answer 0 (score: 1)

I ran into the same problem when passing an Array for minChildWeight that consisted only of Ints. The solution that worked (for both scalePosWeight and minChildWeight) is to pass an array of Doubles instead:

.addGrid(xgb.scalePosWeight, Array(1.0, 5.0, 9.0))
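
The reason is that Array(1, 5, 9) is inferred as Array[Int], while scalePosWeight is a DoubleParam, and Param is invariant in its type parameter. Below is a minimal sketch of the corrected grid, assuming the xgb estimator from the question is in scope and that the parameters other than maxDepth are Double-typed (as the error message indicates for scalePosWeight); writing Double literals, or widening explicitly with .toDouble, makes each array an Array[Double]:

import org.apache.spark.ml.tuning.ParamGridBuilder

// Sketch only: every value intended for a Double-typed parameter is written as a
// Double literal, so each Array is inferred as Array[Double] and matches the param type.
val xgbParamGridFixed = (new ParamGridBuilder()
                .addGrid(xgb.maxDepth, Array(8, 16))               // assumed Int-typed param: Ints are fine
                .addGrid(xgb.minChildWeight, Array(0.5, 1.0, 2.0))
                .addGrid(xgb.alpha, Array(0.8, 0.9, 1.0))
                .addGrid(xgb.lambda, Array(0.8, 1.0, 2.0))
                .addGrid(xgb.scalePosWeight, Array(1.0, 5.0, 9.0)) // was Array(1, 5, 9), an Array[Int]
                .addGrid(xgb.subSample, Array(0.5, 0.8, 1.0))
                .addGrid(xgb.eta, Array(0.01, 0.1, 0.3, 0.5))
                .build())

// Equivalent alternative: widen an existing Int array explicitly.
// .addGrid(xgb.scalePosWeight, Array(1, 5, 9).map(_.toDouble))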