Cannot pass the Spark developer code checks: binary compatibility error

Date: 2016-06-17 02:40:32

Tags: scala apache-spark-mllib

I am a new Spark contributor. I want to add class-weight support to the random forest classifier, as described in https://issues.apache.org/jira/browse/SPARK-9478

I have finished the implementation, and I am following the code contribution instructions here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PreparingtoContributeCodeChanges

The instructions say to "run all tests with ./dev/run-tests to verify that the code still compiles, passes tests, and passes style checks." When I run the tests, my code fails the binary compatibility check.

The log says:

    [error]  * method this(scala.Enumeration#Value,org.apache.spark.mllib.tree.impurity.Impurity,Int,Int,Int,scala.Enumeration#Value,scala.collection.immutable.Map,Int,Double,Int,Double,Boolean,Int)Unit in class org.apache.spark.mllib.tree.configuration.Strategy does not have a correspondent in current version
    [error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.configuration.Strategy.this")
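
If an interface change like this is intentional and acceptable for the target release, the filter that MiMa suggests can be registered so the check passes. A minimal sketch, assuming the project/MimaExcludes.scala layout Spark used around that time (the exact surrounding structure varies by version); the filter string is copied verbatim from the error output:

    // project/MimaExcludes.scala (sketch): add the suggested filter to the
    // exclusion list for the release being targeted.
    import com.typesafe.tools.mima.core._

    // Intentional change: Strategy gained a classWeights parameter (SPARK-9478).
    ProblemFilters.exclude[DirectMissingMethodProblem](
      "org.apache.spark.mllib.tree.configuration.Strategy.this")

Note, though, that excluding the problem simply drops the old constructor from the binary API; keeping an auxiliary constructor with the old signature (as in the update below) is the compatible route for a public class.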

I changed the file org.apache.spark.mllib.tree.configuration.Strategy because I needed to change this class's interface. What I did was add a new input parameter, as shown below:

    class Strategy @Since("1.3.0") (
        @Since("1.0.0") @BeanProperty var algo: Algo,
        @Since("1.0.0") @BeanProperty var impurity: Impurity,
        @Since("1.0.0") @BeanProperty var maxDepth: Int,
        @Since("1.2.0") @BeanProperty var numClasses: Int = 2,
        @Since("1.0.0") @BeanProperty var maxBins: Int = 32,
        @Since("1.0.0") @BeanProperty var quantileCalculationStrategy: QuantileStrategy = Sort,
        @Since("1.0.0") @BeanProperty var categoricalFeaturesInfo: Map[Int, Int] = Map[Int, Int](),
        @Since("1.2.0") @BeanProperty var minInstancesPerNode: Int = 1,
        @Since("1.2.0") @BeanProperty var minInfoGain: Double = 0.0,
        @Since("1.0.0") @BeanProperty var maxMemoryInMB: Int = 256,
        @Since("1.2.0") @BeanProperty var subsamplingRate: Double = 1,
        @Since("1.2.0") @BeanProperty var useNodeIdCache: Boolean = false,
    -   @Since("1.2.0") @BeanProperty var checkpointInterval: Int = 10) extends Serializable {
    +   @Since("1.2.0") @BeanProperty var checkpointInterval: Int = 10,
    +   @Since("2.0.0") @BeanProperty var classWeights: Array[Double] = Array(1.0, 1.0)) extends Serializable {

How can I resolve this, or what direction should I take to debug it?

-------------------------------- UPDATE --------------------------------

I am not one of the authors of the pull requests filed against this issue in JIRA. I have a new implementation that achieves the same goal with less memory. My code can be found at https://github.com/n-triple-a/spark; the branch 'weightedRandomForest' has the problem mentioned above.

I can now work around the problem by adding an auxiliary constructor to the Strategy class that takes the first 13 parameters (i.e., without classWeights in the parameter list), as shown below:

    // Auxiliary constructor preserving the original 13-parameter binary signature.
    def this(
        algo: Algo,
        impurity: Impurity,
        maxDepth: Int,
        numClasses: Int,
        maxBins: Int,
        quantileCalculationStrategy: QuantileStrategy,
        categoricalFeaturesInfo: Map[Int, Int],
        minInstancesPerNode: Int,
        minInfoGain: Double,
        maxMemoryInMB: Int,
        subsamplingRate: Double,
        useNodeIdCache: Boolean,
        checkpointInterval: Int) {
      // Delegate to the primary constructor with unit class weights.
      this(algo, impurity, maxDepth, numClasses, maxBins,
        quantileCalculationStrategy, categoricalFeaturesInfo, minInstancesPerNode,
        minInfoGain, maxMemoryInMB, subsamplingRate, useNodeIdCache,
        checkpointInterval, Array(1.0, 1.0))
    }
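
With the auxiliary constructor in place, code compiled against the old 13-parameter signature still links. A quick sketch of such a call (the argument values are just the old defaults spelled out):

    import org.apache.spark.mllib.tree.configuration.{Algo, QuantileStrategy, Strategy}
    import org.apache.spark.mllib.tree.impurity.Gini

    // Resolves to the restored 13-parameter constructor.
    val legacy = new Strategy(Algo.Classification, Gini, 5, 2, 32,
      QuantileStrategy.Sort, Map[Int, Int](), 1, 0.0, 256, 1.0, false, 10)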

I also raised scalastyle's limit on the maximum number of parameters allowed for a method, which is 10 by default. But this still seems odd to me: classWeights has a default value bound to it, so why do I have to add a redundant constructor?
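
The reason, as far as I can tell, is that the JVM has no notion of default arguments: scalac encodes each default as a synthetic method, but still compiles the constructor with its full parameter list, so adding a parameter changes the constructor's binary signature even when that parameter has a default. A minimal sketch with a hypothetical class Foo:

    class Foo(val a: Int, val b: Int = 2)
    // scalac emits roughly:
    //   Foo.<init>(II)V                       -- the real constructor, both params
    //   Foo$.$lessinit$greater$default$2()I   -- synthetic method returning 2
    // Existing bytecode was compiled against <init>(II)V. Adding a third
    // parameter (even a defaulted one) replaces it with <init>(III)V, so MiMa
    // reports the old constructor as missing; hence the auxiliary constructor
    // above to keep the old signature alive.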

1 Answer:

Answer 0 (score: 0)

The PR linked to your JIRA, https://github.com/apache/spark/pull/9008/files, does not appear to contain the Strategy class. Please update the PR to include that file, and let us know if there are further problems.