In PySpark, I want to use a Scala UDF to filter an array whose elements can be of an arbitrary type.
package com.example.spark.udf

import scala.collection.mutable.WrappedArray
import org.apache.spark.sql.api.java.UDF2

class ArrayFilterGt[T] extends UDF2[WrappedArray[T], T, WrappedArray[T]] {
  override def call(x: WrappedArray[T], y: T): WrappedArray[T] = (x, y) match {
    case (null, _) => null
    case (_, null) => x
    case (x, y)    => x.filter(_ > y)
  }
}
However, the build fails:
[info] Loading settings from plugin.sbt ...
[info] Loading project definition from /some_path/spark-scala-util/project
[info] Loading settings from build.sbt ...
[info] Set current project to spark-scala-util (in build file:/some_path/spark-scala-util/)
[success] Total time: 0 s, completed 2018/04/19 8:32:32
[info] Updating ...
[info] Done updating.
[info] Compiling 3 Scala sources to /some_path/spark-scala-util/target/scala-2.12/classes ...
[error] /some_path/spark-scala-util/src/main/scala/ArrayFilterGt.scala:13:31: value > is not a member of type parameter T
[error] case (x, y) => x.filter(_ > y )
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 7 s, completed 2018/04/19 8:32:39
This is probably a very basic mistake, but since I'm not familiar with Scala/Java, I can't solve it. Can anyone help me?
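For reference, > is not defined on an unconstrained type parameter; outside of Spark, the usual way to get comparison operators for a generic T is a context bound on Ordering. A minimal non-Spark sketch (the helper name filterGt is my own, not part of the code above):

```scala
object OrderingDemo {
  // Context bound [T: Ordering] requires an implicit Ordering[T]
  // at the call site, instead of assuming T itself defines >.
  def filterGt[T: Ordering](xs: Seq[T], y: T): Seq[T] = {
    import Ordering.Implicits._ // brings the > operator into scope for T
    xs.filter(_ > y)
  }

  def main(args: Array[String]): Unit = {
    println(filterGt(Seq(1, 5, 3, 7), 3)) // prints List(5, 7)
  }
}
```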
Edit:
I tried [T <% Ordered[T]], and the build succeeded:
class ArrayFilterGt[T <% Ordered[T]] extends UDF2[WrappedArray[T], T, WrappedArray[T]] {
  override def call(x: WrappedArray[T], y: T): WrappedArray[T] = (x, y) match {
    case (null, _) => null
    case (_, null) => x
    case (x, y)    => x.filter(_ > y)
  }
}
However, I can't register it from PySpark:
from pyspark.sql.types import ArrayType, DateType

spark_session.udf.registerJavaFunction(
    name='date_array_gt',
    javaClassName='com.example.spark.udf.ArrayFilterGt',
    returnType=ArrayType(elementType=DateType(), containsNull=True),
)
The error:
AnalysisException: 'Can not instantiate class com.example.spark.udf.ArrayFilterGt, please make sure it has public non argument constructor;'
Also, com.example.spark.udf.ArrayFilterGt[Date] raises the same error.
Since my other Scala UDFs can be registered and used successfully, the --jars option is not the problem.
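One possible explanation (my guess, not confirmed by the error message alone): a view bound like [T <% Ordered[T]] desugars into an implicit constructor parameter, so the compiled class no longer has a public no-argument constructor, which is exactly what registerJavaFunction complains about. A minimal non-Spark sketch of the desugaring (the class name Desugared is hypothetical):

```scala
// A view bound [T <% Ordered[T]] compiles to an implicit
// constructor parameter, roughly equivalent to:
class Desugared[T](implicit ev: T => Ordered[T])

object ConstructorCheck {
  def main(args: Array[String]): Unit = {
    // Reflection shows the hidden parameter; a framework that
    // instantiates the class via a no-arg constructor would fail here.
    val ctor = classOf[Desugared[_]].getConstructors.head
    println(ctor.getParameterCount) // prints 1, not 0
  }
}
```

If that is the cause, defining a concrete class with a fixed element type (so that no implicit constructor parameter remains) might restore the public no-argument constructor that registration requires.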