Apache Spark UDF defined inside an object raises "No TypeTag available for String"

Time: 2018-05-22 17:30:23

Tags: scala apache-spark user-defined-functions

I get different behavior when I copy-paste a function into an interactive session than when I compile the same code with sbt.

Minimal, Complete, and Verifiable example for the sbt build:

$ sbt package 
[error] src/main/scala/xxyy.scala:6: No TypeTag available for String
[error]     val correctDiacritics = udf((s: scala.Predef.String) => {
[error]                                ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 9 s, completed May 22, 2018 2:22:52 PM
$ cat src/main/scala/xxyy.scala 
package xxx.yyy
import org.apache.spark.sql.functions.udf
object DummyObject {
    val correctDiacritics = udf((s: scala.Predef.String) => {
            s.replaceAll("è","e")
            .replaceAll("é","e")
            .replaceAll("à","a")
            .replaceAll("ç","c")
            })
}

The code above fails to compile. During an interactive session, however, the same definition is accepted:

// During the `spark-shell` session.
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.udf
object DummyObject {
val correctDiacritics = udf((s: scala.Predef.String) => {
    s.replaceAll("è","e")
    .replaceAll("é","e")
    .replaceAll("à","a")
    .replaceAll("ç","c")
})
}
// Exiting paste mode, now interpreting.
// import org.apache.spark.sql.functions.udf
// defined object DummyObject
// Proceeds successfully.

Versions:

  • I am using Scala 2.11

  • I am using Spark 2.1.0

  • build.sbt

    name := "my_app"
    
    version := "0.0.1"
    
    scalaVersion := "2.11.12"
    
    resolvers ++= Seq(
    Resolver sonatypeRepo "public",
    Resolver typesafeRepo "releases"
    )
    resolvers += "MavenRepository" at "https://mvnrepository.com/"
    
    libraryDependencies ++= Seq(
    // "org.apache.spark" %% "spark-core" % "2.1.0",
    // "org.apache.spark" %% "spark-sql" % "2.1.0",
    //"org.apache.spark" %% "spark-core_2.10" % "1.0.2",
    // "org.apache.spark" %
    "org.apache.spark" % "spark-sql_2.10" % "2.1.0",
    "org.apache.spark" % "spark-core_2.10" % "2.1.0",
    "org.apache.spark" % "spark-mllib_2.10" % "2.1.0"
    )
    

1 answer:

Answer 0 (score: 2)

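Your build definition is incorrect:

  • You build the project with Scala 2.11.12
  • but you depend on Spark artifacts built for Scala 2.10 (the _2.10 suffix)

Scala is not binary compatible between major versions (here 2.10 and 2.11), so you get the error.

Rather than embedding the Scala version in the artifact name, it is better to use %%, which appends the suffix matching your scalaVersion: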

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.1.0",
  "org.apache.spark" %% "spark-core" % "2.1.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.0"
)

Otherwise, make sure the suffix names the correct Scala version:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-sql_2.11" % "2.1.0",
  "org.apache.spark" % "spark-core_2.11" % "2.1.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.1.0"
)
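
Putting the pieces together, a corrected build.sbt might look roughly like the sketch below. It reuses the project name, version, and Scala version from the question; the `sparkVersion` value is a hypothetical helper introduced here only to keep the three modules in sync, and the extra resolvers from the question are left out on the assumption that the Spark artifacts resolve from the default repositories.

```scala
// build.sbt (sketch, not a definitive build definition)
name := "my_app"

version := "0.0.1"

// The compiler version; %% below derives the artifact suffix from it.
scalaVersion := "2.11.12"

// Hypothetical helper: a single place to change the Spark version.
val sparkVersion = "2.1.0"

// With scalaVersion 2.11.x, %% resolves spark-sql_2.11, spark-core_2.11
// and spark-mllib_2.11, so the suffix can no longer drift from the compiler.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % sparkVersion,
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)
```

With this definition, changing scalaVersion later also changes the resolved artifacts, which is the reason %% is preferred over hard-coded suffixes.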