Spark DF: Schema for type Unit is not supported

Time: 2016-11-09 15:54:08

Tags: scala function apache-spark dataframe

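I'm new to Scala and Spark and am trying to build on some samples I found. Basically, I'm trying to call a function from a DataFrame that uses the Google API to get the state for a zip code. Each piece of code works on its own, but not together ;( This code does not work...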

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type Unit is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:716)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:654)
    at org.apache.spark.sql.functions$.udf(functions.scala:2837)
    at MovieRatings$.getstate(MovieRatings.scala:51)
    at MovieRatings$$anonfun$4.apply(MovieRatings.scala:48)
    at MovieRatings$$anonfun$4.apply(MovieRatings.scala:47)...
Line 51 starts with def getstate = udf {(zipcode:String)...

    // SQL statements can be run by using the sql methods provided by Spark
    val zipcodesDF = spark.sql("SELECT distinct zipcode, zipcode as state FROM Users")
    // => "zipcode: " + zipcodes.getAs[String]("zipcode") + getstate(zipcodes.getAs[String]("zipcode"))).show()
    val colNames = zipcodesDF.columns
    val cols = colNames.map(cName => zipcodesDF.col(cName))
    val theColumn = zipcodesDF("state")
    val mappedCols = cols.map(c =>
      if (c.toString() == theColumn.toString()) getstate(c).as("transformed") else c)
    val newDF = zipcodesDF.select(mappedCols: _*).show()

    def getstate = udf { (zipcode: String) => {
      val url = "" + zipcode
      val result =
      val address = parse(result)
      val shortnames = for {
        JObject(address_components) <- address
        JField("short_name", short_name) <- address_components
      } yield short_name
      val state = shortnames(3)
      //return state.toString()
      val stater = state.toString()
    }}
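
The stack trace shows `udf` failing while it derives a schema for the function's result type. In Scala, a block whose last statement is a `val` definition has type `Unit`, and the body of `getstate` ends with `val stater = ...`, which would explain the message. A minimal sketch of that behaviour without Spark follows; the names `UnitVsString`, `broken`, and `fixed` are hypothetical and only illustrate the typing rule.

```scala
object UnitVsString {
  // The block ends with a definition, so its value is Unit.
  // The annotation String => Unit compiles, which confirms the inferred type.
  val broken: String => Unit = (zipcode: String) => {
    val stater = zipcode.trim
  }

  // The block ends with an expression, so its value is that String.
  val fixed: String => String = (zipcode: String) => {
    val stater = zipcode.trim
    stater
  }

  def main(args: Array[String]): Unit = {
    println(broken(" 90210 ")) // prints the Unit value
    println(fixed(" 90210 "))  // prints the trimmed zip code
  }
}
```

Applied to `getstate`, this suggests ending the function body with `stater` (or `state.toString()`) instead of a definition.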


1 Answer:

Answer 0 (score: 0)

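Thanks for the replies. I think I've figured it out. Here is the working code. One thing to note is that the Google API has limits, so some valid zip codes come back without state information, but that is not a problem for me.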
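
The schema error should go away once the function passed to `udf` ends with an expression of a supported type rather than a `val` definition. Below is a minimal sketch of that shape, not the poster's actual code. `lookupState` is a hypothetical stand-in for the Google API call and JSON parsing, the sample zip codes are made up, and returning `Option[String]` is an assumed way to represent zip codes for which no state comes back.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object GetStateSketch {
  // Hypothetical stand-in for the geocoding request and JSON parsing;
  // returns None when no state is known for a zip code.
  def lookupState(zipcode: String): Option[String] =
    Map("10001" -> "NY", "94105" -> "CA").get(zipcode)

  // The lambda's last expression is an Option[String], so Spark can
  // derive a nullable string column for the result.
  val getstate = udf { (zipcode: String) => lookupState(zipcode) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("getstate-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up input; the question reads zip codes from a Users table instead.
    val zipcodesDF = Seq("10001", "94105", "00000").toDF("zipcode")

    // Rows whose lookup returns None get a null state.
    zipcodesDF.withColumn("state", getstate(col("zipcode"))).show()

    spark.stop()
  }
}
```

Using `withColumn` avoids the column-by-column remapping from the question. A per-row HTTP request inside a UDF is also where an API rate limit would most likely show up.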


