Appending a new column to an existing DataFrame in Spark Scala

Asked: 2017-08-09 16:47:51

Tags: scala apache-spark

So this time I've run into the problem of a) converting a list of string values into a Spark column, and b) appending that column to an existing DataFrame. I think the biggest issue is that I don't understand what kind of data structure is required for conversion to a Spark Column. In any case, I'd like to append the following list of values (subjectIDs) to an existing DF using Scala:

val subjectIDs = List("e03", "a01", "b03", "e01", "c02")

Then I run this line...:

val addSubjectIDs = udf(() => subjectIDs.toDF())

...and I get this error:

java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
  at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:693)
  at org.apache.spark.sql.functions$.udf(functions.scala:3176)
  ... 54 elided
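The exception is telling you that `udf` cannot derive a Spark SQL schema for the udf's return type: a udf must return a value Spark knows how to encode as a column (a primitive, `String`, `Seq`, `Map`, a case class, etc.), never a `Dataset` or `DataFrame`. A minimal well-formed udf, for contrast (a trivial illustration, not a fix for this problem):

```scala
import org.apache.spark.sql.functions.udf

// Valid: the return type String maps directly to a StringType column.
val shout = udf((s: String) => s.toUpperCase)

// Invalid: Dataset[Row] has no Spark SQL schema, hence the
// "Schema for type ... Dataset ... is not supported" exception.
// val addSubjectIDs = udf(() => subjectIDs.toDF())
```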

Ideally, once the conversion is done correctly, I'd like to do the following:

val dataset = examples.withColumn("subject_id", addSubjectIDs())

to get a DF of this shape:

dataset.show

+-------+-----------+
|  score| subject_id|
+-------+-----------+
|   5032|        e03|
|   1959|        a01|
|   5629|        b03|
|   5666|        e01|
|   9325|        c02|
+-------+-----------+
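Since a udf cannot conjure a column of values from a local list, a common workaround is to index both sides and join on position. A sketch under the assumptions that a `SparkSession` named `spark` is in scope, that `examples` has exactly as many rows as `subjectIDs`, and that its current row order is the one you want to pair against (`zipWithIndex` preserves the existing order, but a DataFrame's order is only meaningful if you sorted it explicitly):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import spark.implicits._

// Give every existing row a stable positional index.
val indexedExamples = spark.createDataFrame(
  examples.rdd.zipWithIndex.map { case (row, idx) => Row.fromSeq(row.toSeq :+ idx) },
  StructType(examples.schema.fields :+ StructField("row_idx", LongType, nullable = false))
)

// Index the local list the same way, then join on position.
val subjectDF = subjectIDs.zipWithIndex
  .map { case (id, idx) => (id, idx.toLong) }
  .toDF("subject_id", "row_idx")

val dataset = indexedExamples
  .join(subjectDF, "row_idx")
  .drop("row_idx")
```

`monotonically_increasing_id` is sometimes suggested instead, but its values are only guaranteed to be increasing, not consecutive, so they cannot be matched against a list's positions; the RDD `zipWithIndex` route avoids that pitfall.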

Any help would be greatly appreciated.

0 Answers

No answers yet