So this time I've run into the problem of a) converting a list of string values into a Spark column, and b) appending that column to an existing DataFrame. I think my biggest issue is that I don't really understand what kind of data structure is required for the conversion to a Spark Column. In any case, I'd like to use Scala to append the following list of values (subjectIDs) to an existing DF:
val subjectIDs = List("e03", "a01", "b03", "e01", "c02")
Then I run this line...:
val addSubjectIDs = udf(() => subjectIDs.toDF())
...and I get the error:
java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:693)
at org.apache.spark.sql.functions$.udf(functions.scala:3176)
... 54 elided
Ideally, after the correct conversion, I'd like to do something like this:
val dataset = examples.withColumn("subject_id", addSubjectIDs())
and end up with a DF of this shape:
dataset.show
+-------+-----------+
| score| subject_id|
+-------+-----------+
| 5032| e03|
| 1959| a01|
| 5629| b03|
| 5666| e01|
| 9325| c02|
+-------+-----------+
Any help would be greatly appreciated.
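For context, here is a minimal sketch of one approach I've considered but am not sure about: a UDF can't return a `DataFrame` (hence the `Schema for type ... Dataset[Row] is not supported` error), so instead the list could be turned into its own single-column DataFrame and joined positionally. The `examples` DataFrame here is a stand-in I built myself with made-up scores, since I haven't shown its real definition:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

val spark = SparkSession.builder().master("local[*]").appName("append-column").getOrCreate()
import spark.implicits._

// Stand-in for the real `examples` DF; only the scores matter here.
val examples = Seq(5032, 1959, 5629, 5666, 9325).toDF("score")
val subjectIDs = List("e03", "a01", "b03", "e01", "c02")

// Give both sides a positional row index, then join on it.
// (row_number over monotonically_increasing_id preserves the current ordering.)
val w = Window.orderBy(monotonically_increasing_id())
val left  = examples.withColumn("idx", row_number().over(w))
val right = subjectIDs.toDF("subject_id").withColumn("idx", row_number().over(w))

val dataset = left.join(right, "idx").drop("idx")
dataset.show()
```

I'm not certain this is idiomatic, or whether relying on `monotonically_increasing_id` for ordering is safe on a partitioned DataFrame, which is partly why I'm asking.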