How to convert a WrappedArray[WrappedArray[(String, String)]] to Array[String] in Spark (Scala)

Asked: 2019-03-04 14:51:36

Tags: scala apache-spark dataframe apache-spark-sql user-defined-functions

I am working with the Spark framework in Scala. My DataFrame has a column with the following structure and content:

+---------------------------------------------------------------------------------------------+
|Email_Code                                                                                   |
+---------------------------------------------------------------------------------------------+
|[WrappedArray([3,spain]), WrappedArray([,]), WrappedArray([3,spain])]                        |
|[WrappedArray([3,spain]), WrappedArray([3,spain])]                                           |
+---------------------------------------------------------------------------------------------+

 |-- Email_Code: array (nullable = true)
 |    |-- element: array (containsNull = false)
 |    |    |-- element: struct (containsNull = false)
 |    |    |    |-- Code: string (nullable = true)
 |    |    |    |-- Value: string (nullable = true)

I am trying to write a UDF that extracts all values of the `Code` field from the structs inside these arrays, but I can't get it to work.

I want an output like the following:

+---------------------------------------------------------------------------------------------+
|Email_Code                                                                                   |
+---------------------------------------------------------------------------------------------+
|[3,,3]                                                                                       |
|[3,3]                                                                                        |
+---------------------------------------------------------------------------------------------+

Can anyone help?

1 Answer:

Answer 0 (score: 0)

I managed to solve it:

// Flatten the nested arrays, then keep only the Code field of each struct.
val transformation = udf { (data: Seq[Seq[Row]]) =>
  data.flatten.map { case Row(code: String, _) => code }
}

df.withColumn("result", transformation($"columnName"))
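For reference, the collection logic inside the UDF can be followed without Spark. This is an illustrative sketch only, not the answerer's code: each `Row(Code, Value)` struct is modelled as a plain `(String, String)` tuple, and `extractCodes` is a hypothetical helper name.

```scala
// Stand-in for the UDF body: flatten the nested sequences, then
// project out the first element (the Code field) of each pair.
def extractCodes(data: Seq[Seq[(String, String)]]): Seq[String] =
  data.flatten.map { case (code, _) => code }

// First example row: [WrappedArray([3,spain]), WrappedArray([,]), WrappedArray([3,spain])]
val row1 = Seq(Seq(("3", "spain")), Seq(("", "")), Seq(("3", "spain")))
println(extractCodes(row1).mkString("[", ",", "]"))  // [3,,3]
```

On Spark 2.4+ the same result should also be reachable without a UDF, using the built-in higher-order functions, e.g. `df.withColumn("result", expr("transform(flatten(Email_Code), x -> x.Code)"))`.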