Convert a Spark DataFrame to an Array / Map / List

Date: 2017-08-23 13:26:59

Tags: scala apache-spark

I have JSON like the following:

            {"uniqueTranId":"12345", "age":25, "name":"Maichael"}, 
            {"uniqueTranId":"67891", "age":30,"name":"Andy"},
            {"uniqueTranId":"54326", "age":19, "name":"Justin" }

From that JSON I have a DataFrame:

                    +----+--------+------------+
                    | age|    name|uniqueTranId|
                    +----+--------+------------+
                    |  25|Maichael|       12345|
                    |  30|    Andy|       67891|
                    |  19|  Justin|       54326|
                    +----+--------+------------+

I want to convert this DataFrame into the following:

   List(
       ("12345", Map("SomeConstant" -> Array(uniqueTranId -> 12345, age -> 25, name -> Maichael))),
       ("67891", Map("SomeConstant" -> Array(uniqueTranId -> 67891, age -> 30, name -> Andy))),
       ("54326", Map("SomeConstant" -> Array(uniqueTranId -> 54326, age -> 19, name -> Justin)))
       )

Here is the type I am looking for:

List((uniqueTranId, Map["SomeConstant", Array[(json_key -> json_value)]]))
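
Read as a Scala type, the requested shape appears to be roughly the following (my interpretation of the pseudo-notation above, with every JSON value rendered as a String):

    // one entry per row, keyed by uniqueTranId, with every (column -> value)
    // pair stored under a single constant map key
    type Result = List[(String, Map[String, Array[(String, String)]])]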

Any direct help is much appreciated.

1 answer:

Answer 0 (score: 0)

This should do it:

val data = sc.parallelize(List(
  """{"uniqueTranId":"12345", "age":25, "name":"Maichael"}""",
  """{"uniqueTranId":"67891", "age":30, "name":"Andy"}""",
  """{"uniqueTranId":"54326", "age":19, "name":"Justin"}"""))

// spark.read.json(RDD[String]) is deprecated since Spark 2.2, so wrap the RDD in a Dataset
import spark.implicits._
val df = spark.read.json(data.toDS)

// collect pulls every row to the driver; fine here, but beware with large data
val collected = df.collect

// For each row, pair its uniqueTranId with a one-key map holding all (column -> value) pairs
val result = collected.map { row =>
  (row.getString(row.fieldIndex("uniqueTranId")),
   Map("someconstant" ->
     row.getValuesMap[Any](df.columns).map { case (k, v) => (k, v.toString) }.toArray))
}.toList