I have JSON that looks like this:
{"uniqueTranId":"12345", "age":25, "name":"Maichael"},
{"uniqueTranId":"67891", "age":30,"name":"Andy"},
{"uniqueTranId":"54326", "age":19, "name":"Justin" }
From this JSON I have created a DataFrame:
+----+--------+------------+
| age|    name|uniqueTranId|
+----+--------+------------+
|  25|Maichael|       12345|
|  30|    Andy|       67891|
|  19|  Justin|       54326|
+----+--------+------------+
I want to convert this DataFrame into something like the following.
List(
("12345", Map["SomeConstant", Array[(uniqueTranId -> 12345, age -> 25, name -> Maichael)]]),
("67891", Map["SomeConstant", Array[(uniqueTranId -> 67891, age -> 30, name -> Andy)]]),
("54326", Map["SomeConstant", Array[(uniqueTranId -> 54326, age -> 19, name -> Justin)]])
)
This is the type I am looking for:
List[(uniqueTranId, Map["SomeConstant", Array[(json_key -> json_value)]])]
Any direct help is much appreciated.
Answer 0 (score: 0)
This should do it:
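// Assumes a spark-shell session, where `sc` and `spark` are already defined.
val data = sc.parallelize(List(
  """{"uniqueTranId":"12345", "age":25, "name":"Maichael"}""",
  """{"uniqueTranId":"67891", "age":30,"name":"Andy"}""",
  """{"uniqueTranId":"54326", "age":19, "name":"Justin" }"""))
val df = spark.read.json(data)

// collect() pulls every row to the driver, so this only suits small DataFrames.
val collected = df.collect()

// Build one (uniqueTranId, Map("SomeConstant" -> Array(column -> value))) tuple per row.
// The explicit [Any] keeps the value type from being inferred as Nothing.
val result = collected.map { row =>
  (row.getAs[String]("uniqueTranId"),
   Map("SomeConstant" -> row.getValuesMap[Any](df.columns)
     .map { case (k, v) => (k, v.toString) }.toArray))
}.toList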
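To see the per-row shaping on its own, here is a minimal plain-Scala sketch with no Spark dependency. It assumes each row has already been turned into a Map[String, Any] (roughly what getValuesMap would give you). The names Keyed, show and toKeyed are made up for illustration, and show is only there so that a null column value does not throw when converted to a string.

```scala
// Stand-in for the collected rows; in Spark these maps would come from each Row.
val rows: List[Map[String, Any]] = List(
  Map("uniqueTranId" -> "12345", "age" -> 25L, "name" -> "Maichael"),
  Map("uniqueTranId" -> "67891", "age" -> 30L, "name" -> "Andy"),
  Map("uniqueTranId" -> "54326", "age" -> 19L, "name" -> "Justin"))

// Hypothetical alias for the target element type described in the question.
type Keyed = (String, Map[String, Array[(String, String)]])

// Null-safe stringification; calling .toString directly on null would throw.
def show(v: Any): String = if (v == null) "null" else v.toString

// Hypothetical helper: shapes one record into (id, Map(constant -> Array(key -> value))).
def toKeyed(record: Map[String, Any], constant: String = "SomeConstant"): Keyed = {
  val pairs = record.map { case (k, v) => (k, show(v)) }.toArray
  (show(record.getOrElse("uniqueTranId", null)), Map(constant -> pairs))
}

val result: List[Keyed] = rows.map(toKeyed(_))
```

The order of the pairs inside each Array follows the Map's iteration order, so if column order matters you would probably want to build the pairs from an ordered list of column names instead.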