Rename Spark DataFrame columns based on a CSV file

Posted: 2017-04-26 15:40:54

Tags: scala apache-spark rename spark-dataframe

I can't figure out how to rename the headers of a DataFrame based on a CSV file.

I have the following DataFrame, df1:

Att1   Att2     Att3   
23      m        0      
22      m        1      
42      f        0   
32      f        0    
45      m        1    

Now I want to change the column names (the first row) according to a CSV file that looks like this:

Att1,age
Att2,gender      
Att3,employed 
...,...    
Att99,colnameY     
Att100,colnameZ

As a result, I expect a DataFrame that looks like this:

age   gender    employed   
23      m        0      
22      m        1      
42      f        0   
32      f        0    
45      m        1    

Any ideas? Thanks for your help :))

1 answer:

Answer 0 (score: 2)

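import scala.io.Source

// read the map from old names to new names;
// trim each field (the CSV lines may carry trailing spaces)
// and skip blank or malformed lines
val source = Source.fromFile("names.csv")
val map: Map[String, String] =
  try {
    source.getLines()
      .map(_.split(",").map(_.trim))
      .collect { case Array(oldName, newName) => oldName -> newName }
      .toMap
  } finally source.close()
// map: scala.collection.immutable.Map[String,String] = Map(Att1 -> age, Att2 -> gender, Att3 -> employed)

// rename columns using withColumnRenamed; columns not in the map keep their name
df1.columns.foldLeft(df1) {
  case (df, colName) => df.withColumnRenamed(colName, map.getOrElse(colName, colName))
}.show()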
+---+------+--------+
|age|gender|employed|
+---+------+--------+
| 23|     m|       0|
| 22|     m|       1|
| 42|     f|       0|  
| 32|     f|       0|
| 45|     m|       1|
+---+------+--------+
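The fold above calls `withColumnRenamed` once per column, so each call adds a step to the logical plan. With a wide DataFrame (the CSV in the question goes up to Att100), a possible alternative is to compute the complete list of new names once and pass it to `Dataset.toDF(colNames: String*)`, which renames all columns positionally in a single call. The sketch below assumes that approach; the object `RenameColumns` and its methods are hypothetical names, and the renaming logic is kept in plain Scala so it can be checked without a SparkSession.

```scala
// Hypothetical helper object: pure Scala, no Spark dependency,
// so the renaming logic can be tested without a SparkSession.
object RenameColumns {

  // Parse "oldName,newName" lines into a Map.
  // Fields are trimmed (the CSV in the question has trailing spaces),
  // and lines that do not split into exactly two non-empty fields are skipped.
  def parseMapping(lines: Iterator[String]): Map[String, String] =
    lines
      .map(_.split(",").map(_.trim))
      .collect {
        case Array(oldName, newName) if oldName.nonEmpty && newName.nonEmpty =>
          oldName -> newName
      }
      .toMap

  // New names in the same order as the existing columns;
  // a column that is not in the mapping keeps its current name,
  // so the result always has as many names as there are columns.
  def renamedColumns(columns: Seq[String], mapping: Map[String, String]): Seq[String] =
    columns.map(c => mapping.getOrElse(c, c))
}

// Usage with Spark (assumes df1 and map as in the answer above):
//   val renamed = df1.toDF(RenameColumns.renamedColumns(df1.columns.toSeq, map): _*)
//   renamed.show()
```

`toDF` requires exactly one name per existing column, which is why unmapped columns fall back to their current name instead of being dropped.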