Question

I am using the code from the following link to flatten a nested DataFrame: Flatten a DataFrame in Scala with different DataTypes inside .... I get the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'alternateIdentificationQualifierCode' is ambiguous, could be: alternateIdentificationQualifierCode#2, alternateIdentificationQualifierCode#11;
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:287)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:171)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4$$anonfun$26.apply(Analyzer.scala:470)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4$$anonfun$26.apply(Analyzer.scala:470)
    at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:470)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:466)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

Is there a way to programmatically rename a list of columns in Spark DataFrames in Scala? Thanks in advance.

Code:

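import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object flatten {

  def main(args: Array[String]): Unit = {

    if (args.length < 1) {
      System.err.println("Usage: XMLParser.jar <config.properties>")
      println("Please provide the Configuration File for the XML Parser Job")
      System.exit(1)
    }

    val sc = new SparkContext(new SparkConf().setAppName("Spark XML Process"))
    val sqlContext = new HiveContext(sc)

    // Read the row tag and input path from the properties file passed as the first argument
    val prop = new Properties()
    prop.load(new FileInputStream(args(0)))
    val dfSchema = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", prop.getProperty("xmltag"))
      .load(prop.getProperty("input"))

    // flattenDf is the function from the linked post (not shown here)
    val flattened_DataFrame = flattenDf(dfSchema)

    // flattened_DataFrame.printSchema()

  }
}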

Answer 1

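Use

val renamed_df = df.toDF(Seq("col1","col2","col3"): _*)

to rename the columns. toDF takes the new names as varargs, so a Seq has to be expanded with `: _*`.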
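Since toDF renames columns by position, it can also be fed a generated list of names instead of a hand-written one. The following is only a minimal sketch of that idea: `ColumnNames` and `uniqueNames` are hypothetical names introduced here, and the `_1`, `_2` suffix scheme is an assumption, not something from the original post.

```scala
// Hypothetical helper (not part of Spark): makes a list of column names
// unique by appending _1, _2, ... to the second and later occurrences.
// Note: it does not guard against a suffixed name colliding with an
// existing column that is already called e.g. "a_1".
object ColumnNames {
  def uniqueNames(names: Seq[String]): Seq[String] = {
    val seen = scala.collection.mutable.Map.empty[String, Int]
    names.map { name =>
      val count = seen.getOrElse(name, 0)
      seen(name) = count + 1
      if (count == 0) name else s"${name}_$count"
    }
  }
}

// Possible usage with a DataFrame `df` that has duplicate column names:
//   val renamed_df = df.toDF(ColumnNames.uniqueNames(df.columns.toSeq): _*)
```

This only helps once a DataFrame with duplicate names already exists. If the ambiguity is raised inside the flattening step itself, the names need to be made unique while flattening.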

Renaming columns in a DataFrame while flattening it programmatically, using selectExpr
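One way the selectExpr approach could look is sketched below, under the assumption that the ambiguity comes from two nested structs sharing a leaf field name (such as alternateIdentificationQualifierCode). Each leaf is aliased with its full path, so the flattened names stay unique. `FlattenWithAliases`, `aliasedExprs`, and the underscore-joined naming are hypothetical choices made for this sketch. It handles nested structs only and does not explode arrays, which the linked flattenDf may do.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

object FlattenWithAliases {

  // Builds one "`parent`.`child` as `parent_child`" expression per leaf field.
  // Backticks keep names with special characters parseable.
  def aliasedExprs(schema: StructType, parents: Seq[String] = Nil): Seq[String] =
    schema.fields.toSeq.flatMap { field =>
      val path = parents :+ field.name
      field.dataType match {
        // Recurse into nested structs, carrying the path so far
        case struct: StructType => aliasedExprs(struct, path)
        // Leaf (including arrays, which are left as-is): alias with the full path
        case _ =>
          val source = path.map(p => s"`$p`").mkString(".")
          val alias = path.mkString("_")
          Seq(s"$source as `$alias`")
      }
    }

  // Selects every leaf under its path-based alias in a single pass.
  def flatten(df: DataFrame): DataFrame =
    df.selectExpr(aliasedExprs(df.schema): _*)
}
```

Because every struct level is resolved in one selectExpr call, no intermediate DataFrame with duplicate column names is created.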
