Writing a DataFrame with null values as Parquet using Spark

Date: 2016-02-05 20:40:19

Tags: java csv apache-spark-sql parquet spark-dataframe

I am trying to read a CSV using a StructType that has more struct fields than the CSV file has columns. This makes the surplus columns come back as null. When I then try to write this DataFrame out as Parquet, I get a java.lang.NumberFormatException: null.
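For context, a minimal sketch of the read side being described; the schema, column names, and file path are illustrative (not from the original post), and sc is assumed to be an existing SparkContext:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.types._

    val sqlContext = new SQLContext(sc)

    // Schema with more fields than the CSV actually has columns;
    // the trailing "extra" field therefore comes back as null.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("extra", LongType, nullable = true)
    ))

    val dataFrameWithNulls = sqlContext.read
      .format("com.databricks.spark.csv")
      .schema(schema)
      .load("input.csv")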

java.lang.NumberFormatException: null
at java.lang.Long.parseLong(Long.java:404)
at java.lang.Long.parseLong(Long.java:483)
at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230)
at scala.collection.immutable.StringOps.toLong(StringOps.scala:31)
at com.databricks.spark.csv.util.TypeCast$.castTo(TypeCast.scala:54)
at com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$6.apply(CsvRelation.scala:181)
at com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$6.apply(CsvRelation.scala:162)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:349)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
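The frames at TypeCast$.castTo and StringLike.toLong suggest spark-csv is handing a null string to Long.parseLong for one of the LongType fields, which is enough to reproduce the exception in isolation:

    // A null String reaching StringLike.toLong delegates to
    // java.lang.Long.parseLong(null), which throws
    // java.lang.NumberFormatException: null.
    val s: String = null
    s.toLong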

dataFrameWithNulls.coalesce(4).write.option("mode", "PERMISSIVE").mode("append").partitionBy(s"$day", s"$hour").parquet(s"$writeDir/")
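One possible workaround, as an untested sketch reusing the illustrative schema above and the day, hour, and writeDir values from the snippet: declare the surplus fields as StringType (spark-csv can return a null string without parsing it) and cast them to the intended types after the read. Note also that "mode" is a spark-csv read option, so setting it on the Parquet writer probably has no effect.

    import org.apache.spark.sql.functions.col

    // Same schema as before, but the surplus column is read as a string.
    val stringSchema = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("extra", StringType, nullable = true)  // was LongType
    ))

    val fixed = sqlContext.read
      .format("com.databricks.spark.csv")
      .schema(stringSchema)
      .load("input.csv")
      .withColumn("extra", col("extra").cast(LongType))  // null casts to null

    fixed.coalesce(4).write
      .mode("append")
      .partitionBy(s"$day", s"$hour")
      .parquet(s"$writeDir/")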

0 Answers:

No answers yet.