How to resolve java.lang.NullPointerException in a Spark DataFrame on the except operation?

Time: 2018-04-24 06:29:45

Tags: apache-spark dataframe spark-dataframe

I have two DataFrames containing user IDs. I want the difference between them, so I am using except as follows:

df1.except(df2);

But I get the following error:

java.lang.NullPointerException
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

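I don't know where the problem is.

I also tried filtering the null values out of both DataFrames.

Edit: schema and sample data for both DataFrames:

Schema: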

df1.printSchema:

root
 |-- uid: string (nullable = true)

df2.printSchema:

root
 |-- uid: string (nullable = true)

Data of df1:

+--------------------+
|                 uid|
+--------------------+
|               sss12|
|       ushadevi_8512|
|           babu57111|
|       gianchand-199|
|          rju-815423|

Data of df2:

+--------------------+
|                 uid|
+--------------------+
|        navratn-3131|
|          jaykumar-1|
|      vishwanath-666|
|     dharmendra-5623|
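
For reference, the intended result can be sketched without Spark. The following plain-Java example is only an illustration: the class `ExceptSketch` and its `except` helper are hypothetical names, it uses ordinary collections rather than Spark's implementation, and it assumes nulls are dropped from both sides first (mirroring the null filtering mentioned above, which in Spark would typically be written as `df1.filter(col("uid").isNotNull())`). It returns the distinct uid values of the first list that do not occur in the second, which is what `except` (SQL `EXCEPT`) is documented to return.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class: illustrates the expected result of df1.except(df2)
// on a single string column, using plain collections instead of Spark.
class ExceptSketch {

    // Distinct non-null values of `left` that do not occur in `right`.
    // Assumption: nulls are filtered out of both inputs before the difference.
    static Set<String> except(List<String> left, List<String> right) {
        // LinkedHashSet removes duplicates and keeps first-seen order.
        Set<String> result = left.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toCollection(LinkedHashSet::new));
        // Remove every non-null value that also appears on the right side.
        right.stream()
                .filter(Objects::nonNull)
                .forEach(result::remove);
        return result;
    }

    public static void main(String[] args) {
        // Sample uid values copied from the two DataFrames shown above.
        List<String> df1Uids = Arrays.asList(
                "sss12", "ushadevi_8512", "babu57111", "gianchand-199", "rju-815423");
        List<String> df2Uids = Arrays.asList(
                "navratn-3131", "jaykumar-1", "vishwanath-666", "dharmendra-5623");

        // prints [sss12, ushadevi_8512, babu57111, gianchand-199, rju-815423]
        System.out.println(except(df1Uids, df2Uids));
    }
}
```

With the sample rows shown, the two lists share no uid, so every uid from df1 is expected to remain in the result.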

0 Answers:

No answers