Why does Spark treat a join on a literal column as a Cartesian cross join?

Posted: 2018-12-11 11:09:17

Tags: apache-spark pyspark apache-spark-sql pyspark-sql

I want to join a lit() column with a non-literal column.

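from pyspark.sql.functions import lit

# rdd1 and rdd2 are DataFrames despite their names; `spark` is an existing SparkSession
rdd1 = spark.createDataFrame([('1', 'a'), ('2', 'b'), ('3', 'c')], ['id1', 'val'])
rdd1 = rdd1.withColumn('id2', lit('1'))  # constant column: every row gets id2 = '1'

rdd2 = spark.createDataFrame([('1', 2, 1), ('2', 3, 0), ('3', 3, 1)], ['key1', 'key2', 'val'])

# Left outer join of the literal column id2 against the regular column key1
res = rdd1.join(rdd2, [rdd1['id2'] == rdd2['key1']], 'left')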
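It may help to model what this join actually computes. Because lit('1') gives every row of rdd1 the same id2, the condition rdd1['id2'] == rdd2['key1'] only ever compares key1 with the constant '1', so it depends on the right-hand side alone. The pure-Python sketch below is not Spark code: plain lists of tuples stand in for the DataFrames, and left_join_on_literal is a hypothetical helper. It applies that predicate to the right side once and then pairs every left row with every surviving right row, which is a Cartesian product of the left side and the filtered right side.

```python
# Plain-Python model of rdd1.join(rdd2, rdd1.id2 == rdd2.key1, 'left').
# Lists of tuples stand in for the DataFrames; this is NOT Spark code.

left = [('1', 'a'), ('2', 'b'), ('3', 'c')]        # (id1, val)
right = [('1', 2, 1), ('2', 3, 0), ('3', 3, 1)]    # (key1, key2, val)
LITERAL = '1'                                       # value produced by lit('1') for id2


def left_join_on_literal(left_rows, right_rows, literal):
    """Hypothetical helper: left outer join where every left key equals `literal`."""
    # id2 == literal on every left row, so id2 == key1 reduces to
    # literal == key1, a predicate on the right side only. Apply it once.
    matching = [r for r in right_rows if r[0] == literal]
    result = []
    for id1, val in left_rows:
        if matching:
            # No remaining condition links the sides: each left row pairs
            # with every filtered right row (a Cartesian product).
            for r in matching:
                result.append((id1, val, literal) + r)
        else:
            # Left outer semantics: keep the left row, padding the three
            # right-hand columns with None (Spark would use null).
            result.append((id1, val, literal, None, None, None))
    return result


rows = left_join_on_literal(left, right, LITERAL)
for row in rows:
    print(row)
```

With the sample data only ('1', 2, 1) survives the filter, so each of the three left rows is paired with it. This matches the plan in the question's explain() output, where the predicate appears as a Filter on the right side and nothing connects the two inputs.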

When I call explain() on the res DataFrame, I get a Cartesian product error, even though the join condition is not a trivial 1 = 1:

>>> res.explain()
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Detected cartesian product for LEFT OUTER join between logical plans
Project [id1#1456, val#1457, 1 AS id2#1460]
+- LogicalRDD [id1#1456, val#1457], false
and
Filter (isnotnull(key1#1464) && (1 = key1#1464))
+- LogicalRDD [key1#1464, key2#1465L, val#1466L], false
Join condition is missing or trivial.
Use the CROSS JOIN syntax to allow cartesian products between these relations.;
