我有一个简单的连接查询:
test("SparkSQLTest 0005") {
val spark = SparkSession.builder().master("local").appName("SparkSQLTest 0005").getOrCreate()
spark.range(100, 100000).createOrReplaceTempView("t1")
spark.range(2000, 10000).createOrReplaceTempView("t2")
val df = spark.sql("select count(1) from t1 join t2 on t1.id = t2.id")
df.explain(true)
}
输出如下:
我在输出中询问了标记为Q0~Q4的5个问题,可以帮助解释一下吗?谢谢!
== Parsed Logical Plan ==
'Project [unresolvedalias('count(1), None)] //Q0, Why the first line has no +- or :-
+- 'Join Inner, ('t1.id = 't2.id) //Q1, What does +- mean
:- 'UnresolvedRelation `t1` //Q2 What does :- mean
+- 'UnresolvedRelation `t2`
== Analyzed Logical Plan ==
count(1): bigint
Aggregate [count(1) AS count(1)#9L]
+- Join Inner, (id#0L = id#2L)
:- SubqueryAlias t1
: +- Range (100, 100000, step=1, splits=Some(1)) //Q3 What does : +- mean?
+- SubqueryAlias t2
+- Range (2000, 10000, step=1, splits=Some(1))
== Optimized Logical Plan ==
Aggregate [count(1) AS count(1)#9L]
+- Project
+- Join Inner, (id#0L = id#2L)
:- Range (100, 100000, step=1, splits=Some(1)) //Q4 These two Ranges are both Join's children, why one is :- and the other is +-
+- Range (2000, 10000, step=1, splits=Some(1)) //Q4
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#9L])
+- *(2) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#11L])
+- *(2) Project
+- *(2) BroadcastHashJoin [id#0L], [id#2L], Inner, BuildRight
:- *(2) Range (100, 100000, step=1, splits=1)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(1) Range (2000, 10000, step=1, splits=1)
答案 0 :(得分:2)
它们是简单地表示有序,嵌套操作的项目符号
将写为
Header
:- Child 1
: +- Grandchild 1
:- Child 2
: :- Grandchild 2
: +- Grandchild 3
+- Child 3
+-
直接的孩子,通常是最后一个:-
直接孩子的兄弟姐妹,但不是最后一个: +-
最后一个孙子,其父母有兄弟姐妹: :-
有兄弟姐妹的孙子,其父母不是最终的,也有兄弟姐妹