Understanding the plan tree string representation

Time: 2018-06-01 01:51:57

Tags: apache-spark

I have a simple join query:

  test("SparkSQLTest 0005") {
    val spark = SparkSession.builder().master("local").appName("SparkSQLTest 0005").getOrCreate()
    spark.range(100, 100000).createOrReplaceTempView("t1")
    spark.range(2000, 10000).createOrReplaceTempView("t2")
    val df = spark.sql("select count(1) from t1 join t2 on t1.id = t2.id")
    df.explain(true)
  }

The output is as follows:

I have marked five questions in the output, labeled Q0 through Q4. Could someone help explain them? Thanks!

== Parsed Logical Plan ==
'Project [unresolvedalias('count(1), None)] //Q0, Why the first line has no +- or :-
+- 'Join Inner, ('t1.id = 't2.id)    //Q1, What does +- mean
   :- 'UnresolvedRelation `t1`       //Q2 What does :- mean
   +- 'UnresolvedRelation `t2`

== Analyzed Logical Plan ==
count(1): bigint
Aggregate [count(1) AS count(1)#9L]
+- Join Inner, (id#0L = id#2L)
   :- SubqueryAlias t1
   :  +- Range (100, 100000, step=1, splits=Some(1)) //Q3 What does :  +- mean?
   +- SubqueryAlias t2
      +- Range (2000, 10000, step=1, splits=Some(1))

== Optimized Logical Plan ==
Aggregate [count(1) AS count(1)#9L]
+- Project
   +- Join Inner, (id#0L = id#2L)
      :- Range (100, 100000, step=1, splits=Some(1)) //Q4 These two Ranges are both Join's children, why one is :- and the other is +-
      +- Range (2000, 10000, step=1, splits=Some(1)) //Q4

== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#9L])
+- *(2) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#11L])
   +- *(2) Project
      +- *(2) BroadcastHashJoin [id#0L], [id#2L], Inner, BuildRight
         :- *(2) Range (100, 100000, step=1, splits=1)
         +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
            +- *(1) Range (2000, 10000, step=1, splits=1)

1 answer:

Answer 0: (score: 2)

They are simply bullet points representing ordered, nested operations. For example, the nested list

  • Header
    • Child 1
      • Grandchild 1
    • Child 2
      • Grandchild 2
      • Grandchild 3
    • Child 3

would be written as

Header
:- Child 1
:  +- Grandchild 1
:- Child 2
:  :- Grandchild 2
:  +- Grandchild 3
+- Child 3
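  • "+-" marks a direct child, usually the last one
  • ":-" marks a direct child that has siblings after it (not the last one)
  • ":  +-" marks the last grandchild of a parent that itself has later siblings
  • ":  :-" marks a grandchild that has later siblings, under a parent that is not the last child and has siblings too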
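
The rules above can be sketched as a small recursive printer. This is only an illustration under stated assumptions: `Node` and `PlanTreeSketch.render` are hypothetical names made up for this example, not Spark's own classes, and the code does not claim to mirror how Spark builds the string internally. It shows that the root gets no marker, the last child of a node gets "+-", every earlier child gets ":-", and each deeper level is indented with ":  " when the ancestor at that level still has siblings to come, or with three spaces when it was the last child.

```scala
// Hypothetical minimal tree node for illustration; NOT Spark's TreeNode.
final case class Node(name: String, children: Seq[Node] = Nil)

object PlanTreeSketch {
  // Returns one output line per node, in depth-first order.
  def render(root: Node): Seq[String] = {
    def go(n: Node, indent: String, isRoot: Boolean, isLast: Boolean): Seq[String] = {
      // The root line carries no marker; the last child gets "+- ",
      // any earlier child gets ":- ".
      val marker = if (isRoot) "" else if (isLast) "+- " else ":- "
      // Children of a non-last node keep a ":" column open so the reader
      // can see that more siblings of this node follow further down.
      val childIndent =
        if (isRoot) "" else indent + (if (isLast) "   " else ":  ")
      val childLines = n.children.zipWithIndex.flatMap { case (c, i) =>
        go(c, childIndent, isRoot = false, isLast = i == n.children.size - 1)
      }
      (indent + marker + n.name) +: childLines
    }
    go(root, indent = "", isRoot = true, isLast = true)
  }
}
```

Calling `PlanTreeSketch.render` on the Header / Child / Grandchild tree from the answer and joining the lines with newlines should reproduce the seven-line rendering shown above. Applied to a Join node with two children, the same rules give the first child ":-" and the second "+-", which is the pattern asked about in Q4.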