Given the following minimal example:
import org.apache.spark.sql.functions.lit   // needed for lit(...)
import spark.implicits._                    // needed for the $"..." column syntax

val df1 = spark.createDataFrame(Seq((0, "a"), (1, "b"))).toDF("foo", "bar")
val df2 = df1.select($"foo")
val df3 = df2.filter($"bar" === lit("a"))
df1.printSchema
df1.show
df2.printSchema
df2.show
df3.printSchema
df3.show
it runs without errors:
root
|-- foo: integer (nullable = false)
|-- bar: string (nullable = true)
+---+---+
|foo|bar|
+---+---+
| 0| a|
| 1| b|
+---+---+
root
|-- foo: integer (nullable = false)
+---+
|foo|
+---+
| 0|
| 1|
+---+
root
|-- foo: integer (nullable = false)
+---+
|foo|
+---+
| 0|
+---+
However, I would expect something like
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];
for the same reason that I get
org.apache.spark.sql.AnalysisException: cannot resolve '`asdasd`' given input columns: [foo];
when I do
val df4 = df2.filter($"asdasd" === lit("a"))
But it does not happen. Why?
Answer 0 (score: 2)
I would be inclined to call this a bug. An explain plan provides more information:
val df1 = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
df1.select("foo").where($"bar" === "a").explain(true)
// == Parsed Logical Plan ==
// 'Filter ('bar = a)
// +- Project [foo#4]
// +- Project [_1#0 AS foo#4, _2#1 AS bar#5]
// +- LocalRelation [_1#0, _2#1]
//
// == Analyzed Logical Plan ==
// foo: int
// Project [foo#4]
// +- Filter (bar#5 = a)
// +- Project [foo#4, bar#5]
// +- Project [_1#0 AS foo#4, _2#1 AS bar#5]
// +- LocalRelation [_1#0, _2#1]
//
// == Optimized Logical Plan ==
// LocalRelation [foo#4]
//
// == Physical Plan ==
// LocalTableScan [foo#4]
Apparently, both the parsed logical plan and the analyzed (or resolved) logical plan still contain bar in their Project nodes (i.e. projections), so the filtering operation continues to honor the supposedly dropped column.
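As a side note (this workaround is my own assumption, not part of the original answer): one way to actually get the expected AnalysisException is to cut the lineage by rebuilding df2 from its RDD and schema, so that the analyzer only sees df2's own output columns. The name df2Detached is illustrative.
// Rebuild the DataFrame from its RDD + schema; the new plan is a bare
// scan of [foo] with no lineage back to df1, so bar cannot be resolved.
val df2Detached = spark.createDataFrame(df2.rdd, df2.schema)
df2Detached.filter($"bar" === lit("a"))
// org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];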
On a related note, the logical plan of the following query also contains the dropped column, hence exhibiting a similar anomaly: