Creating a view with filter parameters

Time: 2018-10-23 15:09:49

Tags: sql hive hiveql

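I'm creating a view in Hive that unions two tables, both of which contain a large amount of data. Is there a way to pass a filter parameter to a view in Hive so that the filter is also applied to the underlying tables? I have: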

CREATE VIEW abc 
AS
SELECT * FROM 
(SELECT * FROM table_a
UNION 
SELECT * FROM table_b) temp; 

If I run something like SELECT * FROM abc WHERE day='2018-10-22', it should return only the union for the selected day, i.e. the equivalent of:

SELECT * FROM table_a WHERE day='2018-10-22' UNION
SELECT * FROM table_b WHERE day='2018-10-22'

How can I create the view so that it does this?

1 answer:

Answer 0 (score: 1)

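There is no need to add the filter explicitly for optimization purposes: the query optimizer can push the predicate down into each branch of the union. Take a look at this: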

CREATE TABLE `t5`(`a` string);
CREATE TABLE `t6`(`a` string);


CREATE VIEW v1 
AS
SELECT * FROM 
(
SELECT * FROM t5
UNION ALL
SELECT * from t6
) temp; 
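Hive's EXPLAIN statement prints the execution plan instead of running the query. A minimal way to reproduce the plan for the view above, assuming t5, t6 and v1 were created exactly as shown, is:

```sql
-- Print the plan only; no data is read.
-- With predicate pushdown, each TableScan (t5 and t6) should show
-- filterExpr: (a = 'b') rather than a single filter after the Union.
EXPLAIN
SELECT * FROM v1 WHERE a = 'b';
```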

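Here is the EXPLAIN output for the query select * from v1 where a = "b". As you can see, there are two separate table scans, and the predicate is applied to each of them. It would be quite disappointing if Hive fetched all the data at this point and only filtered at the end :)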

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t5
            filterExpr: (a = 'b') (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (a = 'b') (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: 'b' (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          TableScan
            alias: t6
            filterExpr: (a = 'b') (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Filter Operator
              predicate: (a = 'b') (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Select Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: 'b' (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      table:
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
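Applied to the view from the question, a sketch might look like the following. It assumes table_a and table_b have identical schemas that include a day column (the question does not show them), and it uses UNION ALL as the answer does, on the assumption that rows appearing in both tables do not need to be de-duplicated; plain UNION removes duplicates, which costs an extra de-duplication step. hive.optimize.ppd is Hive's predicate pushdown setting and is enabled by default, so the SET line only makes it explicit.

```sql
-- Sketch only: assumes table_a and table_b share the same columns,
-- including day (their schemas are not shown in the question).

-- Predicate pushdown is enabled by default; this just makes it explicit.
SET hive.optimize.ppd=true;

DROP VIEW IF EXISTS abc;

-- UNION ALL keeps duplicates; switch to UNION if rows present in
-- both tables must be de-duplicated (at the cost of extra work).
CREATE VIEW abc
AS
SELECT * FROM
(
SELECT * FROM table_a
UNION ALL
SELECT * FROM table_b
) temp;

-- Verify: both TableScan operators (table_a and table_b) should
-- carry the day = '2018-10-22' filter in their filterExpr.
EXPLAIN
SELECT * FROM abc WHERE day = '2018-10-22';
```

If day is a partition column of both tables, the pushed-down filter should also let Hive prune partitions, so only that day's data is read; the EXPLAIN output is the place to confirm this on your own tables.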