Optimizing a Hive query

Date: 2014-04-11 10:10:31

Tags: mysql sql hadoop hive

I am trying to optimize a Hive query. I have partitioned my base table and stored it as ORC files, as shown below.

create table if not exists processed (
    plc string,
    direction string,
    `table` int,   -- backquoted because TABLE is a reserved keyword in HiveQL
    speed float,
    time string
) PARTITIONED BY (time_id bigint) STORED AS ORC;

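I run the following query against the table above (which contains 500,000 records) and store the final result as JSON. The whole operation takes about 35 seconds. Is there any way to reduce this time? Alternatively, could someone suggest a framework other than Hive? Here is the query: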

// Builds the aggregate query; plcCSV should already contain quoted string literals,
// e.g. 'PLC-1','PLC-2', because plc is a string column.
String finalQuery = "select plc, direction, AVG(speed) as speed, COUNT(plc) as count, time_id"
        + " from processed"
        + " where plc IN (" + plcCSV + ")"
        + " and time_id = " + time_id
        + " group by plc, direction, time_id";
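Because plc is a string column, each value spliced into the IN list has to reach Hive as a properly quoted string literal. The following is a minimal sketch, assuming the plc values arrive as a Java List<String> rather than as a ready-made CSV string; the class and method names (PlcQueryBuilder, quote, buildQuery) are hypothetical. It builds the same query as above with one escaped literal per value, using the backslash escaping that Hive applies inside string literals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: not part of the original question.
public class PlcQueryBuilder {

    // Quote one value as a HiveQL string literal. Hive uses C-style backslash
    // escaping inside string literals, so backslashes are escaped first, then quotes.
    static String quote(String value) {
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    // Build the question's aggregate query with one quoted literal per plc value.
    static String buildQuery(List<String> plcs, long timeId) {
        if (plcs.isEmpty()) {
            // "plc IN ()" is not valid HiveQL, so reject an empty list up front.
            throw new IllegalArgumentException("at least one plc value is required");
        }
        StringJoiner inList = new StringJoiner(", ", "(", ")");
        for (String plc : plcs) {
            inList.add(quote(plc));
        }
        return "select plc, direction, AVG(speed) as speed, COUNT(plc) as count, time_id"
                + " from processed"
                + " where plc IN " + inList
                // Equality on the partition column lets Hive read a single partition.
                + " and time_id = " + timeId
                + " group by plc, direction, time_id";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(Arrays.asList("PLC-1", "PLC-2"), 20140411L));
    }
}
```

Keeping the filter on time_id as a plain equality against the partition column is what allows Hive to prune the scan to one partition instead of reading the whole table.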

1 answer:

Answer 0 (score: 0)

First create an index on the plc column, then try the query again.
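The answer's suggestion could be applied roughly as follows. This is a sketch, assuming Hive 0.7 through 2.x (CREATE INDEX was removed in Hive 3.0) and, when a JDBC URL is passed, a HiveServer2 JDBC driver on the classpath; the class PlcIndexSetup and the index name processed_plc_idx are hypothetical. It creates a compact index on plc, builds it, and reminds you that hive.optimize.index.filter has to be enabled in the same session that later runs the query.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical setup class: not part of the original answer.
public class PlcIndexSetup {

    // Hypothetical index name; any identifier would do.
    static final String INDEX_NAME = "processed_plc_idx";

    // Session-level setting that lets the optimizer use indexes when filtering.
    // It must be issued in the session that runs the query, not just here.
    static final String QUERY_SESSION_SETTING = "SET hive.optimize.index.filter=true";

    // Persistent DDL for a compact index on plc (Hive 0.7 to 2.x only).
    static List<String> indexDdl() {
        return Arrays.asList(
                "CREATE INDEX " + INDEX_NAME + " ON TABLE processed (plc)"
                        + " AS 'COMPACT' WITH DEFERRED REBUILD",
                // DEFERRED REBUILD leaves the index empty until it is rebuilt explicitly.
                "ALTER INDEX " + INDEX_NAME + " ON processed REBUILD");
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No JDBC URL given: just print the statements to run by hand.
            indexDdl().forEach(System.out::println);
            System.out.println(QUERY_SESSION_SETTING);
            return;
        }
        // args[0] is assumed to be a HiveServer2 URL such as jdbc:hive2://host:10000/default.
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            for (String sql : indexDdl()) {
                stmt.execute(sql);
            }
        }
        System.out.println("Index built; run '" + QUERY_SESSION_SETTING + "' before the query.");
    }
}
```

Because ORC files already carry lightweight min/max indexes for each stripe and row group, it is worth timing the query before and after; the extra index may not change much on a 500,000-row table.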