Question

Below is my Hive query:

'select substr(ltrim(date_ts),0,10) date_ts,
 sum(if(col1 = 'type1', 1, 0)) as type_1,
 sum(if(col1 = 'type2', 1, 0)) as type_2,
 sum(if(col1 = 'type3', 1, 0)) as type_3
 from table1
 GROUP BY substr(ltrim(date_ts),0,10) 
 ORDER BY date_ts;'

My table1 (an external table) is partitioned by (year string, month string, day string).

These are my partitions:

'year='2010',month='01',day='01'
 year='2010',month='01',day='02'
 year='2010',month='01',day='03'
 year='2010',month='01',day='04''

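If I run it on 3 or fewer partitions, the query works perfectly. When I add a 4th partition, it just gets stuck at map = 92%. I can't figure out why. It works with any 3 partitions. I don't know whether anyone has run into this problem before.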

With 3 partitions I get the output below.

' date        | type1  | type2     |type3 |
------------------------------------------
 2011-10-01   |    1   |  0        |  0   |
 2011-10-02   |    1   |  0        |  0   |
 2011-10-03   |    0   |  1        |  1   |'

When I add the fourth partition for the fourth day, the map phase stalls at around 90% and stays that way even after 1 to 2 hours.

Expected output

' date        | type1  | type2     |type3 |
------------------------------------------
 2011-10-01   |    1   |  0        |  0   |
 2011-10-02   |    1   |  0        |  0   |
 2011-10-03   |    0   |  1        |  1   |
 2011-10-05   |    0   |  1        |  0   |'

Any suggestions?

Answer 1

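What are the four partitions? Check whether you ended up with

__HIVE_DEFAULT_PARTITION__

which indicates null values in the partition column. That causes skew, which in turn causes... slowness.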

How to check this:

use <your_database>;
show partitions table1;
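
If the partition listing looks normal, a couple of follow-up queries might help narrow things down. This is only a sketch, not a confirmed diagnosis: table1, date_ts, and the year/month/day columns come from the question, and the filter values assume the fourth partition is year='2010', month='01', day='04'.

```sql
-- Sketch only: the partition values below are assumed from the question.

-- 1. Row counts per partition. A partition far larger than the others,
--    or one named __HIVE_DEFAULT_PARTITION__, would point to skew.
SELECT year, month, day, COUNT(*) AS row_count
FROM table1
GROUP BY year, month, day;

-- 2. Inspect the fourth partition on its own: total rows, and how many
--    rows have a NULL date_ts (the column the original query groups on).
SELECT COUNT(*) AS total_rows,
       SUM(IF(date_ts IS NULL, 1, 0)) AS null_date_ts
FROM table1
WHERE year = '2010' AND month = '01' AND day = '04';
```

If the second query also stalls, the problem is more likely in the data files under that partition than in the aggregation itself. If the counts do show skew, the standard Hive setting hive.groupby.skewindata=true may be worth trying, though whether it helps here is a guess.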

Hive query does not work for more than 3 partitions
