Query to find the maximum value

Time: 2017-08-04 06:50:06

Tags: hadoop hive hiveql

I have the following data, and I want to get the latest partition time for each ID.

    ID        time
12  10038446  201705102100
13  10038446  201706052100
14  10038446  201706060000
15  10038446  201706060100
16  10103517  201705101700
17  10103517  201705102100
18  10103517  201706052100
19  10103517  201706060100
20  10124464  201701310100
21  10124464  201702210500
22  10124464  201702220500
23  10124464  201703062100
24  10124464  201705102100
25  10124464  201706052100
26  10124464  201706060100

The output I expect is as follows:

15  10038446  201706060100
19  10103517  201706060100
26  10124464  201706060100
37  1019933 201706052100

How can I achieve this with a Hive query?

2 answers:

Answer 0: (score: 0)

Try this:

select ID, time
from
(
  select 
    ID, 
    time, 
    row_number() over (partition by ID order by time desc) as time_rank
  from table_name
 ) x
where time_rank = 1
group by ID, time

If subqueries are not available (older Hive versions), a temporary table is an option:

create table tmp_table as
select 
  ID, 
  time, 
  row_number() over (partition by ID order by time desc) as time_rank
from table_name;

select ID, time
from tmp_table
where time_rank = 1
group by ID, time;

drop table tmp_table;
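
As a further variant of the same row_number() idea, the ranking step can be written as a common table expression, which avoids creating and dropping a physical table. This is only a sketch: it assumes a Hive version with WITH-clause support (Hive 0.13.0 or later, as far as I know) and reuses the placeholder table_name from above. The CTE name ranked is made up for illustration, and the backticks around time are a precaution in case your Hive version treats it as a reserved word.

```sql
-- Sketch only: assumes CTE support and the placeholder table `table_name`.
with ranked as (
  select
    ID,
    `time`,
    -- Number the rows of each ID from newest to oldest.
    row_number() over (partition by ID order by `time` desc) as time_rank
  from table_name
)
-- Keep only the newest row per ID.
select ID, `time`
from ranked
where time_rank = 1;
```

Because row_number() assigns rank 1 to exactly one row per ID, no extra group by is needed in the outer query.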

Answer 1: (score: 0)

Use a simple aggregation:

select id, max(time) as time
from table_name  -- placeholder name; the bare word "table" is a reserved keyword
group by id
order by id; -- order if necessary

Demo with your dataset:

select id, max(time) as time
from table_name
group by id;
OK
10038446        201706060100
10103517        201706060100
10124464        201706060100
Time taken: 30.66 seconds, Fetched: 3 row(s)
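
The aggregation above returns only id and the maximum time. If the other columns of the matching row are needed as well, one common pattern is to join the aggregate back to the table. The following is a sketch under assumptions: table_name is the same placeholder as in the first answer, the aliases t, m and max_time are made up for illustration, and the backticks around time are a precaution in case your Hive version treats it as a reserved word.

```sql
-- Sketch only: returns the full row(s) that carry the latest time per id.
select t.*
from table_name t
join (
  -- Latest time per id, as in the aggregation shown above.
  select id, max(`time`) as max_time
  from table_name
  group by id
) m
  on t.id = m.id
 and t.`time` = m.max_time;
```

Unlike the row_number() approach, this returns more than one row for an id when several rows share the same maximum time.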