hive查询迭代结果,选择top

时间:2015-12-17 12:42:16

标签: sql hadoop hive hql

我正在加入两个表USER和TOP_ITEMS_BY_CATEGORY。我的目标是实现以下目标的输出:

  • 显示user_id和top sku_id(如果cat_id有一个),然后是下一个cat_id的top sku_id,重复。

以下是表格:

USER 
 user_id | cat_id | rank 
 a123   | c123   |   1
 a123   | c234   |   2
 a123   | c345   |   3 
 a789   | c789   |   1
 a789   | c123   |   2 

TOP_ITEMS_BY_CATEGORY
 cat_id  | sku_id  | rank
 c123    | s_123   |  1
 c123    | s_234   |  2
 c234    | s_345   |  1 
 c345    | s_456   |  1
 c789    | s_567   |  1
 c789    | s_678   |  2  

How I'd want the OUTPUT:
OUTPUT
 user_id | sku_id
 a123    | s_123 
 a123    | s_345
 a123    | s_456
 a123    | s_234
 a789    | s_567
 a789    | s_123
 a789    | s_678
 a789    | s_234

1 个答案:

答案 0 :(得分:1)

只需一个简单的内部联接即可。

Select USER.USER_ID, TOP_ITEMS_BY_CATEGORY.sku_id
from USER
INNER JOIN TOP_ITEMS_BY_CATEGORY
ON USER.cat_id = TOP_ITEMS_BY_CATEGORY.cat_id
ORDER BY USER.USER_ID, TOP_ITEMS_BY_CATEGORY.RANK, USER.RANK;
相关问题