Avoiding a self join in Hive

Time: 2014-04-04 02:58:24

Tags: hive hiveql

I am using Hive's built-in collect_set function. The table looks like this:

 cookie, events, keywords, pages
 1234      1      'dress'  10
 1234      1      'dress'  10
 1235      2      'shoes'  14
 1234      5      'socks'  22

Using collect_set I can get the following structure:

   select cookie, collect_set(events) as ev, collect_set(keywords) as kwords, 
   collect_set(pages)
    from table1 
    group by cookie

What I need to do is search the collected arrays multiple times, for example:

 select cookie 
 ,array_contains(collect_set(events),2) as has_2
 ,array_contains(collect_set(events),4) as has_4
  from table1
  group by cookie

As I understand it, I cannot project the same field more than once, so I end up having to do something like this:

 select a.cookie,a.has_2,b.has_4 from ( 
 select cookie 
 ,array_contains(collect_set(events),2) as has_2 
 from table1 group by cookie ) A
 inner join (
 select cookie 
 ,array_contains(collect_set(events),4) as has_4
 from table1 group by cookie) B 
 on A.cookie = B.cookie

The final result looks like this:

 cookie, has_2, has_4 
 1234     F      F 
 1235     T      T 

Is there a way to do this without the self join? Currently I have to self join 30 times to get the format I need.

Thanks

2 answers:

Answer 0: (score: 2)

select S.cookie, array_contains(S.events_set,2), array_contains(S.events_set,4) 
from
(select cookie, collect_set(events) as events_set
 from table1 group by cookie ) S
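
The same pattern should extend to several columns and many lookups in one pass: build each set once per cookie in the subquery, then probe it as often as needed in the outer select. The following is only a sketch. It assumes events is an integer column and keywords is a string column, and the aliases keywords_set, has_2, has_4 and has_dress are illustrative names that do not come from the question.

```sql
-- Sketch: one scan of table1, one group by, no self join.
-- Assumes events is an int column and keywords is a string column.
select S.cookie,
       array_contains(S.events_set, 2)         as has_2,
       array_contains(S.events_set, 4)         as has_4,
       array_contains(S.keywords_set, 'dress') as has_dress  -- illustrative extra lookup
from (
    -- Collect every set exactly once per cookie.
    select cookie,
           collect_set(events)   as events_set,
           collect_set(keywords) as keywords_set
    from table1
    group by cookie
) S;
```

Each additional check is then one more array_contains line in the outer select rather than one more join.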

Answer 1: (score: 0)

You should introduce a GROUP BY into your SQL.

e.g.

select
    cookie,
    array_contains(collect_set(events),2) as has_2,
    array_contains(collect_set(events),4) as has_4
 from
    table1
 group by
    cookie;