Hive SQL: Both left and right aliases encountered in JOIN

Posted: 2016-03-15 15:07:39

Tags: sql hive

I have this working T-SQL query:

create table tmp_leads2 as  
select t1.*
    ,case when s1.period is not null then 'Y' else 'N' end as flag_cur
    ,case when s2.period is not null then 'Y' else 'N' end as flag_prev
    ,s1.cutoff_date as cutoff_date_cur ,s1.cutoff_dtkey as cutoff_dtkey_cur 
    ,s2.cutoff_date as cutoff_date_prev ,s2.cutoff_dtkey as cutoff_dtkey_prev 
from tmp_leads t1
left join param s1 on s1.period = '(a) Current'  and s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date 
left join param s2 on s2.period = '(b) Previous' and s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date ; 

I tried to rewrite it for Hive (v0.13), but I get the error:

Error occurred executing hive query: OK FAILED: SemanticException [Error 10017]: Line 8:53 Both left and right aliases encountered in JOIN 'CreatedDate'

I can see which field it is complaining about, but I'm not sure how to rewrite the query so that it returns the same results.
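To pin down what "the same results" means for the query above, here is a minimal sketch that runs its SELECT part on SQLite through Python's standard sqlite3 module. SQLite accepts range conditions in ON, so it serves only as a stand-in reference engine; it is not Hive. The sample rows, the id column on tmp_leads, and the helper names are hypothetical, and dates are stored as ISO 'YYYY-MM-DD' strings so that string comparison matches date order.

```python
import sqlite3

# Hypothetical sample data: one '(a) Current' window and one '(b) Previous' window.
PARAM_ROWS = [
    ("(a) Current", "2016-01-01", "2016-04-01", "2016-03-31", 20160331),
    ("(b) Previous", "2015-01-01", "2015-04-01", "2015-03-31", 20150331),
]
# Hypothetical leads: 1 is in the current window, 2 in the previous one, 3 in neither.
LEAD_ROWS = [(1, "2016-02-10"), (2, "2015-02-10"), (3, "2015-07-01")]

# SELECT part of the question's query; "order by" is added only for a stable row order.
ORIGINAL = """
select t1.*
    ,case when s1.period is not null then 'Y' else 'N' end as flag_cur
    ,case when s2.period is not null then 'Y' else 'N' end as flag_prev
    ,s1.cutoff_date as cutoff_date_cur ,s1.cutoff_dtkey as cutoff_dtkey_cur
    ,s2.cutoff_date as cutoff_date_prev ,s2.cutoff_dtkey as cutoff_dtkey_prev
from tmp_leads t1
left join param s1 on s1.period = '(a) Current'  and s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date
left join param s2 on s2.period = '(b) Previous' and s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date
order by t1.id
"""


def make_db():
    """Create an in-memory database holding the two hypothetical sample tables."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table param (period text, begin_date text, end_date text,"
        " cutoff_date text, cutoff_dtkey integer)"
    )
    con.execute("create table tmp_leads (id integer, CreatedDate text)")
    con.executemany("insert into param values (?, ?, ?, ?, ?)", PARAM_ROWS)
    con.executemany("insert into tmp_leads values (?, ?)", LEAD_ROWS)
    return con


def reference_result():
    """Rows the original range-join query returns on the sample data."""
    con = make_db()
    try:
        return con.execute(ORIGINAL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in reference_result():
        print(row)
```

On this data each lead appears exactly once: lead 1 gets flag_cur = 'Y', lead 2 gets flag_prev = 'Y', and lead 3 is kept with both flags 'N' and NULL cutoffs, because a LEFT JOIN never drops rows from tmp_leads. A Hive rewrite has to reproduce all three behaviours.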

2 Answers:

Answer 0 (score: 3)

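The problem is the inequality conditions in the joins: Hive 0.13 only supports equality conditions between the joined tables in an ON clause, so s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date are rejected there. The following may be sufficient for your purposes: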

create table tmp_leads2 as
    select t1.*,
           (case when s1.period is not null then 'Y' else 'N' end) as flag_cur,
           (case when s2.period is not null then 'Y' else 'N' end) as flag_prev,
           s1.cutoff_date as cutoff_date_cur, s1.cutoff_dtkey as cutoff_dtkey_cur,
           s2.cutoff_date as cutoff_date_prev, s2.cutoff_dtkey as cutoff_dtkey_prev
    from tmp_leads t1 left join
         param s1
         on s1.period = '(a) Current' left join
         param s2
         on s2.period = '(b) Previous'
    where (s1.begin_date is null or (s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date)) or
          (s2.begin_date is null or (s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date));

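This is not exactly equivalent. It assumes that if a parameter is in the table, then it is in the table for all dates. That may be a reasonable assumption; if not, a more complex query is needed.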
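The caveat can be made concrete with a minimal sketch, again using Python's sqlite3 as a stand-in engine (not Hive). The sample data is hypothetical and deliberately breaks the assumption: lead 3 falls outside both date windows. The sketch runs the SELECT part of the question's query and of the rewrite above on the same rows.

```python
import sqlite3

# Hypothetical sample data: lead 3 lies outside both windows.
PARAM_ROWS = [
    ("(a) Current", "2016-01-01", "2016-04-01", "2016-03-31", 20160331),
    ("(b) Previous", "2015-01-01", "2015-04-01", "2015-03-31", 20150331),
]
LEAD_ROWS = [(1, "2016-02-10"), (2, "2015-02-10"), (3, "2015-07-01")]

# The question's query: range tests inside the ON clauses.
JOIN_ON_RANGE = """
select t1.*
    ,case when s1.period is not null then 'Y' else 'N' end as flag_cur
    ,case when s2.period is not null then 'Y' else 'N' end as flag_prev
    ,s1.cutoff_date as cutoff_date_cur ,s1.cutoff_dtkey as cutoff_dtkey_cur
    ,s2.cutoff_date as cutoff_date_prev ,s2.cutoff_dtkey as cutoff_dtkey_prev
from tmp_leads t1
left join param s1 on s1.period = '(a) Current'  and s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date
left join param s2 on s2.period = '(b) Previous' and s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date
order by t1.id
"""

# Answer 0's rewrite: equality-only ON clauses, range tests moved to WHERE.
WHERE_FILTER = """
select t1.*,
       (case when s1.period is not null then 'Y' else 'N' end) as flag_cur,
       (case when s2.period is not null then 'Y' else 'N' end) as flag_prev,
       s1.cutoff_date as cutoff_date_cur, s1.cutoff_dtkey as cutoff_dtkey_cur,
       s2.cutoff_date as cutoff_date_prev, s2.cutoff_dtkey as cutoff_dtkey_prev
from tmp_leads t1 left join
     param s1
     on s1.period = '(a) Current' left join
     param s2
     on s2.period = '(b) Previous'
where (s1.begin_date is null or (s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date)) or
      (s2.begin_date is null or (s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date))
order by t1.id
"""


def run_both():
    """Return (rows of the original query, rows of the WHERE-based rewrite)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table param (period text, begin_date text, end_date text,"
            " cutoff_date text, cutoff_dtkey integer)"
        )
        con.execute("create table tmp_leads (id integer, CreatedDate text)")
        con.executemany("insert into param values (?, ?, ?, ?, ?)", PARAM_ROWS)
        con.executemany("insert into tmp_leads values (?, ?)", LEAD_ROWS)
        return (con.execute(JOIN_ON_RANGE).fetchall(),
                con.execute(WHERE_FILTER).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    original, rewrite = run_both()
    print("original:", original)
    print("rewrite: ", rewrite)
```

On this data the original returns three rows, while the rewrite returns only leads 1 and 2, both with flag_cur = 'Y' and flag_prev = 'Y'. Each ON clause now matches the param row for its period regardless of date, so s1.period and s2.period are non-NULL for every surviving lead, and lead 3 is removed by the WHERE clause.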

Answer 1 (score: 1)

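Here is a version that does not turn the left joins into inner joins, avoids the alias error, and gives the expected results in Hive: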

    create table tmp_leads2 as
    select final.*
        ,case when s1period is not null then 'Y' else 'N' end as flag_cur
        ,case when s2period is not null then 'Y' else 'N' end as flag_prev
    from
    (select t1.*,
        -- the date-range tests move from ON into CASE; max() keeps the matching value per lead
        max(case when s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date then s1.period else null end) as s1period,
        max(case when s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date then s1.cutoff_date else null end) as cutoff_date_cur,
        max(case when s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date then s1.cutoff_dtkey else null end) as cutoff_dtkey_cur,

        max(case when s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date then s2.period else null end) as s2period,
        max(case when s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date then s2.cutoff_date else null end) as cutoff_date_prev,
        max(case when s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date then s2.cutoff_dtkey else null end) as cutoff_dtkey_prev
    from tmp_leads t1
    left join param s1 on s1.period = '(a) Current'
    left join param s2 on s2.period = '(b) Previous'
    group by t1.* /* replace t1.* with the full list of t1 column names you need */
    ) final;
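A minimal sketch of checking this approach against the original query, with Python's sqlite3 standing in for Hive and hypothetical sample data. The typos are fixed here (period instead of peroid, no trailing comma before from), the GROUP BY lists the two columns of the hypothetical tmp_leads table, and rows are compared by column name because final.* puts the columns in a different order than the original query.

```python
import sqlite3

# Hypothetical sample data: lead 1 in the current window, 2 in the previous, 3 in neither.
PARAM_ROWS = [
    ("(a) Current", "2016-01-01", "2016-04-01", "2016-03-31", 20160331),
    ("(b) Previous", "2015-01-01", "2015-04-01", "2015-03-31", 20150331),
]
LEAD_ROWS = [(1, "2016-02-10"), (2, "2015-02-10"), (3, "2015-07-01")]

# Output columns of the original query, in its order.
COLUMNS = ["id", "CreatedDate", "flag_cur", "flag_prev", "cutoff_date_cur",
           "cutoff_dtkey_cur", "cutoff_date_prev", "cutoff_dtkey_prev"]

JOIN_ON_RANGE = """
select t1.*
    ,case when s1.period is not null then 'Y' else 'N' end as flag_cur
    ,case when s2.period is not null then 'Y' else 'N' end as flag_prev
    ,s1.cutoff_date as cutoff_date_cur ,s1.cutoff_dtkey as cutoff_dtkey_cur
    ,s2.cutoff_date as cutoff_date_prev ,s2.cutoff_dtkey as cutoff_dtkey_prev
from tmp_leads t1
left join param s1 on s1.period = '(a) Current'  and s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date
left join param s2 on s2.period = '(b) Previous' and s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date
order by t1.id
"""

# Answer 1's approach: equality-only ON clauses, range tests inside max(case ...),
# and GROUP BY restores one row per lead.
AGGREGATE = """
select final.*
    ,case when s1period is not null then 'Y' else 'N' end as flag_cur
    ,case when s2period is not null then 'Y' else 'N' end as flag_prev
from
(select t1.*,
    max(case when s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date then s1.period end) as s1period,
    max(case when s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date then s1.cutoff_date end) as cutoff_date_cur,
    max(case when s1.begin_date <= t1.CreatedDate and t1.CreatedDate < s1.end_date then s1.cutoff_dtkey end) as cutoff_dtkey_cur,
    max(case when s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date then s2.period end) as s2period,
    max(case when s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date then s2.cutoff_date end) as cutoff_date_prev,
    max(case when s2.begin_date <= t1.CreatedDate and t1.CreatedDate < s2.end_date then s2.cutoff_dtkey end) as cutoff_dtkey_prev
 from tmp_leads t1
 left join param s1 on s1.period = '(a) Current'
 left join param s2 on s2.period = '(b) Previous'
 group by t1.id, t1.CreatedDate
) final
order by id
"""


def fetch(con, sql):
    """Run sql and return each row as a tuple in COLUMNS order."""
    cur = con.execute(sql)
    names = [d[0] for d in cur.description]
    return [tuple(dict(zip(names, row))[c] for c in COLUMNS) for row in cur.fetchall()]


def run_both():
    """Return (rows of the original query, rows of the aggregate rewrite)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table param (period text, begin_date text, end_date text,"
            " cutoff_date text, cutoff_dtkey integer)"
        )
        con.execute("create table tmp_leads (id integer, CreatedDate text)")
        con.executemany("insert into param values (?, ?, ?, ?, ?)", PARAM_ROWS)
        con.executemany("insert into tmp_leads values (?, ?)", LEAD_ROWS)
        return fetch(con, JOIN_ON_RANGE), fetch(con, AGGREGATE)
    finally:
        con.close()


if __name__ == "__main__":
    original, rewrite = run_both()
    print("identical:", original == rewrite)
```

Two differences from the original are not exercised by this sample. If several '(a) Current' rows matched the same lead, the original would return one row per match, whereas max() keeps a single value per column (and each column's maximum may come from a different row). Also, leads that are identical in every t1 column are collapsed into one row by the GROUP BY, so a unique key column in tmp_leads is worth having.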