Self-joining a table with a condition

Time: 2017-03-09 07:53:50

Tags: sql join apache-spark-sql

I have a table of the following form:

Table dummy1:

e_n  t_s  item
a     t1   c
a     t2   c
a     t3   c
a     t4   c
b     p1   c
b     p2   c
b     p3   c 
b     p4   c

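t1, t2, t3, t4, p1, p2, p3, p4 are timestamps. t1, t2, t3, t4 are the timestamps of event_name 'a' in ascending order, and p1, p2, p3, p4 are the timestamps of event_name 'b' in ascending order.

c is the item_number on which the events 'a' and 'b' occur.

I am trying to write a query whose result should look like this: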

e_n1 e_n2  item  t_s_1 t_s_2
a     b     c     t1    p1
a     b     c     t2    p2 
a     b     c     t3    p3
a     b     c     t4    p4

I tried the following code:

select l.e_n as e_n_1, m.e_n as e_n_2, l.item, l.t_s as t_s_a, 
m.t_s as t_s_b from
(select * from  dummy where e_n = 'a') l 
join 
(select * from  dummy where e_n = 'b') m 
on l.item = m.item and l.t_s < m.t_s

The join condition l.item = m.item is needed because there are many other items (c1, c2, c3) with the same structure.

The result is:

   e_n1 e_n2  item  t_s_a t_s_b
    a     b     c     t1    p1
    a     b     c     t1    p2
    a     b     c     t1    p3
    a     b     c     t1    p4
    a     b     c     t2    p1 
    a     b     c     t2    p2
    a     b     c     t2    p3

and so on.

How can I get the result I want in an efficient way?
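
The row explosion described in the question can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the integer timestamps are made up for illustration (here every 'b' timestamp is later than every 'a' timestamp, which matches the output shown in the question). Because the only time condition is l.t_s < m.t_s, each 'a' row pairs with every later 'b' row of the same item, not just with its counterpart.

```python
import sqlite3

# Hypothetical data: integers 1..4 stand in for t1..t4, 5..8 for p1..p4.
rows = [("a", 1, "c"), ("a", 2, "c"), ("a", 3, "c"), ("a", 4, "c"),
        ("b", 5, "c"), ("b", 6, "c"), ("b", 7, "c"), ("b", 8, "c")]

conn = sqlite3.connect(":memory:")
conn.execute("create table dummy (e_n text, t_s integer, item text)")
conn.executemany("insert into dummy values (?, ?, ?)", rows)

# The join from the question: the only time condition is l.t_s < m.t_s.
pairs = conn.execute("""
    select l.t_s, m.t_s
    from (select * from dummy where e_n = 'a') l
    join (select * from dummy where e_n = 'b') m
      on l.item = m.item and l.t_s < m.t_s
    order by l.t_s, m.t_s
""").fetchall()

print(len(pairs))  # 16 pairs rather than the 4 wanted
```

With this data the join degenerates into a full cross product of the 'a' and 'b' rows for the item, so the output grows quadratically with the number of events per item.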

2 answers:

Answer 0: (score: 3)

select      min (case when e_n = 'a' then 'a' end)  as e_n1
           ,min (case when e_n = 'b' then 'b' end)  as e_n2
           ,item
           ,min (case when e_n = 'a' then t_s end)  as t_s_1
           ,min (case when e_n = 'b' then t_s end)  as t_s_2

from       (select      d.*
                       ,row_number () over (partition by item,e_n order by t_s) as rn

            from        dummy as d
            ) d

group by    item
           ,rn

+------+------+------+-------+-------+
| e_n1 | e_n2 | item | t_s_1 | t_s_2 |
+------+------+------+-------+-------+
| a    | b    | c    | t1    | p1    |
| a    | b    | c    | t2    | p2    |
| a    | b    | c    | t3    | p3    |
| a    | b    | c    | t4    | p4    |
+------+------+------+-------+-------+
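
A quick way to try the row_number plus conditional aggregation query without a Spark cluster is to run it against an in-memory SQLite database. This is only a sketch under assumptions: sqlite3 stands in for Spark SQL (window functions need SQLite 3.25 or newer), the integer timestamps are invented, and a second item c1 is added to check that the pairing stays within each item.

```python
import sqlite3

# Hypothetical data: two items, integer timestamps in place of t1.., p1..
rows = [("a", 1, "c"), ("a", 2, "c"), ("b", 5, "c"), ("b", 6, "c"),
        ("a", 3, "c1"), ("b", 4, "c1")]

conn = sqlite3.connect(":memory:")
conn.execute("create table dummy (e_n text, t_s integer, item text)")
conn.executemany("insert into dummy values (?, ?, ?)", rows)

# Number the rows per (item, event), then fold the n-th 'a' row and the
# n-th 'b' row of each item into one output row.
paired = conn.execute("""
    select   item
            ,min(case when e_n = 'a' then t_s end) as t_s_1
            ,min(case when e_n = 'b' then t_s end) as t_s_2
    from    (select d.*
                   ,row_number() over (partition by item, e_n order by t_s) as rn
             from   dummy as d
            ) d
    group by item, rn
    order by item, rn
""").fetchall()

print(paired)  # [('c', 1, 5), ('c', 2, 6), ('c1', 3, 4)]
```

The table is scanned once and no join is needed, which is why this shape tends to scale better than pairing rows through an inequality join.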

Answer 1: (score: 0)

First, order the rows of each event by timestamp, then join the two ordered sets on their row numbers.

Try the following code.

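-- coalesce replaces the two-argument isnull, which is SQL Server syntax;
-- rows are numbered per item so that l.rn = m.rn lines up within each item
select l.e_n as e_n_1, m.e_n as e_n_2, coalesce(l.item,m.item) as item, l.t_s as t_s_a, 
    m.t_s as t_s_b from 
    (select *,(row_number() over (partition by item order by t_s)) as rn from  dummy where e_n = 'a') l 
    full join 
    (select *,(row_number() over (partition by item order by t_s)) as rn from  dummy where e_n = 'b') m 
    on l.item = m.item and l.rn=m.rn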
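
One detail worth checking in this approach is how the row numbers are partitioned once the table holds more than one item. The sketch below compares numbering over the whole table with numbering per item. It is an illustration under assumptions: sqlite3 stands in for Spark SQL, the data and the helper pair_by_row_number are hypothetical, and an inner join is used in place of the full join because older SQLite builds (before 3.39) do not support full joins.

```python
import sqlite3

# Hypothetical data: two items whose integer timestamps interleave.
rows = [("a", 1, "c"), ("a", 3, "c"), ("b", 5, "c"), ("b", 7, "c"),
        ("a", 2, "c1"), ("b", 4, "c1")]

conn = sqlite3.connect(":memory:")
conn.execute("create table dummy (e_n text, t_s integer, item text)")
conn.executemany("insert into dummy values (?, ?, ?)", rows)

def pair_by_row_number(window):
    """Join 'a' rows to 'b' rows on item and row number for a window spec."""
    sql = f"""
        select l.item, l.t_s, m.t_s
        from (select *, row_number() over ({window}) as rn
              from dummy where e_n = 'a') l
        join (select *, row_number() over ({window}) as rn
              from dummy where e_n = 'b') m
          on l.item = m.item and l.rn = m.rn
        order by l.item, l.t_s
    """
    return conn.execute(sql).fetchall()

# Numbering across all items: row numbers of different items interleave,
# so most pairs are lost.
global_rn = pair_by_row_number("order by t_s")
print(global_rn)    # [('c', 3, 7)]

# Numbering restarts for each item: the n-th 'a' meets the n-th 'b'.
per_item_rn = pair_by_row_number("partition by item order by t_s")
print(per_item_rn)  # [('c', 1, 5), ('c', 3, 7), ('c1', 2, 4)]
```

With a single item the two numberings coincide, so the difference only shows up on tables like the one in the question that also contain c1, c2, c3.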