How to conditionally remove duplicates from a table

Asked: 2020-06-04 11:40:34

Tags: sql apache-spark-sql

I have the SQL table below, and I want to remove duplicate entries where object_id and team_name match. Basically, I want each object_id value to appear only once.

session_id  object_id   team_name   user_name   user_desc
----------  ---------   ---------   ---------   -----------------
session1    user1       team1       user1       user1_description
session1    user2       team1       user2       user2_description
session1    team1       team1       user1       user1_description
session1    team1       team1       user2       user2_description

I want to transform the table above into the following:

session_id  object_id   team_name   user_name   user_desc
----------  ---------   ---------   ---------   -----------------
session1    user1       team1       user1       user1_description
session1    user2       team1       user2       user2_description
session1    team1       team1       null        null

How can I achieve this?

1 Answer:

Answer 0 (score: 1)

If I understand correctly, you can use aggregation:

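-- Within each object_id group, keep a column's value only when all rows agree
-- (min = max); otherwise the case expression has no else branch and yields null.
select (case when min(session_id) = max(session_id) then min(session_id) end) as session_id,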
       object_id,
       (case when min(team_name) = max(team_name) then min(team_name) end) as team_name,
       (case when min(user_name) = max(user_name) then min(user_name) end) as user_name,
       (case when min(user_desc) = max(user_desc) then min(user_desc) end) as user_desc
from t
group by object_id;
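
To check the query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is an assumption made only for convenience; the question is tagged apache-spark-sql, and this sketch has not been run on Spark. The table name `t` comes from the answer, and the `order by` clause is added here only to make the row order deterministic.

```python
import sqlite3

# In-memory SQLite database (an assumption for illustration; the question targets Spark SQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t ("
    "session_id text, object_id text, team_name text, user_name text, user_desc text)"
)

# Sample rows copied from the question.
rows = [
    ("session1", "user1", "team1", "user1", "user1_description"),
    ("session1", "user2", "team1", "user2", "user2_description"),
    ("session1", "team1", "team1", "user1", "user1_description"),
    ("session1", "team1", "team1", "user2", "user2_description"),
]
conn.executemany("insert into t values (?, ?, ?, ?, ?)", rows)

# The answer's query, plus an order by so the output order is stable.
query = """
select (case when min(session_id) = max(session_id) then min(session_id) end) as session_id,
       object_id,
       (case when min(team_name) = max(team_name) then min(team_name) end) as team_name,
       (case when min(user_name) = max(user_name) then min(user_name) end) as user_name,
       (case when min(user_desc) = max(user_desc) then min(user_desc) end) as user_desc
from t
group by object_id
order by object_id
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

The `team1` group has two different `user_name` and `user_desc` values, so those columns come back as null (`None` in Python), while the single-row groups `user1` and `user2` pass through unchanged. One caveat: `min` and `max` ignore nulls, so a group that mixes one non-null value with nulls would still return that value.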