SQL - Find all instances where two columns are the same

Time: 2014-12-31 20:46:20

Tags: sql hive hiveql

I have a simple table of comments, where each comment is tied to a user and a post:

id  |  user           |  post_id  |  comment
----------------------------------------------------------
0   | john@test.com   |  1001     |  great article
1   | bob@test.com    |  1001     |  nice post
2   | john@test.com   |  1002     |  I agree
3   | john@test.com   |  1001     |  thats cool
4   | bob@test.com    |  1002     |  thanks for sharing
5   | bob@test.com    |  1002     |  really helpful
6   | steve@test.com  |  1001     |  spam post about pills

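I want to get every instance where a user commented on the same post more than once (meaning the same user and the same post_id). In this case, I would get back: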

id  |  user           |  post_id  |  comment
----------------------------------------------------------
0   | john@test.com   |  1001     |  great article
3   | john@test.com   |  1001     |  thats cool
4   | bob@test.com    |  1002     |  thanks for sharing
5   | bob@test.com    |  1002     |  really helpful

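I thought DISTINCT was what I needed, but that just gives me unique rows.

3 Answers:

Answer 0: (score: 2)

You can use GROUP BY and HAVING to find the user/post_id pairs that have more than one entry, then join back to the table to fetch the full rows: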

  SELECT a.*
  FROM table_name a
  JOIN (SELECT user, post_id
        FROM table_name
        GROUP BY user, post_id
        HAVING COUNT(id) > 1
        ) b
  ON a.user = b.user
  AND a.post_id = b.post_id
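
To check the query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Hive here, which is an assumption; the helper name find_repeat_comments and the added ORDER BY are only for illustration.

```python
import sqlite3

# Sample rows copied from the question: (id, user, post_id, comment).
ROWS = [
    (0, "john@test.com", 1001, "great article"),
    (1, "bob@test.com", 1001, "nice post"),
    (2, "john@test.com", 1002, "I agree"),
    (3, "john@test.com", 1001, "thats cool"),
    (4, "bob@test.com", 1002, "thanks for sharing"),
    (5, "bob@test.com", 1002, "really helpful"),
    (6, "steve@test.com", 1001, "spam post about pills"),
]

# The GROUP BY / HAVING subquery keeps only (user, post_id) pairs seen more
# than once; the join then pulls back every full row for those pairs.
QUERY = """
SELECT a.*
FROM table_name a
JOIN (SELECT user, post_id
      FROM table_name
      GROUP BY user, post_id
      HAVING COUNT(id) > 1) b
  ON a.user = b.user
 AND a.post_id = b.post_id
ORDER BY a.id
"""


def find_repeat_comments(rows):
    """Load rows into an in-memory table (hypothetical helper) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name "
        "(id INTEGER, user TEXT, post_id INTEGER, comment TEXT)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_repeat_comments(ROWS):
        print(row)
```

On the sample data this should return the rows with ids 0, 3, 4 and 5, which matches the output the question asks for.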

Answer 1: (score: 0)

DISTINCT removes all duplicate rows, which is why you only get unique rows.

You can try a CROSS JOIN (available as of Hive 0.10, according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins):

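SELECT DISTINCT mt.id, mt.user, mt.post_id, mt.comment
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt.user = mt2.user
AND mt.post_id = mt2.post_id
AND mt.id <> mt2.id  -- exclude each row's match with itself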

Performance may not be the best, though. If you want the result sorted, use SORT BY or ORDER BY.

Answer 2: (score: 0)
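
One thing to watch with a self-join on user and post_id alone: every row also matches itself, so the join returns all rows unless pairs with the same id are excluded. The sketch below compares the two variants on the question's sample data, again using SQLite via Python's sqlite3 as a stand-in for Hive (an assumption); the names NAIVE, FILTERED and run are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (id, user, post_id, comment).
ROWS = [
    (0, "john@test.com", 1001, "great article"),
    (1, "bob@test.com", 1001, "nice post"),
    (2, "john@test.com", 1002, "I agree"),
    (3, "john@test.com", 1001, "thats cool"),
    (4, "bob@test.com", 1002, "thanks for sharing"),
    (5, "bob@test.com", 1002, "really helpful"),
    (6, "steve@test.com", 1001, "spam post about pills"),
]

# Self-join on user and post_id only: each row pairs with itself as well.
NAIVE = """
SELECT mt.id
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt.user = mt2.user
  AND mt.post_id = mt2.post_id
"""

# Same join, but a row must match a *different* row; DISTINCT collapses the
# repeats that appear when a user has three or more comments on one post.
FILTERED = """
SELECT DISTINCT mt.id
FROM MYTABLE mt
CROSS JOIN MYTABLE mt2
WHERE mt.user = mt2.user
  AND mt.post_id = mt2.post_id
  AND mt.id <> mt2.id
ORDER BY mt.id
"""


def run(query, rows=ROWS):
    """Run a query against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MYTABLE "
        "(id INTEGER, user TEXT, post_id INTEGER, comment TEXT)"
    )
    conn.executemany("INSERT INTO MYTABLE VALUES (?, ?, ?, ?)", rows)
    ids = [r[0] for r in conn.execute(query)]
    conn.close()
    return ids


if __name__ == "__main__":
    print("naive:   ", sorted(set(run(NAIVE))))
    print("filtered:", run(FILTERED))
```

The naive variant includes every id in the table, while the filtered variant keeps only the comments that share a user and post_id with some other comment.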

-- T-SQL (SQL Server) version: load the question's sample data into a table variable.
-- The column is named usr because USER is a reserved word in T-SQL.
DECLARE @MyTable TABLE (id int, usr varchar(50), post_id int, comment varchar(50))
INSERT @MyTable (id, usr, post_id, comment) VALUES (0,'john@test.com',1001,'great article')
INSERT @MyTable (id, usr, post_id, comment) VALUES (1,'bob@test.com',1001,'nice post')
INSERT @MyTable (id, usr, post_id, comment) VALUES (2,'john@test.com',1002,'I agree')
INSERT @MyTable (id, usr, post_id, comment) VALUES (3,'john@test.com',1001,'thats cool')
INSERT @MyTable (id, usr, post_id, comment) VALUES (4,'bob@test.com',1002,'thanks for sharing')
INSERT @MyTable (id, usr, post_id, comment) VALUES (5,'bob@test.com',1002,'really helpful')
INSERT @MyTable (id, usr, post_id, comment) VALUES (6,'steve@test.com',1001,'spam post about pills')

SELECT
    T1.id,
    T1.usr,
    T1.post_id,
    T1.comment
FROM
    @MyTable T1

    INNER JOIN @MyTable T2
    ON T1.usr = T2.usr AND T1.post_id = T2.post_id
GROUP BY
    T1.id,
    T1.usr,
    T1.post_id,
    T1.comment
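HAVING
    -- Every row joins to itself, so a count above 1 means the same user
    -- has at least one other comment on the same post.
    Count(T2.id) > 1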