Find rows that have duplicate values in a column

Asked: 2014-03-28 20:45:56

Tags: sql postgresql duplicates aggregate-functions window-functions

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

Now I need this result:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan
 14        | k moribe
 36        | k moribe

That is, I need the author_id for every name that appears more than once. I tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it does not work. How can I get this result?

3 Answers:

Answer 0 (score: 9)

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here, if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

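You will recognize the basic aggregate function count(). It can be turned into a window function by appending an OVER clause, just like any other aggregate function.

This way it counts the rows per partition, here the rows sharing the same author_name, without collapsing them into one row per group. Voilà.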
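To see what the window function contributes, here is a minimal sketch that runs only the inner query against the sample rows from the question. The column types in the CREATE TABLE are an assumption, since the question does not show the table definition, and a temporary table is used so the sketch does not touch a real author_data table.

```sql
-- Assumed definition: the question only shows the two column names.
-- A TEMP table shadows any permanent author_data for this session only.
CREATE TEMP TABLE author_data (
   author_id   integer PRIMARY KEY,
   author_name text NOT NULL
);

INSERT INTO author_data (author_id, author_name) VALUES
   (9,  'ernest jordan'),
   (14, 'k moribe'),
   (15, 'ernest jordan'),
   (25, 'william h nailon'),
   (79, 'howard jason'),
   (36, 'k moribe');

-- Inner query only: every row is kept and gains the size of its partition.
SELECT author_id, author_name
     , count(*) OVER (PARTITION BY author_name) AS ct
FROM   author_data;

-- Result (row order is not guaranteed without ORDER BY):
--  author_id | author_name      | ct
--  9         | ernest jordan    | 2
--  15        | ernest jordan    | 2
--  79        | howard jason     | 1
--  14        | k moribe         | 2
--  36        | k moribe         | 2
--  25        | william h nailon | 1
```

The outer WHERE ct > 1 then simply drops the rows whose name occurs only once. The filter has to live in an outer query because window functions are not allowed in the WHERE clause of the same query level.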

In older versions without window functions (v8.3 or older), or just generally, this alternative performs pretty fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT 1
   FROM   author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you are concerned about performance, add an index on author_name.
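A minimal sketch of such an index, assuming a plain B-tree index is sufficient for the equality comparison on author_name. The index name is hypothetical; any valid name works.

```sql
-- idx_author_data_author_name is a made-up name for illustration.
-- A default (B-tree) index supports the equality lookup on author_name.
CREATE INDEX idx_author_data_author_name
   ON author_data (author_name);
```

With the index in place, the EXISTS subquery can look up matching names through the index instead of scanning the whole table for each row. Whether the planner actually uses it depends on table size and statistics, so it is worth checking with EXPLAIN.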

Answer 1 (score: 2)

You can join the table to itself, which can be done with either of the following queries:

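-- CROSS JOIN takes no ON clause, so the conditions go in WHERE.
-- DISTINCT avoids repeated rows when a name occurs three or more times.
SELECT DISTINCT a1.author_id, a1.author_name
FROM author_data a1
CROSS JOIN author_data a2
WHERE a1.author_id <> a2.author_id
  AND a1.author_name = a2.author_name;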

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe

--OR

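-- INNER JOIN requires its join condition in an ON clause.
-- DISTINCT avoids repeated rows when a name occurs three or more times.
SELECT DISTINCT a1.author_id, a1.author_name
FROM author_data a1
INNER JOIN author_data a2
  ON a1.author_id <> a2.author_id
  AND a1.author_name = a2.author_name;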

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe

Answer 2 (score: 1)

You are already halfway there. You only need to use the duplicates you identified to fetch the rest of the data.

Try this:

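SELECT author_id, author_name
FROM author_data
-- The subquery groups by author_name, so it can only return author_name
-- (selecting author_id there raises an error in PostgreSQL).
WHERE author_name IN (SELECT author_name
        FROM author_data
        GROUP BY author_name
        HAVING count(author_name) > 1);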