Find duplicate records in SQL along with their primary keys

Date: 2018-09-21 19:59:43

Tags: sql sql-server tsql

I have a business case where I need to query our SQL "Users" table to find duplicate email addresses. I can do that with the following query:

SELECT
    user_email, COUNT(*) as DuplicateEmails
FROM
    Users
GROUP BY
    user_email
HAVING 
    COUNT(*) > 1
ORDER BY 
    DuplicateEmails DESC

The output looks like this:

user_email      DuplicateEmails  
--------------------------------
abc@gmail.com   2
xyz@yahoo.com   3
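To try the query without a SQL Server instance, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite stands in for SQL Server here, and the column names FirstName, LastName and UserID, the sample rows (copied from the expected output in this question), and the extra single-occurrence address solo@example.com are all assumptions. Note that ORDER BY DuplicateEmails DESC lists the address with three occurrences first.

```python
import sqlite3

# Assumed schema and sample rows (copied from the question's expected output),
# plus one hypothetical address that occurs only once and should not be reported.
SAMPLE_ROWS = [
    ("abc@gmail.com", "Tim", "Lentil", "timLentil"),
    ("abc@gmail.com", "John", "Doe", "johnDoe12"),
    ("xyz@yahoo.com", "brian", "boss", "brianTheBoss"),
    ("xyz@yahoo.com", "Thomas", "Hood", "tHood"),
    ("xyz@yahoo.com", "Mark", "Brown", "MBrown12"),
    ("solo@example.com", "Ann", "Lee", "annLee"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (user_email TEXT, FirstName TEXT, LastName TEXT, UserID TEXT PRIMARY KEY)"
)
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

# The query from the question: one row per duplicated address, with its count.
counts = conn.execute(
    """
    SELECT user_email, COUNT(*) AS DuplicateEmails
    FROM Users
    GROUP BY user_email
    HAVING COUNT(*) > 1
    ORDER BY DuplicateEmails DESC
    """
).fetchall()

print(counts)  # [('xyz@yahoo.com', 3), ('abc@gmail.com', 2)]
```

GROUP BY collapses each address to a single row, which is why the names and IDs cannot appear in this result directly.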

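Now I have been asked to list every duplicate record on its own row, together with some other attributes such as first name, last name, and user ID. All of this information is stored in the same "Users" table. I'm having trouble doing this. Can anyone help me or point me in the right direction?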

My output should look like this:

    user_email      DuplicateEmails  FirstName      LastName       UserID
    ------------------------------------------------------------------------------
    abc@gmail.com   2                Tim            Lentil         timLentil
    abc@gmail.com   2                John           Doe            johnDoe12
    xyz@yahoo.com   3                brian          boss           brianTheBoss
    xyz@yahoo.com   3                Thomas         Hood           tHood
    xyz@yahoo.com   3                Mark           Brown          MBrown12

5 answers:

Answer 0 (score: 3)

There are several ways to do this. Here is one that uses a CTE (common table expression).

-- CTE: every email address that occurs more than once, with its count
with FoundDuplicates as
(
    SELECT
        user_email, COUNT(*) as DuplicateEmails
    FROM
        Users
    GROUP BY
        user_email
    HAVING
        COUNT(*) > 1
)

-- Join back to Users so each duplicate record gets its own row
select fd.user_email
    , fd.DuplicateEmails
    , u.FirstName
    , u.LastName
    , u.UserID
from Users u
join FoundDuplicates fd on fd.user_email = u.user_email
ORDER BY fd.DuplicateEmails DESC
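A minimal sketch that runs this CTE approach (with the email column written as user_email throughout) against an in-memory SQLite table via Python's sqlite3 module. SQLite substitutes for SQL Server, and the schema, the sample rows and the hypothetical single-occurrence address are assumptions. It shows that the join returns all five duplicate records and drops the address that appears once.

```python
import sqlite3

# Assumed schema and sample data; annLee's address is not duplicated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (user_email TEXT, FirstName TEXT, LastName TEXT, UserID TEXT PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        ("abc@gmail.com", "Tim", "Lentil", "timLentil"),
        ("abc@gmail.com", "John", "Doe", "johnDoe12"),
        ("xyz@yahoo.com", "brian", "boss", "brianTheBoss"),
        ("xyz@yahoo.com", "Thomas", "Hood", "tHood"),
        ("xyz@yahoo.com", "Mark", "Brown", "MBrown12"),
        ("solo@example.com", "Ann", "Lee", "annLee"),
    ],
)

# Step 1 (CTE): duplicated addresses and their counts.
# Step 2: join back to Users so every duplicate record appears on its own row.
rows = conn.execute(
    """
    WITH FoundDuplicates AS (
        SELECT user_email, COUNT(*) AS DuplicateEmails
        FROM Users
        GROUP BY user_email
        HAVING COUNT(*) > 1
    )
    SELECT fd.user_email, fd.DuplicateEmails, u.FirstName, u.LastName, u.UserID
    FROM Users u
    JOIN FoundDuplicates fd ON fd.user_email = u.user_email
    ORDER BY fd.DuplicateEmails DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Rows that share the same count come back in no guaranteed order; add a tie-breaker such as UserID to the ORDER BY if a stable listing matters.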

Answer 1 (score: 0)

Use COUNT() OVER (PARTITION BY ...). See the documentation.
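This answer gives no code, so here is a minimal sketch of the idea. COUNT(*) OVER (PARTITION BY user_email) attaches each address's total to every row without collapsing the rows the way GROUP BY does, and an outer WHERE keeps only rows whose count exceeds 1 (a window function cannot be referenced in the same query's WHERE clause). It runs on SQLite 3.25 or later, which added window functions, through Python's sqlite3 module; the schema and sample data are assumptions.

```python
import sqlite3

# Assumed schema and sample data; annLee's address occurs only once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (user_email TEXT, FirstName TEXT, LastName TEXT, UserID TEXT PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        ("abc@gmail.com", "Tim", "Lentil", "timLentil"),
        ("abc@gmail.com", "John", "Doe", "johnDoe12"),
        ("xyz@yahoo.com", "brian", "boss", "brianTheBoss"),
        ("xyz@yahoo.com", "Thomas", "Hood", "tHood"),
        ("xyz@yahoo.com", "Mark", "Brown", "MBrown12"),
        ("solo@example.com", "Ann", "Lee", "annLee"),
    ],
)

# Inner query: every row plus the number of rows sharing its email address.
# Outer query: filter on that count, which cannot be done in the inner WHERE.
rows = conn.execute(
    """
    SELECT user_email, DuplicateEmails, FirstName, LastName, UserID
    FROM (
        SELECT u.*, COUNT(*) OVER (PARTITION BY u.user_email) AS DuplicateEmails
        FROM Users u
    ) AS t
    WHERE DuplicateEmails > 1
    ORDER BY DuplicateEmails DESC, user_email
    """
).fetchall()

for row in rows:
    print(row)
```

Compared with the CTE-and-join version, this reads the table once instead of joining it to an aggregate of itself.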

Answer 2 (score: 0)

You can solve it like this:

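DECLARE @T TABLE
(
    UserID VARCHAR(20),
    FirstName NVARCHAR(45),
    LastName NVARCHAR(45),
    UserMail VARCHAR(45)
);

-- Sample data taken from the question
INSERT INTO @T (UserMail, FirstName, LastName, UserID) VALUES
('abc@gmail.com', 'Tim',    'Lentil', 'timLentil'),
('abc@gmail.com', 'John',   'Doe',    'johnDoe12'),
('xyz@yahoo.com', 'brian',  'boss',   'brianTheBoss'),
('xyz@yahoo.com', 'Thomas', 'Hood',   'tHood'),
('xyz@yahoo.com', 'Mark',   'Brown',  'MBrown12');

-- COUNT(1) OVER (PARTITION BY UserMail) adds each address's total count to every row.
-- Every sample address is duplicated, so no filter is needed here; on a real table,
-- wrap this in a subquery and keep only the rows WHERE MailCount > 1.
SELECT *, COUNT(1) OVER (PARTITION BY UserMail) AS MailCount
FROM @T;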

Result:

+--------------+-----------+----------+---------------+-----------+
|    UserID    | FirstName | LastName |   UserMail    | MailCount |
+--------------+-----------+----------+---------------+-----------+
| timLentil    | Tim       | Lentil   | abc@gmail.com |         2 |
| johnDoe12    | John      | Doe      | abc@gmail.com |         2 |
| brianTheBoss | brian     | boss     | xyz@yahoo.com |         3 |
| tHood        | Thomas    | Hood     | xyz@yahoo.com |         3 |
| MBrown12     | Mark      | Brown    | xyz@yahoo.com |         3 |
+--------------+-----------+----------+---------------+-----------+

Answer 3 (score: 0)

Use a window function like this:

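-- Inner query: every row plus the number of rows sharing its email address.
-- Outer query: keep only the addresses that occur more than once.
SELECT u.*
FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY u.user_email) AS numDuplicateEmails
      FROM Users u
     ) u
WHERE numDuplicateEmails > 1
ORDER BY numDuplicateEmails DESC;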

Answer 4 (score: 0)

I think this also works:

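-- Note: ROW_NUMBER() numbers the rows within each address (1, 2, 3, ...), so the filter
-- DuplicateEmails > 1 returns only the second and later rows for each address, not every
-- duplicate record, and the value is a row position rather than a count of duplicates.
WITH cte AS (
    SELECT
        *
        , DuplicateEmails = ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY user_email)
    FROM Users
)
SELECT * FROM cte
WHERE DuplicateEmails > 1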
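To see what the ROW_NUMBER() version actually returns, here is a minimal sketch on the same kind of assumed SQLite data, run through Python's sqlite3 module. It orders by UserID inside each partition, an assumption made only so the numbering is deterministic, and renames the column to rn. Only the second and later rows per address come back, so three of the five duplicate records are listed and the first occurrence of each address is missing.

```python
import sqlite3

# Assumed schema and sample data, as in the other sketches in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (user_email TEXT, FirstName TEXT, LastName TEXT, UserID TEXT PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        ("abc@gmail.com", "Tim", "Lentil", "timLentil"),
        ("abc@gmail.com", "John", "Doe", "johnDoe12"),
        ("xyz@yahoo.com", "brian", "boss", "brianTheBoss"),
        ("xyz@yahoo.com", "Thomas", "Hood", "tHood"),
        ("xyz@yahoo.com", "Mark", "Brown", "MBrown12"),
        ("solo@example.com", "Ann", "Lee", "annLee"),
    ],
)

# ROW_NUMBER() gives positions 1, 2, 3, ... within each address;
# rn > 1 therefore skips the first row of every duplicated address.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY UserID) AS rn
        FROM Users
    )
    SELECT user_email, UserID, rn FROM cte
    WHERE rn > 1
    ORDER BY user_email, rn
    """
).fetchall()

print(rows)
# [('abc@gmail.com', 'timLentil', 2), ('xyz@yahoo.com', 'brianTheBoss', 2), ('xyz@yahoo.com', 'tHood', 3)]
```

This pattern is the usual way to pick the extra copies to delete while keeping one row per address; to list every duplicate record, a COUNT(*) OVER (PARTITION BY ...) filter is the closer fit.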