Finding duplicate values in a SQL table

Time: 2010-04-07 18:17:29

Tags: sql duplicates

It's easy to find duplicates with one field:

SELECT name, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

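This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.

35 answers:

Answer 0 (score: 2610)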

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

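Simply group on both of the columns.

Note: the older ANSI standard requires all non-aggregated columns to appear in the GROUP BY, but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint that describes the relationship between attributes in a relation.

Support for this is not consistent across database systems.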
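The two-column grouping from this answer can be tried against the sample table from the question. The snippet below is only a sketch: it assumes an in-memory SQLite database driven from Python, and the function name find_duplicate_pairs is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, name, email)
ROWS = [
    (1, "John", "asd@asd.com"),
    (2, "Sam", "asd@asd.com"),
    (3, "Tom", "asd@asd.com"),
    (4, "Bob", "bob@asd.com"),
    (5, "Tom", "asd@asd.com"),
]


def find_duplicate_pairs(rows):
    """Return (name, email, count) for every name/email pair seen more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, email, COUNT(*)
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(find_duplicate_pairs(ROWS))  # [('Tom', 'asd@asd.com', 2)]
```

Only the Tom row comes back, because John and Sam share the email but not the name.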

Answer 1 (score: 324)

Try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

Output:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

If you want the IDs of the duplicates, use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

Output:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

To delete the duplicates, try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
SELECT * FROM @YourTable

Output:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)
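For readers without SQL Server at hand, the same "keep the first row of each group" idea can be sketched in SQLite from Python. This is an adaptation, not the T-SQL above: it assumes SQLite 3.25 or later (for window functions), uses a regular table named your_table in place of the table variable, and the function name is hypothetical.

```python
import sqlite3

# Same sample rows as in this answer: (id, name, email)
ROWS = [
    (1, "John", "John-email"),
    (2, "John", "John-email"),
    (3, "fred", "John-email"),
    (4, "fred", "fred-email"),
    (5, "sam", "sam-email"),
    (6, "sam", "sam-email"),
]


def delete_all_but_first(rows):
    """Delete every row whose rank within its (name, email) group is above 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        DELETE FROM your_table
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY name, email ORDER BY id
                       ) AS row_rank
                FROM your_table
            )
            WHERE row_rank > 1
        )
        """
    )
    remaining = conn.execute(
        "SELECT id, name, email FROM your_table ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in delete_all_but_first(ROWS):
        print(row)
```

The surviving ids are 1, 3, 4 and 5, which matches the output listed above.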

Answer 2 (score: 104)

Try this:

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )

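Answer 3 (score: 56)

If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows in a triple sub-select:

SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id

And so to delete:

DELETE FROM users
WHERE id IN (
    SELECT u.id
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)

Much easier to read and understand, IMHO.

Note: the only issue is that you have to execute the request until no rows are deleted, since each run deletes only one of each duplicate.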

Answer 4 (score: 35)

Try the following:

SELECT * FROM
(
    SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
        AS Rank 
        FROM Customers
) AS B WHERE Rank>1

Answer 5 (score: 25)

 SELECT name, email 
    FROM users
    WHERE email in
    (SELECT email FROM users
    GROUP BY email 
    HAVING COUNT(*)>1)

Answer 6 (score: 18)

A bit late to the party, but I found a really cool workaround for finding all duplicate IDs:

SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
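GROUP_CONCAT is a MySQL function; SQLite happens to provide a function of the same name, so the idea can be sketched from Python as below. The email column is added to the SELECT list for readability, and the helper name is an assumption for illustration only.

```python
import sqlite3

# Sample rows from the question: (id, name, email)
ROWS = [
    (1, "John", "asd@asd.com"),
    (2, "Sam", "asd@asd.com"),
    (3, "Tom", "asd@asd.com"),
    (4, "Bob", "bob@asd.com"),
    (5, "Tom", "asd@asd.com"),
]


def duplicate_ids_by_email(rows):
    """Map each repeated email to the sorted list of ids that share it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        """
        SELECT email, GROUP_CONCAT(id)
        FROM users
        GROUP BY email
        HAVING COUNT(email) > 1
        """
    )
    # GROUP_CONCAT yields a comma-separated string whose order is not
    # guaranteed, so the ids are sorted after splitting.
    groups = {
        email: sorted(int(part) for part in ids.split(","))
        for email, ids in cursor
    }
    conn.close()
    return groups


if __name__ == "__main__":
    print(duplicate_ids_by_email(ROWS))
```

Note that this groups on email alone, so it reports ids 1, 2, 3 and 5 together rather than only the two Tom rows.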

Answer 7 (score: 17)

Try this code:

WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY ccn)
FROM ccnmaster )
SELECT * FROM CTE WHERE RN > 1

Answer 8 (score: 14)

If you work with Oracle, this way would be preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));

insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');

commit;

select *
  from my_users
 where rowid not in (select min(rowid) from my_users group by name, email);

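Answer 9 (score: 14)

This selects/deletes all duplicate records except one record from each group of duplicates. So the delete leaves all unique records plus one record from each group of duplicates.

Select duplicates: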

SELECT *
FROM table
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY column1, column2
);

Delete duplicates:

DELETE FROM table
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY column1, column2
);

Be aware that with a large number of records this can cause performance problems.
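A quick way to see what the two statements do is to run them on the question's sample data. The sketch below assumes SQLite via Python, with users standing in for the placeholder table and name, email standing in for column1, column2; the function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, name, email)
ROWS = [
    (1, "John", "asd@asd.com"),
    (2, "Sam", "asd@asd.com"),
    (3, "Tom", "asd@asd.com"),
    (4, "Bob", "bob@asd.com"),
    (5, "Tom", "asd@asd.com"),
]

# Sub-select that picks the one id to keep from each (name, email) group.
KEEP_FIRST = "SELECT MIN(id) FROM users GROUP BY name, email"


def select_then_delete_duplicates(rows):
    """Return (ids flagged as duplicates, ids left after the delete)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    duplicates = [
        row[0]
        for row in conn.execute(
            f"SELECT id FROM users WHERE id NOT IN ({KEEP_FIRST}) ORDER BY id"
        )
    ]
    conn.execute(f"DELETE FROM users WHERE id NOT IN ({KEEP_FIRST})")
    remaining = [
        row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")
    ]
    conn.close()
    return duplicates, remaining


if __name__ == "__main__":
    print(select_then_delete_duplicates(ROWS))  # ([5], [1, 2, 3, 4])
```

Only the second Tom row (id 5) is treated as a duplicate; the first row of every group is kept.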

Answer 10 (score: 8)

select id,name,COUNT(*) from India group by Id,Name having COUNT(*)>1

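Answer 11 (score: 7)

How can we count the duplicated values? A value is either repeated 2 times or more than 2 times. Just count them, not group-wise.

As simple as:

select COUNT(distinct col_01) from Table_01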

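Answer 12 (score: 7)

Here's the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).

This example finds all students with duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.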

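The technique described here (a CTE plus a windowed count over the fields being checked) can be sketched as follows. This is not the answerer's query: the students table, its columns, the sample rows and the function name are all hypothetical, and the example assumes SQLite 3.25 or later run from Python.

```python
import sqlite3

# Hypothetical sample data: (id, name, dob)
STUDENTS = [
    (1, "Ann", "2000-01-01"),
    (2, "Ann", "2000-01-01"),
    (3, "Ann", "2001-05-05"),
    (4, "Ben", "1999-09-09"),
]


def students_with_duplicate_name_and_dob(rows):
    """Return full rows for students whose (name, dob) occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER, name TEXT, dob TEXT)")
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT id, name, dob,
                   -- the fields checked for duplication go in the OVER clause
                   COUNT(*) OVER (PARTITION BY name, dob) AS row_cnt
            FROM students
        )
        SELECT id, name, dob
        FROM cte
        WHERE row_cnt > 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(students_with_duplicate_name_and_dob(STUDENTS))
```

Unlike a plain GROUP BY, the windowed count keeps every original row, so columns such as id stay available in the projection.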

Answer 13 (score: 7)

If you want to see whether there are any duplicate rows in your table, I used the queries below:

create table my_table(id int, name varchar(100), email varchar(100));

insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');


Select COUNT(1) As Total_Rows from my_table 
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc 
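If the two counts differ, the table contains fully duplicated rows. A small sketch of that comparison, assuming SQLite via Python and a made-up function name:

```python
import sqlite3

# Same sample rows as in this answer; the first row is inserted twice.
ROWS = [
    (1, "shekh", "shekh@rms.com"),
    (1, "shekh", "shekh@rms.com"),
    (2, "Aman", "aman@rms.com"),
    (3, "Tom", "tom@rms.com"),
    (4, "Raj", "raj@rms.com"),
]


def total_and_distinct_rows(rows):
    """Return (total row count, distinct row count) for my_table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    total = conn.execute("SELECT COUNT(1) FROM my_table").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(1) FROM (SELECT DISTINCT * FROM my_table) AS abc"
    ).fetchone()[0]
    conn.close()
    return total, distinct


if __name__ == "__main__":
    total, distinct = total_and_distinct_rows(ROWS)
    print(total, distinct, "duplicates present" if total != distinct else "no duplicates")
```

This only tells you that duplicates exist (5 rows versus 4 distinct ones here), not which rows they are.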

Answer 14 (score: 6)

select name, email
, case 
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users

Answer 15 (score: 6)

 select emp.ename, emp.empno, dept.loc 
          from emp
 inner join dept 
          on dept.deptno=emp.deptno
 inner join
    (select ename, count(*) from
    emp
    group by ename, deptno
    having count(*) > 1)
 t on emp.ename=t.ename order by emp.ename
/

Answer 16 (score: 6)

By using a CTE we can also find duplicate values, like this:

with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]

)
select * from MyCTE where Duplicate>1

Answer 17 (score: 5)

This should also work; maybe give it a try.

  Select * from Users a
            where EXISTS (Select * from Users b 
                where (     a.name = b.name 
                        OR  a.email = b.email)
                     and a.ID != b.id)

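This is especially useful if you are searching for duplicates that differ by some kind of prefix or general change, e.g. a new domain in the email address. In that case you can use replace() on these columns.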

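Answer 18 (score: 5)

The most important thing here is to have the fastest query. The indices (IDs) of the duplicates should also be identified. A self join is a good option, but to get a faster query it is better to first find the rows that have duplicates and then join with the original table to find the IDs of the duplicated rows. Finally, order by any column other than ID so that duplicated rows end up next to each other.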

SELECT u.*
FROM users AS u
JOIN (SELECT username, email
      FROM users
      GROUP BY username, email
      HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;

Answer 19 (score: 4)

Example

Answer 20 (score: 4)

If you want to find duplicate data (by one or several criteria) and select the actual rows:

with MYCTE as (
    SELECT DuplicateKey1
        ,DuplicateKey2 --optional
        ,count(*) X
    FROM MyTable
    group by DuplicateKey1, DuplicateKey2
    having count(*) > 1
) 
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
    AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt

http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

Answer 21 (score: 2)

Please try:

SELECT UserID, COUNT(UserID) 
FROM dbo.User
GROUP BY UserID
HAVING COUNT(UserID) > 1

Answer 22 (score: 1)

To delete records whose names are duplicated:

;WITH CTE AS    
(

    SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM     @YourTable    
)

DELETE FROM CTE WHERE T > 1

Answer 23 (score: 1)

Another easy way is to use an analytic function:

SELECT *
FROM (SELECT name, email,
             COUNT(name) OVER (PARTITION BY name, email) cnt
      FROM users)
WHERE cnt > 1;

Answer 24 (score: 1)

SELECT name, email, COUNT(email) FROM users WHERE email IN (SELECT email FROM users GROUP BY email HAVING COUNT(email) > 1) GROUP BY name, email

Answer 25 (score: 1)

SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;

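Answer 26 (score: 0)

You can use the SELECT DISTINCT keyword to eliminate duplicates. You can also filter by name and get everyone with that name in the table.

Answer 27 (score: 0)

You may want to try this: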

SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1

Answer 28 (score: 0)

To check for duplicate records in a table:

select * from users s 
where rowid < any 
(select rowid from users k where s.name = k.name and s.email = k.email);

select * from users s 
where rowid not in 
(select max(rowid) from users k where s.name = k.name and s.email = k.email);

To delete the duplicate records from a table:

delete from users s 
where rowid < any 
(select rowid from users k where s.name = k.name and s.email = k.email);

delete from users s 
where rowid not in 
(select max(rowid) from users k where s.name = k.name and s.email = k.email);

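Answer 29 (score: 0)

The exact code will differ depending on whether you want to find fully duplicated rows as well, or only different ids that share the same email and name. If id is a primary key or has a unique constraint, this distinction does not exist, but the question does not specify this. In the former case you can use the code given in several other answers: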

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

In the latter case you would use:

SELECT name, email, COUNT(DISTINCT id)
FROM users
GROUP BY name, email
HAVING COUNT(DISTINCT id) > 1
ORDER BY COUNT(DISTINCT id) DESC
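The difference between the two queries only shows up when id is not unique. The sketch below uses made-up rows (one exact duplicate row for Tom, two different ids for Sam) in an in-memory SQLite table; the data, the function name and the ORDER BY name (added for a stable result order) are assumptions for illustration.

```python
import sqlite3

# Hypothetical rows where id is NOT unique: (id, name, email)
ROWS = [
    (1, "Tom", "t@x.com"),
    (1, "Tom", "t@x.com"),  # exact duplicate row, same id
    (2, "Sam", "s@x.com"),
    (3, "Sam", "s@x.com"),  # same name/email under a different id
    (4, "Bob", "b@x.com"),
]


def compare_duplicate_definitions(rows):
    """Return (pairs with repeated rows, pairs with more than one distinct id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    by_rows = conn.execute(
        """
        SELECT name, email, COUNT(*)
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
        ORDER BY name
        """
    ).fetchall()
    by_ids = conn.execute(
        """
        SELECT name, email, COUNT(DISTINCT id)
        FROM users
        GROUP BY name, email
        HAVING COUNT(DISTINCT id) > 1
        ORDER BY name
        """
    ).fetchall()
    conn.close()
    return by_rows, by_ids


if __name__ == "__main__":
    by_rows, by_ids = compare_duplicate_definitions(ROWS)
    print(by_rows)  # Sam and Tom
    print(by_ids)   # Sam only
```

COUNT(*) reports both Sam and Tom, while COUNT(DISTINCT id) reports only Sam, because Tom's two rows carry the same id.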

Answer 30 (score: 0)

Here we can use HAVING, which works on aggregate functions, as shown below:

create table #TableB (id_account int, data int, [date] date)
insert into #TableB values (1 ,-50, '10/20/2018'),
(1, 20, '10/09/2018'),
(2 ,-900, '10/01/2018'),
(1 ,20, '09/25/2018'),
(1 ,-100, '08/01/2018')  

SELECT id_account , data, COUNT(*)
FROM #TableB
GROUP BY id_account , data
HAVING COUNT(id_account) > 1

drop table #TableB

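Here the two fields id_account and data are used with COUNT(*). So it returns every combination of values in those two columns that occurs more than once.

Suppose that, for some reason, we forgot to add any constraints to the SQL Server table and the front-end application has inserted records that are duplicated across all columns. Then we can use the query below to remove the duplicates from the table.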

SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable
DROP TABLE #TemNewTable

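Here we copy all the distinct records of the original table into a new table and delete the records of the original table. Then we insert all the distinct values from the new table back into the original table and drop the new table.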
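The same copy, empty, refill, drop sequence can be sketched in SQLite from Python. This is an adaptation under stated assumptions: SQLite has no TRUNCATE TABLE or SELECT ... INTO, so DELETE FROM and CREATE TEMP TABLE ... AS are used instead, and the table names, sample rows and function name are hypothetical.

```python
import sqlite3

# Hypothetical rows with one record duplicated across all columns.
ROWS = [
    (1, 20, "2018-10-09"),
    (1, 20, "2018-10-09"),
    (2, -900, "2018-10-01"),
]


def dedupe_via_distinct_copy(rows):
    """Rebuild original_table from a DISTINCT copy and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE original_table (id_account INTEGER, data INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO original_table VALUES (?, ?, ?)", rows)
    conn.executescript(
        """
        -- 1. copy the distinct rows aside
        CREATE TEMP TABLE tem_new_table AS
            SELECT DISTINCT * FROM original_table;
        -- 2. empty the original (SQLite has no TRUNCATE TABLE)
        DELETE FROM original_table;
        -- 3. put the distinct rows back
        INSERT INTO original_table SELECT * FROM tem_new_table;
        -- 4. drop the helper table
        DROP TABLE tem_new_table;
        """
    )
    result = conn.execute(
        "SELECT id_account, data, date FROM original_table ORDER BY id_account, data"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(dedupe_via_distinct_copy(ROWS))
```

This approach only removes rows that are identical in every column; rows that differ in any column, such as an id, are all kept.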

Answer 31 (score: 0)

Table structure:

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

Solution 1:

SELECT *,
       COUNT(*)
FROM users t1
INNER JOIN users t2
WHERE t1.id > t2.id
  AND t1.name = t2.name
  AND t1.email=t2.email

Solution 2:

SELECT name,
         email,
       COUNT(*)
FROM users
GROUP BY name,
         email
HAVING COUNT(*) > 1

Answer 32 (score: 0)

If you use Microsoft Access, this way works:

CREATE TABLE users (id int, name varchar(10), email varchar(50));

INSERT INTO users VALUES (1, 'John', 'asd@asd.com');
INSERT INTO users VALUES (2, 'Sam', 'asd@asd.com');
INSERT INTO users VALUES (3, 'Tom', 'asd@asd.com');
INSERT INTO users VALUES (4, 'Bob', 'bob@asd.com');
INSERT INTO users VALUES (5, 'Tom', 'asd@asd.com');

SELECT name, email, COUNT(*) AS CountOf
FROM users
GROUP BY name, email
HAVING COUNT(*)>1;

DELETE *
FROM users
WHERE id IN (
    SELECT u1.id 
    FROM users u1, users u2 
    WHERE u1.name = u2.name AND u1.email = u2.email AND u1.id > u2.id
);

Thanks to Tancrede Chazallet for the delete code.

Answer 33 (score: 0)

You can use the following query, which I use:

   select *
        FROM TABLENAME
        WHERE PrimaryColumnID NOT IN
        (
            SELECT MAX(PrimaryColumnID)
            FROM  TABLENAME
            GROUP BY AnyColumnID
        );

Answer 34 (score: -2)

How to get duplicate records in a table:

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1 
 GROUP BY EmpCode HAVING COUNT(EmpCode) > 1