Find duplicate values in a table where not all columns are the same

Asked: 2019-05-20 13:37:46

Tags: sql sql-server

I am working with a set of data in a table. For simplicity, I have a table like the one below, with some sample data:

[Image: sample table data (not available)]

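Some of the data in this table comes from a different source; those rows have a non-NULL cqmRecordID.

I need to find the duplicate values in this table and delete the duplicates that came from the other source (the ones with a cqmRecordID). A record is considered a duplicate if the following values are the same:

  • [Name]
  • CAST([CreatedDate] AS date)
  • [CreatedBy]

So in my sample data above, record #5 and record #6 would be considered duplicates.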
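
The duplicate rule above can be sketched as a grouping query that lists every group with more than one row. This is only an illustration: it assumes the table is named vmsNCR (the name used in the queries further down), and the aliases CreatedDay, RowsInGroup and RowsFromOtherSource are made up for readability.

```sql
-- Sketch: list each group of rows that count as duplicates of one another.
-- Assumes the table is vmsNCR; the column aliases are illustrative only.
SELECT  Name,
        CAST(CreatedDate AS date) AS CreatedDay,
        CreatedBy,
        COUNT(*)           AS RowsInGroup,
        COUNT(cqmRecordID) AS RowsFromOtherSource  -- COUNT(column) ignores NULLs
FROM    vmsNCR
GROUP BY Name, CAST(CreatedDate AS date), CreatedBy
HAVING  COUNT(*) > 1
ORDER BY Name, CreatedDay, CreatedBy;
```

A group where RowsFromOtherSource is greater than zero contains at least one row that came from the other source.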

As a solution, I came up with the following two queries:

Query #1:

 select * from (
  select recordid, cqmrecordid, ROW_NUMBER() over (partition by name, cast(createddate as date), createdby 
                                                   order by cqmrecordid, recordid) as rownum
  from vmsNCR  ) A
  where cqmrecordid is not null   
  order by recordid


Query #2:

  select A.recordID, A.cqmRecordID, B.RecordID, B.cqmRecordID 
  from vmsNCR A 
  join vmsNCR B
    on A.Name = B.Name 
    and cast(A.CreatedDate as date) = cast(B.CreatedDate as date) 
    and A.CreatedBy = B.CreatedBy
    and A.RecordID != B.RecordID 
    and A.cqmRecordID is not null 
  order by A.RecordID


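Is there a better approach? Does one of these perform better than the other?

3 Answers:

Answer 0: (score: 1)

If you want to get all the rows with the duplicates removed, then: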

select t.*  -- or all columns except seqnum
from (select t.*,
             row_number() over (partition by name, cast(createddate as date), createdby
                                order by (case when cqmRecordId is not null then 1 else 2 end)
                               ) as seqnum
      from t
     ) t
where seqnum = 1;

If you want performance, start by adding computed columns and then an index on them. First the columns:

alter table t add cqmRecordId_flag as (case when cqmRecordId is null then 0 else 1 end) persisted;
alter table t add createddate_date as (cast(createddate as date)) persisted;

Then the index:

create index idx_t_4 on t(name, createddate_date, createdby, cqmRecordId_flag desc);
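
To check whether the index actually helps, one option is to run the query with SQL Server's I/O and timing statistics switched on, once before and once after creating the index, and compare the logical reads and CPU time reported in the Messages output. The sketch below assumes the two computed columns above already exist; whether the optimizer chooses the index depends on the data and on which columns are selected.

```sql
-- Sketch: measure the de-duplication query with I/O and timing statistics on.
-- Assumes createddate_date and cqmRecordId_flag were added as shown above.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT t.*
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY name, createddate_date, createdby
                                ORDER BY cqmRecordId_flag DESC  -- flag 1 (non-NULL) first
                               ) AS seqnum
      FROM t
     ) t
WHERE seqnum = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```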

EDIT:

If you actually just want to delete the rows with a NULL cqmRecordId from the table, you can use:

delete t from t
    where t.cqmRecordId is null and
          exists (select 1
                  from t t2
                  where t2.name = t.name and
                        -- compare on the calendar date only, ignoring the time part
                        convert(date, t2.createddate) = convert(date, t.createddate) and
                        t2.createdby = t.createdby and
                        t2.cqmRecordId is not null
                 );

You can use the same logic in a select to pick out the duplicates.

Answer 1: (score: 0)

Use the following code to eliminate the duplicates:
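
As a rough sketch of that select, the same EXISTS test can be used to preview which rows the delete would remove before running it. The table name t follows this answer's placeholder.

```sql
-- Sketch: preview the rows matched by the EXISTS logic of the DELETE above.
SELECT t.*
FROM t
WHERE t.cqmRecordId IS NULL
  AND EXISTS (SELECT 1
              FROM t t2
              WHERE t2.name = t.name
                AND CONVERT(date, t2.createddate) = CONVERT(date, t.createddate)
                AND t2.createdby = t.createdby
                AND t2.cqmRecordId IS NOT NULL);
```

To target the rows that carry a cqmRecordId instead, as the question describes, swap the IS NULL and IS NOT NULL tests.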

;WITH CTE
AS
(
   SELECT ROW_NUMBER() OVER(
              PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy] 
              ORDER BY cqmRecordId
           ) AS Rnk
   ,*
   FROM vmsNCR  -- table name taken from the question
)
DELETE FROM CTE
WHERE Rnk <> 1
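
One point worth checking with this approach is which row in each group survives. SQL Server sorts NULLs first in ascending order, so ordering by cqmRecordId ranks rows without a cqmRecordId ahead of the rest. The variant below is a sketch, not the answerer's code: it assumes the table is vmsNCR as in the question, adds RecordID as a tie-breaker, and deletes only rows that carry a cqmRecordID. The CTE name Ranked is arbitrary.

```sql
-- Sketch: delete only the duplicates that came from the other source.
-- Assumes the table is vmsNCR with the columns shown in the question.
;WITH Ranked AS
(
    SELECT cqmRecordID,
           ROW_NUMBER() OVER (
               PARTITION BY [Name], CAST([CreatedDate] AS date), [CreatedBy]
               ORDER BY cqmRecordID, RecordID   -- NULL cqmRecordID sorts first
           ) AS Rnk
    FROM vmsNCR
)
DELETE FROM Ranked
WHERE Rnk > 1
  AND cqmRecordID IS NOT NULL;
```

With this filter, a group made up only of rows without a cqmRecordID is left untouched, and a group made up only of rows with one keeps the row with the lowest cqmRecordID.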

Answer 2: (score: 0)

Try the query below; it may work for you:

;WITH TestCTE
AS
(
   SELECT *,ROW_NUMBER() OVER(
              PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy] 
              ORDER BY RecordId
            ) AS RowNumber
   FROM vmsNCR  -- table name taken from the question
)
DELETE FROM TestCTE
WHERE RowNumber > 1