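Removing duplicate rows from a temp table

Date: 2017-11-28 12:53:17

Tags: sql sql-server tsql sql-server-2012

I have a temp table that contains duplicate rows. I want to delete the duplicate rows from this table: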

DELETE FROM #Payments 
LEFT OUTER JOIN 
    (
    SELECT 
        CONVERT(uniqueidentifier, MIN(CONVERT(char(36), DocumentNo))) as RowId
       ,[ClearingDoc]
       , [PaymentType]
       , [DocDate]
    FROM #Payments 
    GROUP BY [DocumentNo], [ClearingDoc], [PaymentType], [DocDate]
) 
as KeepRows ON #Payments.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
;

But I keep getting the error "Incorrect syntax near LEFT". I have probably just been staring at it for too long, but what am I doing wrong?

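3 Answers:

Answer 0 (score: 2)

It is unclear what you actually want to do. Your query takes the minimum of a column that is also a GROUP BY key, so within each group the minimum is just the key value itself.

However, if you want to keep the most recent row for each document, you can use row_number():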

with todelete as (
      select p.*,
             row_number() over (partition by DocumentNo
                                order by DocDate desc
                               ) as seqnum
      from #payments p
     )
delete from todelete
    where seqnum > 1;
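
As for the syntax error itself: when a DELETE involves joins, T-SQL expects the delete target (a table name or alias) right after DELETE, with the joins in a separate FROM clause. Below is a minimal sketch of that form. It assumes the temp table has a RowId column of type uniqueidentifier that is unique per row, as the join in the question implies; the table #PaymentsDemo and its sample rows are hypothetical.

```sql
-- Hypothetical demo table; RowId is assumed to be a unique uniqueidentifier per row
CREATE TABLE #PaymentsDemo
(
    RowId       uniqueidentifier NOT NULL DEFAULT NEWID(),
    DocumentNo  int,
    ClearingDoc int,
    PaymentType char(1),
    DocDate     date
);

INSERT INTO #PaymentsDemo (DocumentNo, ClearingDoc, PaymentType, DocDate)
VALUES (1, 1, 'A', '2017-11-01'),
       (1, 1, 'A', '2017-11-01'),   -- exact duplicate of the first row
       (2, 1, 'B', '2017-11-02');

-- The delete target (alias p) follows DELETE; the joins go in the FROM clause
DELETE p
FROM #PaymentsDemo AS p
LEFT OUTER JOIN
    (
    -- Same char(36) round trip as in the question, applied to RowId
    SELECT CONVERT(uniqueidentifier, MIN(CONVERT(char(36), RowId))) AS RowId
    FROM #PaymentsDemo
    GROUP BY DocumentNo, ClearingDoc, PaymentType, DocDate
    ) AS KeepRows
    ON p.RowId = KeepRows.RowId
WHERE KeepRows.RowId IS NULL;

-- One row per distinct combination remains
SELECT * FROM #PaymentsDemo;
```

Which of the duplicate rows survives is arbitrary here, since the lowest RowId string has no business meaning.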

Answer 1 (score: 0)

Selecting the rows you want to keep into a new temp table is less resource-intensive. Something like this:

  SELECT distinct [DocumentNo], [ClearingDoc], [PaymentType], [DocDate]
  into #P2 from #Payments 

If the rows are not identical in every field, you can add a GROUP BY as well.
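
A minimal sketch of the GROUP BY variant, followed by copying the de-duplicated rows back. It assumes DocDate is the only column that differs between duplicates and that the earliest date should be kept; both are assumptions, and the tables #PaymentsSrc and #P2Demo are hypothetical stand-ins so the sketch runs on its own.

```sql
-- Hypothetical stand-in for #Payments
CREATE TABLE #PaymentsSrc
(
    DocumentNo  int,
    ClearingDoc int,
    PaymentType char(1),
    DocDate     date
);

INSERT INTO #PaymentsSrc (DocumentNo, ClearingDoc, PaymentType, DocDate)
VALUES (1, 1, 'A', '2017-11-01'),
       (1, 1, 'A', '2017-11-03'),   -- same key, later date
       (2, 1, 'B', '2017-11-02');

-- One row per key; MIN(DocDate) picks the earliest date (assumed rule)
SELECT DocumentNo, ClearingDoc, PaymentType, MIN(DocDate) AS DocDate
INTO #P2Demo
FROM #PaymentsSrc
GROUP BY DocumentNo, ClearingDoc, PaymentType;

-- Swap the de-duplicated rows back into the original table
TRUNCATE TABLE #PaymentsSrc;

INSERT INTO #PaymentsSrc (DocumentNo, ClearingDoc, PaymentType, DocDate)
SELECT DocumentNo, ClearingDoc, PaymentType, DocDate
FROM #P2Demo;

DROP TABLE #P2Demo;
```

If the rest of the batch can simply read from the new table, the copy-back step is unnecessary.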

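Answer 2 (score: 0)

I think the main question is: what is the natural key, and what distinguishes one record from another?

Let's run a simple T-SQL test to clear this up.

The code below creates a simple table with integer, date and character columns.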

-- Use a default db
USE [model];
GO

-- Create temp table
CREATE TABLE #Payments
(
DocumentNo int,
DocDate date,
ClearingDoc int,
PaymentType char(1)
);
GO

-- Clear table
TRUNCATE TABLE #Payments
GO

-- Add data
INSERT INTO #Payments VALUES (1, dateadd(d, -5, getdate()), 1, 'A');
INSERT INTO #Payments VALUES (2, dateadd(d, -4, getdate()), 1, 'B');
INSERT INTO #Payments VALUES (2, dateadd(d, -3, getdate()), 1, 'B');
INSERT INTO #Payments VALUES (1, dateadd(d, -2, getdate()), 1, 'A');
GO

-- Show data
SELECT * FROM #Payments
GO

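After executing the statements, the table holds four rows: two for document 1 and two for document 2, with each pair differing only in document date.

The natural key is the document number, clearing document and payment type. We want to find the records with the oldest document date.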

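I like to use an old-fashioned GROUP BY and HAVING clause together with a common table expression.

The following code returns, for each duplicated key, the oldest document date.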

-- Find the oldest records
SELECT DocumentNo, ClearingDoc, PaymentType, MIN(DocDate) AS OldestDate
FROM #Payments
GROUP BY DocumentNo, ClearingDoc, PaymentType
HAVING COUNT(*) > 1

Last but not least, wrap this query in a CTE and join it to a DELETE statement.

-- Remove duplicate data by oldest date
;
WITH CTE_DELETE_LIST AS
(
SELECT 
    DocumentNo, 
    ClearingDoc, 
    PaymentType, 
    MIN(DocDate) AS OldestDate
FROM 
    #Payments
GROUP BY 
    DocumentNo, ClearingDoc, PaymentType
HAVING 
    COUNT(*) > 1
)
DELETE
FROM #Payments
FROM #Payments AS P
JOIN 
    CTE_DELETE_LIST AS C
ON P.DocumentNo = C.DocumentNo and 
   P.ClearingDoc = C.ClearingDoc and
   P.PaymentType = C.PaymentType and 
   P.DocDate = C.OldestDate

After the delete, the two oldest rows are gone and only the newer row for each document remains.

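
One caveat: the MIN(DocDate) join removes only the rows carrying the oldest date, so a key with three or more rows can still have duplicates after a single pass. Below is a sketch of a variant that keeps exactly one row per natural key in one statement, using ROW_NUMBER(). It assumes the newest row is the one to keep; the table #PaymentsMany and its rows are hypothetical.

```sql
-- Hypothetical table with three rows for the same natural key
CREATE TABLE #PaymentsMany
(
    DocumentNo  int,
    DocDate     date,
    ClearingDoc int,
    PaymentType char(1)
);

INSERT INTO #PaymentsMany (DocumentNo, DocDate, ClearingDoc, PaymentType)
VALUES (1, '2017-11-01', 1, 'A'),
       (1, '2017-11-02', 1, 'A'),
       (1, '2017-11-03', 1, 'A'),
       (2, '2017-11-02', 1, 'B');

-- Number the rows within each natural key, newest first
WITH Ranked AS
(
    SELECT DocumentNo, ClearingDoc, PaymentType, DocDate,
           ROW_NUMBER() OVER (PARTITION BY DocumentNo, ClearingDoc, PaymentType
                              ORDER BY DocDate DESC) AS seqnum
    FROM #PaymentsMany
)
-- Delete everything except the newest row of each key
DELETE FROM Ranked
WHERE seqnum > 1;

-- Check for leftover duplicates; this should return no rows
SELECT DocumentNo, ClearingDoc, PaymentType, COUNT(*) AS RowsPerKey
FROM #PaymentsMany
GROUP BY DocumentNo, ClearingDoc, PaymentType
HAVING COUNT(*) > 1;
```

Changing the ORDER BY to ascending would keep the oldest row instead.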