A faster way to find duplicates in SQL Server

Asked: 2013-05-02 17:52:05

Tags: sql-server performance tsql duplicates i2b2

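I am trying to find a better way to find duplicates in SQL Server. Against a table of just over 300 million records, my current query ran for more than 20 minutes before results started appearing in the SSMS results window. It ran for another 22 minutes before it crashed.

SSMS then threw this error after displaying 16,777,216 records: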

An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.

Schema:

ENCOUNTER_NUM - numeric(22,0)
CONCEPT_CD - varchar(50)
PROVIDER_ID - varchar(50)
START_DATE - datetime
MODIFIER_CD - varchar(100)
INSTANCE_NUM - numeric(18,0)


SELECT
    ROW_NUMBER() OVER (ORDER BY f1.[ENCOUNTER_NUM],f1.[CONCEPT_CD],f1.[PROVIDER_ID],f1.[START_DATE],f1.[MODIFIER_CD],f1.[INSTANCE_NUM]),
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM]
FROM
    [dbo].[I2B2_OBSERVATION_FACT] f1
    INNER JOIN [dbo].[I2B2_OBSERVATION_FACT] f2 ON
        f1.[ENCOUNTER_NUM] = f2.[ENCOUNTER_NUM] 
        AND f1.[CONCEPT_CD] = f2.[CONCEPT_CD]
        AND f1.[PROVIDER_ID] = f2.[PROVIDER_ID]
        AND f1.[START_DATE] = f2.[START_DATE]
        AND f1.[MODIFIER_CD] = f2.[MODIFIER_CD]
        AND f1.[INSTANCE_NUM] = f2.[INSTANCE_NUM]

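1 Answer:

Answer 0 (score: 8)

I'm not sure how much faster this is, but it's worth a try. It groups on the six columns and keeps only the groups that occur more than once, so it avoids the self-join.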

SELECT
    COUNT(*) AS Dupes,
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM]
FROM
    [dbo].[I2B2_OBSERVATION_FACT] f1
GROUP BY
    f1.[ENCOUNTER_NUM], 
    f1.[CONCEPT_CD], 
    f1.[PROVIDER_ID], 
    f1.[START_DATE], 
    f1.[MODIFIER_CD], 
    f1.[INSTANCE_NUM]
HAVING
    COUNT(*) > 1
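
If the result set is still too large for the SSMS grid, one option is to write the duplicate keys to a table and look at only a sample. The following is a minimal sketch, not a tested solution for this table: #DupeKeys is a hypothetical temp table name, and TOP (100) is an arbitrary sample size.

```sql
-- Sketch only: #DupeKeys is a hypothetical temp table name.
-- SELECT ... INTO stores the duplicate keys on the server
-- instead of streaming every row to the SSMS results grid.
SELECT
    f1.[ENCOUNTER_NUM],
    f1.[CONCEPT_CD],
    f1.[PROVIDER_ID],
    f1.[START_DATE],
    f1.[MODIFIER_CD],
    f1.[INSTANCE_NUM],
    COUNT(*) AS Dupes
INTO #DupeKeys
FROM
    [dbo].[I2B2_OBSERVATION_FACT] f1
GROUP BY
    f1.[ENCOUNTER_NUM],
    f1.[CONCEPT_CD],
    f1.[PROVIDER_ID],
    f1.[START_DATE],
    f1.[MODIFIER_CD],
    f1.[INSTANCE_NUM]
HAVING
    COUNT(*) > 1;

-- Look at a small sample (arbitrary size) rather than the full set.
SELECT TOP (100) *
FROM #DupeKeys
ORDER BY Dupes DESC;
```

Note that GROUP BY puts NULL values in the same group, while the = comparisons in a join do not match NULL to NULL, so the two approaches can disagree on rows where one of these columns is NULL.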