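SQL table with a list of repeating values (duplicates) to find

Time: 2013-11-21 22:52:34

Tags: sql sql-server hash aggregate checksum

I am trying to identify a list of duplicates in a table. My table looks like this: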

Column1-Column2

1-1
1-2
1-3
2-1
2-2
2-3
3-1
3-2
3-4
4-1
4-2
4-3
4-4
5-1
5-2
5-4

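• 1 has the set {1,2,3}
• 2 has the set {1,2,3}
• so 1 and 2 are duplicates
• 3 has the set {1,2,4}
• 5 has the set {1,2,4}
• so 3 and 5 are duplicates
• 4 has the set {1,2,3,4}
• and has no friends ;)

Column 2 is really a varchar column, but I used numbers throughout for simplicity.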

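I have been playing with CHECKSUM_AGG, but it produces false positives. :(

My output would look something like this:

• 1,2
• 3,5

Here I select the minimum ID of each group for the first column and every other ID in that group for the second column. IDs that have no duplicate are omitted.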

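Another example might look like this:

• 1,2
• 1,6
• 3,5
• 3,7
• 3,8
• (Note that there is no "4" in the list; I just added other "pairs" to show that 1 and 3 are the lowest IDs. If 4 appeared in the list as 4,0 or 4,null, I could make that work too.)

I am using SQL Server 2012. Thanks!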
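For anyone who wants to reproduce this, the sample data above could be loaded with a sketch like the following. The temp table name #tbl and the column names column1 and column2 are borrowed from the answers; the data types are assumptions, since the question only says that column 2 is a varchar.

```sql
-- Hypothetical setup: names follow the answers, data types are assumed.
CREATE TABLE #tbl (
  column1 INT         NOT NULL,
  column2 VARCHAR(10) NOT NULL
);

-- The 16 sample rows from the question.
INSERT INTO #tbl (column1, column2) VALUES
  (1, '1'), (1, '2'), (1, '3'),
  (2, '1'), (2, '2'), (2, '3'),
  (3, '1'), (3, '2'), (3, '4'),
  (4, '1'), (4, '2'), (4, '3'), (4, '4'),
  (5, '1'), (5, '2'), (5, '4');
```

Answers that query MyTable or my_table would need the table name swapped for #tbl to run against this data.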

3 Answers:

Answer 0: (score: 0)

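-- Count the rows per column1 value; only groups of equal size can match.
WITH t AS (
  SELECT
    column1,
    COUNT(*) AS c
  FROM MyTable
  GROUP BY column1
)
SELECT
  t1.column1,
  t2.column1
FROM t t1
INNER JOIN t t2 ON (
  t1.c = t2.c AND
  t2.column1 > t1.column1  -- report each pair once, lower column1 first
)
-- Keep the pair only if no column2 value of t1 is missing from t2.
WHERE NOT EXISTS (
  SELECT column2 FROM MyTable WHERE column1 = t1.column1
  EXCEPT
  SELECT column2 FROM MyTable WHERE column1 = t2.column1
)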
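
To see why the WHERE clause works: A EXCEPT B returns the rows of A that are missing from B, so NOT EXISTS over it is true only when every column2 value of t1 also occurs for t2. Together with the equal row counts, that makes the two sets identical, assuming no (column1, column2) row is repeated in the table. A minimal sketch of the check in isolation, with inline VALUES lists (illustrative literals, not part of the answer) standing in for the sets of 1 and 3:

```sql
-- Set of column1 = 1 minus set of column1 = 3 from the question's data.
SELECT v FROM (VALUES ('1'), ('2'), ('3')) AS a(v)
EXCEPT
SELECT v FROM (VALUES ('1'), ('2'), ('4')) AS b(v);
-- Returns a single row, '3', so NOT EXISTS is false and (1, 3) is rejected.
```

For 1 and 2 both sides are {1,2,3}, the EXCEPT result is empty, and the pair is kept.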

Answer 1: (score: 0)

select column1,column2 from my_table
group by column1,column2
having COUNT(*) > 1

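This will give you the list of duplicate records, that is, the (column1, column2) pairs that appear more than once.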
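
As a hedged illustration of what this query detects: it flags (column1, column2) rows that occur more than once, rather than column1 values that share the same set of column2 values. A small sketch with an inline VALUES list (made-up data, aliased as my_table to match the query above):

```sql
-- Made-up rows: the pair (1, '1') appears twice, (1, '2') once.
SELECT column1, column2
FROM (VALUES (1, '1'), (1, '1'), (1, '2')) AS my_table(column1, column2)
GROUP BY column1, column2
HAVING COUNT(*) > 1;
-- Returns one row: column1 = 1, column2 = '1'.
```

In the sample data from the question no row is repeated, so the same query would return no rows there.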

Answer 2: (score: 0)

--This code produced the results I was looking for in the original post.  

WITH t AS (
  SELECT
    column1,
    COUNT(*) c
  FROM #tbl
  GROUP BY column1
),
tt AS(
SELECT
  t1.column1 as 'winner',
  t2.column1 as 'loser'
FROM t t1
INNER JOIN t t2 ON (
  t1.c = t2.c AND
  t1.column1 < t2.column1
)
WHERE NOT EXISTS (
  SELECT column2 FROM #tbl WHERE column1 = t1.column1
  EXCEPT
  SELECT column2 FROM #tbl WHERE column1 = t2.column1
)
)
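-- Keep only pairs whose winner never appears as a loser, so each group
-- of duplicates is listed under its lowest column1 value.
SELECT fullList.winner, fullList.loser
FROM
(  SELECT winner FROM tt tt1
   EXCEPT
   SELECT loser FROM tt tt2
) winnerList
JOIN tt fullList ON winnerList.winner = fullList.winner
ORDER BY fullList.winner, fullList.loser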