Question

I have a table with four fields: an auto-increment ID, a string, and two integers. I want to do something like:

     select count(*) from table group by string

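and then use the result to merge all the groups whose count is greater than 1.

That is, take all rows that have a count greater than 1 and replace all of those rows in the database (the ones with the same string) with a single row. The ID doesn't matter, and the two integers are the sums over all the rows being merged.

Is this possible with a few simple queries?

Thanks.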

Answer 1

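I would suggest inserting into a temp table the data grouped by string, along with min(id), for the strings that have duplicates. Then update the original table with the sums where id = min(id), and delete the rows where the string matches but the id does not.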

 -- temp holds one row per duplicated string: the id to keep and the summed integers
 insert into temp
 select string, min(id) id, sum(int1) int1, sum(int2) int2
   from table
  group by string
 having count(*) > 1;

 -- copy the sums onto the row being kept
 update table, temp
    set table.int1 = temp.int1,
        table.int2 = temp.int2
  where table.id = temp.id;

 -- Works because there is only one record given a string in temp
 delete from table
  where exists (select null from temp where temp.string = table.string and temp.id <> table.id);

A backup is mandatory :-) and so is a transaction.

Answer 2

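There is a simple way to do this. Just put something like

id NOT IN (select min(id) from table group by string)

in your where clause and it will select only the duplicates, that is, every row except the lowest-id row for each string.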
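As a rough illustration of that filter, here is a minimal sketch using Python's built-in sqlite3 module. The table name `items`, its columns (`id`, `string`, `int1`, `int2`), and the helper name `find_redundant_ids` are assumptions made up for the example, not something from the question.

```python
import sqlite3

def find_redundant_ids(conn):
    """Return ids of rows whose string is already held by a lower-id row.

    Assumes a hypothetical table items(id, string, int1, int2).
    """
    rows = conn.execute(
        """
        SELECT id FROM items
        WHERE id NOT IN (SELECT MIN(id) FROM items GROUP BY string)
        ORDER BY id
        """
    )
    return [row[0] for row in rows]
```

The same NOT IN condition could be used in a DELETE, but only after the sums have been saved somewhere, since this filter by itself does nothing about the two integer columns.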

Answer 3

First select the strings with count > 1, together with the sums you want:

select * from (
    select count(*) as row_count, string_col,
           sum(int_col_1) as int_sum_1, sum(int_col_2) as int_sum_2
    from my_table
    group by string_col
) as foo where row_count > 1

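After that, I'd put that data into a temp table, delete the rows you don't want, and insert the data from the temp table back into the original table.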
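Those three steps might look roughly like the following in SQLite, driven from Python's sqlite3 module. This is only a sketch: the table `items`, its columns, the temp table `merged`, and the function name `merge_via_reinsert` are all hypothetical names chosen for the example.

```python
import sqlite3

def merge_via_reinsert(conn):
    """Replace each group of duplicate strings with one freshly inserted row.

    Assumes a hypothetical table items(id, string, int1, int2) whose id
    is generated automatically on insert.
    """
    with conn:  # commit if every step succeeds, roll back on an exception
        # Step 1: stash the sums for every string that occurs more than once.
        conn.execute(
            """
            CREATE TEMP TABLE merged AS
            SELECT string, SUM(int1) AS int1, SUM(int2) AS int2
            FROM items
            GROUP BY string
            HAVING COUNT(*) > 1
            """
        )
        # Step 2: remove all original rows for those strings.
        conn.execute(
            "DELETE FROM items WHERE string IN (SELECT string FROM merged)"
        )
        # Step 3: put one combined row back per string.
        conn.execute(
            """
            INSERT INTO items (string, int1, int2)
            SELECT string, int1, int2 FROM merged
            """
        )
        conn.execute("DROP TABLE merged")
```

Note that the combined rows get new ids here, which is fine for the question as asked because the ID doesn't matter.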

Answer 4

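You can do it all in two queries, without a temp table. But you need to run the DELETE query repeatedly, since it only deletes one duplicate per string at a time. So if there are 3 copies of a row, you need to run it twice. You can simply keep running it until there are no more results.

First, update the duplicate row you are going to keep so that it contains the sums.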

UPDATE tablename JOIN (
   SELECT min(id) id, sum(int1) int1, sum(int2) int2, count(*) c
   FROM tablename GROUP BY string HAVING c>1
) AS dups ON tablename.id=dups.id
SET tablename.int1=dups.int1, tablename.int2=dups.int2

Then you can use the same kind of SELECT query inside a DELETE query, using the multi-table syntax.

DELETE tablename FROM tablename 
JOIN (SELECT max(id) AS id,count(*) c FROM tablename GROUP BY string HAVING c>1) dups
ON tablename.id=dups.id

Just keep running the DELETE until it reports 0 affected rows.
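To make the "run it until nothing is deleted" loop concrete, here is a minimal sketch using Python's sqlite3 module. SQLite has no multi-table UPDATE or DELETE syntax, so the sketch swaps in correlated subqueries and an IN subquery instead of the JOINs above. The table name `items`, its columns, and the function name `merge_in_place` are assumptions for the example.

```python
import sqlite3

def merge_in_place(conn):
    """Sum duplicates into the lowest-id row, then delete the rest in passes.

    Assumes a hypothetical table items(id, string, int1, int2).
    Returns the number of DELETE passes that removed at least one row.
    """
    with conn:  # commit if every step succeeds, roll back on an exception
        # Write the group totals onto the row that will be kept (lowest id).
        conn.execute(
            """
            UPDATE items
            SET int1 = (SELECT SUM(d.int1) FROM items d WHERE d.string = items.string),
                int2 = (SELECT SUM(d.int2) FROM items d WHERE d.string = items.string)
            WHERE id IN (
                SELECT MIN(id) FROM items GROUP BY string HAVING COUNT(*) > 1
            )
            """
        )
        passes = 0
        while True:
            # Each pass removes only the highest-id row of every duplicated string.
            cur = conn.execute(
                """
                DELETE FROM items
                WHERE id IN (
                    SELECT MAX(id) FROM items GROUP BY string HAVING COUNT(*) > 1
                )
                """
            )
            if cur.rowcount == 0:  # nothing left to delete
                break
            passes += 1
    return passes
```

With three copies of one string, the loop should do two deleting passes before the third pass reports zero rows, which matches the description above.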

Answer 5

If you can prevent other users from updating the table while you do this, it's fairly easy.

-- We're going to add records before deleting old ones, so keep track of which records are old.
DECLARE @OldMaxID INT
SELECT @OldMaxID = MAX(ID) FROM table

-- Combine duplicate records into new records
INSERT table (string, int1, int2)
SELECT string, SUM(int1), SUM(int2)
FROM table
GROUP BY string
HAVING COUNT(*) > 1

-- Delete records that were used to make combined records.
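DELETE FROM table
WHERE ID <= @OldMaxID
  AND string IN (
      -- DELETE cannot take GROUP BY directly, so find the duplicated strings in a subquery
      SELECT string
      FROM table
      WHERE ID <= @OldMaxID
      GROUP BY string
      HAVING COUNT(*) > 1)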

Answer 6

You can get this information in a VIEW:

 CREATE VIEW SummarizedData (StringCol, IntCol1, IntCol2, OriginalRowCount) AS
    SELECT StringCol, SUM(IntCol1), SUM(IntCol2), COUNT(*)
    FROM TableName
    GROUP BY StringCol

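This creates a virtual table containing the information you want. It will include rows for strings that have only one instance of the StringCol value; if you really don't want those, add HAVING COUNT(*) > 1 to the end of the query.

With this approach you can keep the original table and just read the summarized data from the view, or you can create an empty table with the appropriate columns and INSERT the contents of SummarizedData into it to get a "real" table with the data.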
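The second option, copying the view into a real table, could be sketched like this with Python's sqlite3 module. The source table and column names follow the view definition above; the target table `MergedData` and the function name `build_summary_table` are hypothetical.

```python
import sqlite3

def build_summary_table(conn):
    """Create the SummarizedData view and copy it into a new real table.

    Assumes TableName(StringCol, IntCol1, IntCol2) already exists;
    MergedData is a made-up name for the new table.
    """
    with conn:  # commit if every step succeeds, roll back on an exception
        conn.execute(
            """
            CREATE VIEW SummarizedData AS
            SELECT StringCol,
                   SUM(IntCol1) AS IntCol1,
                   SUM(IntCol2) AS IntCol2,
                   COUNT(*) AS OriginalRowCount
            FROM TableName
            GROUP BY StringCol
            """
        )
        conn.execute(
            """
            CREATE TABLE MergedData (
                ID INTEGER PRIMARY KEY AUTOINCREMENT,
                StringCol TEXT,
                IntCol1 INTEGER,
                IntCol2 INTEGER
            )
            """
        )
        # The original table is left untouched; only the summary is copied.
        conn.execute(
            """
            INSERT INTO MergedData (StringCol, IntCol1, IntCol2)
            SELECT StringCol, IntCol1, IntCol2 FROM SummarizedData
            """
        )
```

Because the original table is never modified, this variant is easy to check before switching anything over to the new table.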

How do I remove duplicates from a database?
