Delete all but one of the duplicate rows whose datetime column values are within seconds of each other?

Time: 2010-09-17 19:02:14

Tags: sql-server-2005 datetime duplicate-removal

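Because of a bug in the system, the tracking log fired repeatedly, producing hundreds of log entries where there should have been one. The bug has since been fixed, but the data is still there and is needed for reporting (I can't just delete all of it). I only want one instance of each entry, though. I think this is a tricky one. Here are the relevant fields in the table: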

int UserID, int ActorID, nvarchar(50) ActorType, int BoxID, datetime CreateDate, nvarchar(50) Query

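Now, for every set of rows where all of these fields are identical and the CreateDate values are within 30 seconds of each other, I want to delete all but one row.

So all the data in the listed fields will match exactly, and CreateDate will range like this: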

2010-08-17 14:50:11.620
2010-08-17 14:50:11.823
2010-08-17 14:50:12.057
2010-08-17 14:50:12.277
2010-08-17 14:50:12.527
2010-08-17 14:50:12.730
2010-08-17 14:50:12.980
2010-08-17 14:50:13.340
2010-08-17 14:50:13.450
2010-08-17 14:50:13.667
2010-08-17 14:50:13.887
2010-08-17 14:50:14.120
2010-08-17 14:50:14.323
2010-08-17 14:50:14.730
2010-08-17 14:50:14.807
2010-08-17 14:50:15.010
2010-08-17 14:50:15.357
...
2010-08-17 14:51:09.810
2010-08-17 14:51:10.047
2010-08-17 14:51:10.250
2010-08-17 14:51:10.500
2010-08-17 14:51:10.890
2010-08-17 14:51:10.953
2010-08-17 14:51:11.263
2010-08-17 14:51:11.437
2010-08-17 14:51:11.920
2010-08-17 14:51:12.170
2010-08-17 14:51:12.217
2010-08-17 14:51:12.420
2010-08-17 14:51:12.670
2010-08-17 14:51:12.873
2010-08-17 14:51:13.123
2010-08-17 14:51:13.373
2010-08-17 14:51:13.577
2010-08-17 14:51:13.797
2010-08-17 14:51:14.030
2010-08-17 14:51:14.280
2010-08-17 15:29:19.180
2010-08-17 15:32:32.497
2010-08-17 15:32:32.733
2010-08-17 15:32:32.967
2010-08-17 15:32:33.263
2010-08-17 15:32:33.513
2010-08-17 15:32:33.623
2010-08-17 15:32:33.857
2010-08-17 15:32:34.140
2010-08-17 15:32:34.327
2010-08-17 15:32:34.560
2010-08-17 15:32:34.780
2010-08-17 15:32:35.043
2010-08-17 15:32:35.247
2010-08-17 15:32:35.483
2010-08-17 15:32:35.717

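But I only want to keep one of them. I hope this is enough information.

2 Answers:

Answer 0: (score: 1)

Here is how to get one row from each group of records, grouped into 30-second ranges. You can use this query to preview which rows will be kept in the table.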

WITH cte AS
    ( SELECT UserID, ActorID, ActorType, BoxID, Query, CreateDate,
        DATEDIFF(ss, '1/1/2000', CreateDate) / 30 AS CreateDateGroup,
        ROW_NUMBER() OVER (PARTITION BY UserID, ActorID, ActorType, BoxID, Query,
                                     DATEDIFF(ss, '1/1/2000', CreateDate) / 30
                           ORDER BY CreateDate ASC) AS sequence
    FROM TrackingLog
    )

SELECT UserID, ActorID, ActorType, BoxID, Query, CreateDate, CreateDateGroup, sequence
FROM cte
WHERE sequence = 1

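Two columns are generated in the common table expression (CTE). The CreateDateGroup column is computed by converting the CreateDate value to the number of seconds since '1/1/2000' and dividing that by 30 (the window size in seconds). Both operands are integers, so the division is integer division and the fractional part is truncated.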
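To make the bucket arithmetic concrete, here is a minimal Python sketch of the same calculation. It is only an illustration, not part of the T-SQL solution: the helper names (`create_date_group`, `parse`) are made up for this example, and the two timestamps around 14:50:30 are hypothetical values that do not come from the question's data.

```python
from datetime import datetime

EPOCH = datetime(2000, 1, 1)   # same reference date as '1/1/2000' in the query
WINDOW_SECONDS = 30            # same divisor as in the query

def create_date_group(create_date):
    """Mimic DATEDIFF(ss, '1/1/2000', CreateDate) / 30 using integer division."""
    # DATEDIFF(ss, ...) counts whole-second boundaries, so milliseconds are ignored.
    whole_seconds = int((create_date.replace(microsecond=0) - EPOCH).total_seconds())
    return whole_seconds // WINDOW_SECONDS

def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")

# Timestamps taken from the question.
a = create_date_group(parse("2010-08-17 14:50:11.620"))
b = create_date_group(parse("2010-08-17 14:50:15.357"))
c = create_date_group(parse("2010-08-17 14:51:09.810"))

# Hypothetical timestamps 0.2 seconds apart, on either side of a bucket boundary.
before_boundary = create_date_group(parse("2010-08-17 14:50:29.900"))
after_boundary = create_date_group(parse("2010-08-17 14:50:30.100"))

print(a == b)                             # same 30-second bucket
print(c - a)                              # number of buckets between the two rows
print(before_boundary == after_boundary)  # False: the boundary splits them
```

The last comparison shows a property of this approach worth keeping in mind: the buckets are fixed 30-second slots starting at :00 and :30 of each minute, not sliding windows, so two rows that are very close in time can still land in different groups, and a burst lasting longer than 30 seconds will leave one row per slot it touches.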

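The sequence column is the row number within each group, ordered by CreateDate ascending. The earliest date in each group therefore gets sequence 1.

The main query includes WHERE sequence = 1, which says that you want to see the first row of each group.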
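As a rough illustration of what the partitioning and row numbering do, the following Python sketch emulates them on a handful of rows. The values for the non-date columns are hypothetical placeholders; the timestamps are taken from the question. The function names are made up for this example.

```python
from collections import defaultdict
from datetime import datetime

EPOCH = datetime(2000, 1, 1)

def bucket(create_date):
    # Whole seconds since the reference date, integer-divided into 30-second buckets.
    return int((create_date.replace(microsecond=0) - EPOCH).total_seconds()) // 30

def number_rows(rows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY <fields>, <bucket> ORDER BY CreateDate)."""
    partitions = defaultdict(list)
    for row in rows:
        *fields, create_date = row
        partitions[(*fields, bucket(create_date))].append(row)
    numbered = []
    for members in partitions.values():
        members.sort(key=lambda r: r[-1])   # ORDER BY CreateDate ASC
        numbered.extend((row, seq) for seq, row in enumerate(members, start=1))
    return numbered

def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")

# Hypothetical values for UserID, ActorID, ActorType, BoxID, Query.
same_fields = (1, 2, "User", 3, "some query")
rows = [same_fields + (parse(t),) for t in (
    "2010-08-17 14:50:11.823",
    "2010-08-17 14:50:11.620",
    "2010-08-17 14:50:12.057",
    "2010-08-17 15:29:19.180",
    "2010-08-17 15:32:32.733",
    "2010-08-17 15:32:32.497",
)]

numbered = number_rows(rows)
kept = sorted(row[-1] for row, seq in numbered if seq == 1)   # WHERE sequence = 1
others = [row for row, seq in numbered if seq > 1]            # every other row

for create_date in kept:
    print(create_date)
```

With these six rows, the earliest row of each of the three buckets is selected, regardless of the order in which the rows were stored.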

When you are ready to delete the unwanted rows, change the main query as follows:

WITH cte AS
    ( SELECT UserID, ActorID, ActorType, BoxID, Query, CreateDate,
        DATEDIFF(ss, '1/1/2000', CreateDate) / 30 AS CreateDateGroup,
        ROW_NUMBER() OVER (PARTITION BY UserID, ActorID, ActorType, BoxID, Query,
                                     DATEDIFF(ss, '1/1/2000', CreateDate) / 30
                           ORDER BY CreateDate ASC) AS sequence
    FROM TrackingLog
    )

DELETE
FROM cte
WHERE sequence > 1
;

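This command deletes every row in the table that is not the first row of its group.

Answer 1: (score: 0)

Group by all the fields except the timestamp and take the max(timestamp_field) value?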
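For comparison, here is a small Python sketch of what that suggestion appears to amount to, as far as the one-line description goes. The non-date column values are hypothetical placeholders, the timestamps come from the question, and `latest_per_group` is a name invented for this example.

```python
from datetime import datetime

def latest_per_group(rows):
    """Emulate GROUP BY <all fields except the timestamp> with MAX(timestamp)."""
    latest = {}
    for *fields, create_date in rows:
        key = tuple(fields)
        if key not in latest or create_date > latest[key]:
            latest[key] = create_date
    return latest

def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")

# Hypothetical values for UserID, ActorID, ActorType, BoxID, Query.
same_fields = (1, 2, "User", 3, "some query")
rows = [same_fields + (parse(t),) for t in (
    "2010-08-17 14:50:11.620",
    "2010-08-17 15:29:19.180",
    "2010-08-17 15:32:32.733",
)]

latest = latest_per_group(rows)
print(len(latest))            # number of groups
print(latest[same_fields])    # the single timestamp that survives
```

Grouping this way collapses every row with identical fields into one, even when the rows are many minutes apart, so on its own it does not honor the 30-second condition from the question. It would need a time component in the grouping key to do that.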