Condense time periods with SQL

Date: 2016-02-19 05:56:12

Tags: sql sql-server sql-server-2012

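I have a large data set. For the purposes of this question, it has 3 fields:

  • Group Identifier
  • From Date
  • To Date

On any given row, From Date is always less than To Date. Within each group, however, the time periods represented by the date pairs (in no particular order) can overlap, one can be contained within another, or they can even be identical.

What I ultimately want is a query that condenses each group's results into continuous time periods. For example, a group that looks like this: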

| Group ID | From Date  | To Date    |
--------------------------------------
| A        | 01/01/2012 | 12/31/2012 |
| A        | 12/01/2013 | 11/30/2014 |
| A        | 01/01/2015 | 12/31/2015 |
| A        | 01/01/2015 | 12/31/2015 |
| A        | 02/01/2015 | 03/31/2015 |
| A        | 01/01/2013 | 12/31/2013 |

would result in:

| Group ID | From Date  | To Date    |
--------------------------------------
| A        | 01/01/2012 | 11/30/2014 |
| A        | 01/01/2015 | 12/31/2015 |

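I've read a lot of articles on date packing, but I can't figure out how to apply them to my data set.

How can I build a query that gives me these results?

4 answers:

Answer 0 (score: 3)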

A solution from the book "Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions":

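;with C1 as(
-- One event per boundary: +1 at each start, -1 on the day after each end.
-- sub = 1 on starts, so cnt below is the number of periods open before that start.
select GroupID, FromDate as ts, +1 as type, 1 as sub
  from dbo.table_name
union all
select GroupID, dateadd(day, +1, ToDate) as ts, -1 as type, 0 as sub
  from dbo.table_name),
C2 as(
-- Running count of open periods. "type desc" puts starts before ends on the same day,
-- so a period starting the day after another one ends is not split off.
select C1.*
     , sum(type) over(partition by GroupID order by ts, type desc
                      rows between unbounded preceding and current row) - sub as cnt
  from C1),
C3 as(
-- cnt = 0 keeps only island starts and island ends; each consecutive pair shares a grpnum.
select GroupID, ts, floor((row_number() over(partition by GroupID order by ts) - 1) / 2 + 1) as grpnum
  from C2
  where cnt = 0)

-- Subtract the day that was added to the end events.
select GroupID, min(ts) as FromDate, dateadd(day, -1, max(ts)) as ToDate
  from C3
  group by GroupID, grpnum;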

Creating the table:

if object_id('table_name') is not null
  drop table table_name
create table table_name(GroupID varchar(100), FromDate datetime, ToDate datetime)
-- 'YYYYMMDD' literals are interpreted the same way under any language/DATEFORMAT setting
insert into table_name
select 'A', '20120101', '20121231' union all
select 'A', '20131201', '20141130' union all
select 'A', '20150101', '20151231' union all
select 'A', '20150101', '20151231' union all
select 'A', '20150201', '20150331' union all
select 'A', '20130101', '20131231'
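The C1 → C2 → C3 pipeline can be traced step by step outside SQL Server. The following is a rough Python model for illustration only (the answer itself is pure T-SQL). The function name `pack_with_counter` and the `(group_id, from_date, to_date)` tuple layout are assumptions, and Python `date` stands in for `datetime`.

```python
from datetime import date, timedelta
from itertools import groupby


def pack_with_counter(rows):
    """Model of the start/end counter technique (C1 -> C2 -> C3 -> final SELECT).

    rows: iterable of (group_id, from_date, to_date) with inclusive dates.
    """
    packed = []
    for gid, grp in groupby(sorted(rows), key=lambda r: r[0]):
        # C1: a +1 event at each start, a -1 event on the day after each end.
        events = []
        for _, start, end in grp:
            events.append((start, +1))
            events.append((end + timedelta(days=1), -1))
        # C2 ordering "ts, type desc": starts come before ends on the same day.
        events.sort(key=lambda e: (e[0], -e[1]))
        active = 0
        boundaries = []  # the events C3 keeps (cnt = 0)
        for ts, kind in events:
            if kind == +1:
                if active == 0:  # a start with no period open before it
                    boundaries.append(ts)
                active += 1
            else:
                active -= 1
                if active == 0:  # an end that leaves no period open
                    boundaries.append(ts)
        # C3 + final SELECT: boundaries pair up as (island start, island end + 1 day).
        for i in range(0, len(boundaries), 2):
            packed.append((gid, boundaries[i], boundaries[i + 1] - timedelta(days=1)))
    return packed


sample = [
    ("A", date(2012, 1, 1), date(2012, 12, 31)),
    ("A", date(2013, 12, 1), date(2014, 11, 30)),
    ("A", date(2015, 1, 1), date(2015, 12, 31)),
    ("A", date(2015, 1, 1), date(2015, 12, 31)),
    ("A", date(2015, 2, 1), date(2015, 3, 31)),
    ("A", date(2013, 1, 1), date(2013, 12, 31)),
]

for row in pack_with_counter(sample):
    print(row)
```

If the sort key is changed to put ends first on a tie (`(e[0], e[1])`), the count drops to 0 on 2013-01-01 and 2012 becomes a separate island. The `type desc` ordering in C2 exists to prevent exactly that.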

Answer 1 (score: 2)

; with 
cte as
(
    select  *, rn = row_number() over (partition by [Group ID] order by [From Date])
    from    tbl
),
rcte as
(
    select  rn, [Group ID], [From Date], [To Date], GrpNo = 1, GrpFrom = [From Date], GrpTo = [To Date]
    from    cte
    where   rn  = 1

    union all

    select  c.rn, c.[Group ID], c.[From Date], c.[To Date], 
        GrpNo = case    when    c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                or  c.[To Date]   between r.GrpFrom and r.GrpTo
                then    r.GrpNo
                else    r.GrpNo + 1
                end,
        GrpFrom= case   when    c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                or  c.[To Date]   between r.GrpFrom and r.GrpTo
                then    case when c.[From Date] > r.GrpFrom then c.[From Date] else r.GrpFrom end
                else    c.[From Date] 
                end,
        GrpTo  = case   when    c.[From Date] between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                or  c.[To Date]   between r.GrpFrom and dateadd(day, 1, r.GrpTo)
                then    case when c.[To Date] > r.GrpTo then c.[To Date] else r.GrpTo end
                else    c.[To Date]  
                end

    from    rcte r
        inner join cte c    on  r.[Group ID]    = c.[Group ID]
                    and r.rn        = c.rn - 1
)
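select  [Group ID], min(GrpFrom) as [From Date], max(GrpTo) as [To Date]
from    rcte
group by [Group ID], GrpNo
option  (maxrecursion 0) -- one recursion level per row in a group; the default limit of 100 is too low for a large data set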
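The recursive CTE is effectively a loop over each group's rows in [From Date] order. The Python sketch below models that loop. `pack_running` is a hypothetical name. It also simplifies one detail: it keeps GrpFrom fixed within a group, whereas the query overwrites GrpFrom with each later [From Date]. Because the final SELECT takes min(GrpFrom), both versions return the same start date.

```python
from datetime import date, timedelta
from itertools import groupby


def pack_running(rows):
    """Model of the recursive CTE: visit each group's rows in [From Date]
    order (rn = 1, 2, ...) and carry GrpNo/GrpFrom/GrpTo forward."""
    rcte = []  # one (group_id, grp_no, grp_from, grp_to) entry per visited row
    for gid, grp in groupby(sorted(rows), key=lambda r: r[0]):
        grp_no = grp_from = grp_to = None
        for _, start, end in grp:
            if grp_no is None:
                # Anchor member (rn = 1) opens group 1.
                grp_no, grp_from, grp_to = 1, start, end
            elif start <= grp_to + timedelta(days=1):
                # Overlapping, contained or adjacent: stay in the current group.
                # Rows arrive sorted, so start >= grp_from always holds.
                grp_to = max(grp_to, end)
            else:
                # Gap: open the next group.
                grp_no, grp_from, grp_to = grp_no + 1, start, end
            rcte.append((gid, grp_no, grp_from, grp_to))
    # Final SELECT: min(GrpFrom), max(GrpTo) grouped by [Group ID], GrpNo.
    result = []
    for (gid, _), entries in groupby(rcte, key=lambda r: (r[0], r[1])):
        entries = list(entries)
        result.append((gid, min(e[2] for e in entries), max(e[3] for e in entries)))
    return result


sample = [
    ("A", date(2012, 1, 1), date(2012, 12, 31)),
    ("A", date(2013, 12, 1), date(2014, 11, 30)),
    ("A", date(2015, 1, 1), date(2015, 12, 31)),
    ("A", date(2015, 1, 1), date(2015, 12, 31)),
    ("A", date(2015, 2, 1), date(2015, 3, 31)),
    ("A", date(2013, 1, 1), date(2013, 12, 31)),
]

print(pack_running(sample))
```

The loop also shows why recursion depth matters. Each row in a group is one more level of recursion in the CTE, so a group with more than 100 rows exceeds SQL Server's default recursion limit unless the MAXRECURSION hint is raised.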

Answer 2 (score: 2)

I use a Calendar table. This table is simply a list of dates covering several decades.

CREATE TABLE [dbo].[Calendar](
    [dt] [date] NOT NULL,
CONSTRAINT [PK_Calendar] PRIMARY KEY CLUSTERED 
(
    [dt] ASC
))

There are many ways to populate such a table.

For example, 100K rows (~270 years) starting from 1900-01-01:

INSERT INTO dbo.Calendar (dt)
SELECT TOP (100000) 
    DATEADD(day, ROW_NUMBER() OVER (ORDER BY s1.[object_id])-1, '19000101') AS dt
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

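Once you have the Calendar table, here is how to use it.

Each original row is joined to the Calendar table, producing one row for every date between From and To.

Then any duplicates are removed.

Then the classic gaps-and-islands problem is solved by numbering the rows in two sequences.

Finally, the rows of each island are grouped together to get the new From and To.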
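Here is a rough Python trace of these four steps, using the same sample data as below (including the second group). The in-memory day expansion stands in for the join to dbo.Calendar. `pack_islands` and `ANCHOR` are hypothetical names. Like the SQL version, this materialises one row per day, so the cost grows with the total length of the periods rather than with the number of rows.

```python
from datetime import date, timedelta
from itertools import groupby

ANCHOR = date(2001, 1, 1)  # any fixed date works, like '2001-01-01' in the query


def pack_islands(rows):
    # Expand every period to one (group, day) pair per calendar day;
    # the set plays the role of SELECT DISTINCT.
    all_dates = set()
    for gid, start, end in rows:
        for n in range((end - start).days + 1):
            all_dates.add((gid, start + timedelta(days=n)))

    result = []
    for gid, grp in groupby(sorted(all_dates), key=lambda r: r[0]):
        islands = {}
        # Seq2 (days since ANCHOR) minus Seq1 (ROW_NUMBER) stays constant along
        # a run of consecutive days and jumps at every gap: that is the IslandNumber.
        for seq1, (_, dt) in enumerate(grp, start=1):
            seq2 = (dt - ANCHOR).days
            islands.setdefault(seq2 - seq1, []).append(dt)
        # Final SELECT: MIN(dt) and MAX(dt) per island.
        for days in islands.values():
            result.append((gid, min(days), max(days)))
    return sorted(result)


sample = [
    (1, date(2012, 1, 1), date(2012, 12, 31)),
    (1, date(2013, 12, 1), date(2014, 11, 30)),
    (1, date(2015, 1, 1), date(2015, 12, 31)),
    (1, date(2015, 1, 1), date(2015, 12, 31)),
    (1, date(2015, 2, 1), date(2015, 3, 31)),
    (1, date(2013, 1, 1), date(2013, 12, 31)),
    (2, date(2012, 1, 1), date(2012, 12, 31)),
    (2, date(2013, 1, 1), date(2013, 12, 31)),
]

for row in pack_islands(sample):
    print(row)
```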

Sample data

I added a second group.

DECLARE @T TABLE (GroupID int, FromDate date, ToDate date);
INSERT INTO @T (GroupID, FromDate, ToDate) VALUES
(1, '2012-01-01', '2012-12-31'),
(1, '2013-12-01', '2014-11-30'),
(1, '2015-01-01', '2015-12-31'),
(1, '2015-01-01', '2015-12-31'),
(1, '2015-02-01', '2015-03-31'),
(1, '2013-01-01', '2013-12-31'),
(2, '2012-01-01', '2012-12-31'),
(2, '2013-01-01', '2013-12-31');

Query

WITH
CTE_AllDates
AS
(
    SELECT DISTINCT
        T.GroupID
        ,CA.dt
    FROM
        @T AS T
        CROSS APPLY
        (
            SELECT dbo.Calendar.dt
            FROM dbo.Calendar
            WHERE
                dbo.Calendar.dt >= T.FromDate
                AND dbo.Calendar.dt <= T.ToDate
        ) AS CA
)
,CTE_Sequences
AS
(
    SELECT
        GroupID
        ,dt
        ,ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS Seq1
        ,DATEDIFF(day, '2001-01-01', dt) AS Seq2
        ,DATEDIFF(day, '2001-01-01', dt) - 
            ROW_NUMBER() OVER(PARTITION BY GroupID ORDER BY dt) AS IslandNumber
    FROM CTE_AllDates
)
SELECT
    GroupID
    ,MIN(dt) AS NewFromDate
    ,MAX(dt) AS NewToDate
FROM CTE_Sequences
GROUP BY GroupID, IslandNumber
ORDER BY GroupID, NewFromDate;

Result

+---------+-------------+------------+
| GroupID | NewFromDate | NewToDate  |
+---------+-------------+------------+
|       1 | 2012-01-01  | 2014-11-30 |
|       1 | 2015-01-01  | 2015-12-31 |
|       2 | 2012-01-01  | 2013-12-31 |
+---------+-------------+------------+

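Answer 3 (score: 0)

A geometric approach

Here and elsewhere, I've noticed that answers to date-packing questions don't offer a geometric approach to the problem. After all, any range, including a date range, can be interpreted as a line. So why not convert the ranges to the SQL geometry type and use geometry::UnionAggregate to merge them? I gave it a try on your post.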

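Code explanation

In "numbers":

  • I build a table that represents a sequence.
  • Replace it with a numbers table built however you prefer.
  • For the union operation you will never need more rows than your original table has, so I simply built it from that table.

In "mergeLines":

  • I convert the dates to floats and use those floats to create geometric points.
  • In this problem we are working in "integer space", meaning there is no time component. A range whose start date is one day after the end date of another range should therefore be merged with it. To make that merge happen, we need to move to "real space", so we add 1 to the end of every range (we undo this later).
  • I then connect each pair of points into a line via STUnion and STEnvelope.
  • Finally, I merge all these lines via UnionAggregate. The resulting "lines" geometry object may contain multiple lines, but lines that overlap or touch become a single line.

In the outer query:

  • I use the numbers CTE to extract the individual lines from "lines".
  • I wrap each line in an envelope so that it is represented here only by its two endpoints.
  • I read the endpoints' x values and convert them back to their datetime representation, subtracting the 1 again to return to "integer space".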
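You can try out the "real space" shift without SQL Server. This rough Python model is not the author's code. Each period becomes a 1-D segment on a day-number axis. `union_segments` stands in for geometry::UnionAggregate, and `date.toordinal()` stands in for convert(float, datetime). Only differences between day numbers matter, so the different epoch does not affect the result. All names are hypothetical.

```python
from datetime import date


def union_segments(segments):
    """Union of 1-D segments [x1, x2]: overlapping or touching segments
    collapse into one, mimicking UnionAggregate on lines along the x axis."""
    merged = []
    for x1, x2 in sorted(segments):
        if merged and x1 <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], x2)
        else:
            merged.append([x1, x2])
    return merged


def pack_geometric(rows, shift=1):
    """shift=1 is the move from "integer space" to "real space" (end + 1)."""
    lines = {}
    for gid, start, end in rows:
        # toordinal() stands in for convert(float, ...): a day number on the x axis.
        lines.setdefault(gid, []).append((start.toordinal(), end.toordinal() + shift))
    result = []
    for gid in sorted(lines):
        for x1, x2 in union_segments(lines[gid]):
            # Undo the shift to get back to inclusive "integer space" dates.
            result.append((gid, date.fromordinal(x1), date.fromordinal(x2 - shift)))
    return result


sample = [
    ("A", date(2012, 1, 1), date(2012, 12, 31)),
    ("A", date(2013, 12, 1), date(2014, 11, 30)),
    ("A", date(2015, 1, 1), date(2015, 12, 31)),
    ("A", date(2015, 1, 1), date(2015, 12, 31)),
    ("A", date(2015, 2, 1), date(2015, 3, 31)),
    ("A", date(2013, 1, 1), date(2013, 12, 31)),
]

print(pack_geometric(sample))
print(pack_geometric(sample, shift=0))
```

With `shift=0`, 2012 stays a separate period. Its segment then ends at 2012-12-31 while the next one starts at 2013-01-01, so they do not touch. That one-day gap is exactly what the +1 closes.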

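Code

-- Sample setup (assumed): the columns must be datetime, because SQL Server
-- does not allow converting the date type directly to float.
declare @spans table (groupId varchar(10), fromDate datetime, toDate datetime);
insert into @spans (groupId, fromDate, toDate) values
('A', '20120101', '20121231'),
('A', '20131201', '20141130'),
('A', '20150101', '20151231'),
('A', '20150101', '20151231'),
('A', '20150201', '20150331'),
('A', '20130101', '20131231');

with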

    numbers as (

        select  row_number() over (order by (select null)) i 
        from    @spans -- Where I put your data

    ),

    mergeLines as (

        select      groupId,
                    lines = geometry::UnionAggregate(line)
        from        @spans
        cross apply (select 
                        startP = geometry::Point(convert(float,fromDate), 0, 0),
                        stopP = geometry::Point(convert(float,toDate) + 1, 0, 0)
                    ) pointify
        cross apply (select line = startP.STUnion(stopP).STEnvelope()) lineify
        group by    groupId 

    )

    select      groupId, fromDate, toDate 
    from        mergeLines ml
    join        numbers n on n.i between 1 and ml.lines.STNumGeometries()
    cross apply (select line = ml.lines.STGeometryN(i).STEnvelope()) l
    cross apply (select 
                    fromDate = convert(datetime, l.line.STPointN(1).STX),
                    toDate = convert(datetime, l.line.STPointN(3).STX) - 1
                ) unprepare
    order by    groupId, fromDate;