Filter date ranges down to contiguous, non-overlapping ranges

Asked: 2018-02-07 09:57:42

Tags: sql sql-server tsql

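Suppose I have ranges that represent days, weeks, months, quarters, and years. From these I want to obtain non-overlapping ranges that cover the largest total period while using the smallest ranges available.

For example, I might have the four weeks of January, plus February and March, plus quarters 2, 3, and 4, which together add up to a year, and that would be fine. However, if the monthly data for February were missing, I would have to use Q1 instead, and if Q1 were missing as well, the yearly data.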

Sample input:

PeriodName  StartDate   EndDate
Jan 05  2005-01-01 00:00:00.000 2005-01-31 00:00:00.000
Q1 05   2005-01-01 00:00:00.000 2005-03-31 00:00:00.000
Yr 2005 2005-01-01 00:00:00.000 2005-12-31 00:00:00.000
Feb 05  2005-02-01 00:00:00.000 2005-02-28 00:00:00.000
Mar 05  2005-03-01 00:00:00.000 2005-03-31 00:00:00.000
Apr 05  2005-04-01 00:00:00.000 2005-04-30 00:00:00.000
Q2 05   2005-04-01 00:00:00.000 2005-06-30 00:00:00.000
May 05  2005-05-01 00:00:00.000 2005-05-31 00:00:00.000
Jul 05  2005-07-01 00:00:00.000 2005-07-31 00:00:00.000
Q3 05   2005-07-01 00:00:00.000 2005-09-30 00:00:00.000
Q4 05   2005-10-01 00:00:00.000 2005-12-31 00:00:00.000

Output:

PeriodName  StartDate   EndDate
Jan 05  2005-01-01 00:00:00.000 2005-01-31 00:00:00.000
Feb 05  2005-02-01 00:00:00.000 2005-02-28 00:00:00.000
Mar 05  2005-03-01 00:00:00.000 2005-03-31 00:00:00.000
Q2 05   2005-04-01 00:00:00.000 2005-06-30 00:00:00.000
Q3 05   2005-07-01 00:00:00.000 2005-09-30 00:00:00.000
Q4 05   2005-10-01 00:00:00.000 2005-12-31 00:00:00.000
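
The rule described above can be sketched outside SQL to make the expected behaviour concrete. This is only an illustration in Python, not one of the T-SQL answers: the function names (`filter_ranges`, `smallest_cover`, `tiles`, and so on) are hypothetical, and it assumes a range is replaced by smaller ranges only when they tile it exactly, with no gaps or overlaps.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# Sample input from the question: (PeriodName, StartDate, EndDate)
periods = [
    ("Jan 05", date(2005, 1, 1), date(2005, 1, 31)),
    ("Q1 05", date(2005, 1, 1), date(2005, 3, 31)),
    ("Yr 2005", date(2005, 1, 1), date(2005, 12, 31)),
    ("Feb 05", date(2005, 2, 1), date(2005, 2, 28)),
    ("Mar 05", date(2005, 3, 1), date(2005, 3, 31)),
    ("Apr 05", date(2005, 4, 1), date(2005, 4, 30)),
    ("Q2 05", date(2005, 4, 1), date(2005, 6, 30)),
    ("May 05", date(2005, 5, 1), date(2005, 5, 31)),
    ("Jul 05", date(2005, 7, 1), date(2005, 7, 31)),
    ("Q3 05", date(2005, 7, 1), date(2005, 9, 30)),
    ("Q4 05", date(2005, 10, 1), date(2005, 12, 31)),
]

def strictly_inside(inner, outer):
    """True if inner lies within outer and is not the same range."""
    return (outer[1] <= inner[1] and inner[2] <= outer[2]
            and (inner[1], inner[2]) != (outer[1], outer[2]))

def maximal(ranges):
    """Ranges that are not strictly inside any other range in the list."""
    return [r for r in ranges if not any(strictly_inside(r, o) for o in ranges)]

def tiles(children, start, end):
    """True if the children cover [start, end] with no gaps or overlaps."""
    cursor = start
    for _, s, e in sorted(children, key=lambda r: r[1]):
        if s != cursor:
            return False
        cursor = e + ONE_DAY
    return cursor == end + ONE_DAY

def smallest_cover(parent, candidates):
    """Replace parent by smaller ranges when they tile it, else keep parent."""
    inside = [r for r in candidates if strictly_inside(r, parent)]
    children = maximal(inside)
    if not tiles(children, parent[1], parent[2]):
        return [parent]
    result = []
    for child in children:
        result.extend(smallest_cover(child, inside))
    return result

def filter_ranges(rows):
    """Apply smallest_cover to every top-level range and sort by start date."""
    result = []
    for root in maximal(rows):
        result.extend(smallest_cover(root, rows))
    return sorted(result, key=lambda r: r[1])

for name, start, end in filter_ranges(periods):
    print(name, start, end)
```

With the sample input this prints the six rows of the expected output. Dropping `Feb 05` from `periods` makes the sketch fall back to `Q1 05`, and dropping `Q1 05` as well makes it fall back to `Yr 2005`, as the question describes.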

3 answers:

Answer 0 (score: 0)

@IvanAnatolievich, this is certainly a tricky one.

I'm not at my desktop computer to test this, but I would approach it something like this:

WITH step1 AS
(
    SELECT
        *
        ,DATEDIFF(DAY, StartDate, EndDate) + 1 AS days_in_period --DATEDIFF counts day boundaries, so add 1 for an inclusive length
    FROM [YOUR_SOURCE_TABLE]
)
,step2 AS
(
    SELECT
        *
        ,CASE 
            WHEN days_in_period BETWEEN 365 AND 366 THEN 
                'Y'
            WHEN days_in_period BETWEEN 90 AND 92 THEN 
                'Q'
            WHEN days_in_period BETWEEN 28 AND 31 THEN 
                'M'
            ELSE 
                'X' --this should indicate something wrong
            END AS period_type      
    FROM step1
)
,step3 AS
(
    SELECT
        *
        ,DATEPART(YEAR, StartDate) AS yr --every period belongs to a year
        ,CASE 
            WHEN period_type IN ('Q', 'M') THEN 
                DATEPART(QUARTER, StartDate) 
            ELSE 
                NULL 
            END AS qtr --a year row is not tied to a single quarter
        ,CASE 
            WHEN period_type = 'M' THEN    
                DATEPART(MONTH, StartDate)
            ELSE 
                NULL 
            END AS mth --only month rows carry a month number
    FROM step2
)
,step4 AS
(
    SELECT
        *
        --count the quarter rows present in each year
        ,IIF(period_type = 'Y', COUNT(CASE WHEN period_type = 'Q' THEN 1 END) OVER (PARTITION BY yr), NULL) AS qtrs_in_yr
        --count the month rows present in each quarter (mth is NULL on the quarter row itself)
        ,IIF(period_type = 'Q', COUNT(mth) OVER (PARTITION BY yr, qtr), NULL) AS mths_in_qtr
    FROM step3
)

SELECT
    COALESCE(mths.period_type, qtrs.period_type, yrs.period_type) AS period_type
    ,COALESCE(mths.PeriodName, qtrs.PeriodName, yrs.PeriodName) AS PeriodName
    ,COALESCE(mths.StartDate, qtrs.StartDate, yrs.StartDate) AS StartDate
    ,COALESCE(mths.EndDate, qtrs.EndDate, yrs.EndDate) AS EndDate

FROM
    (SELECT * FROM step4 WHERE period_type = 'Y') AS yrs

LEFT JOIN
    (SELECT * FROM step4 WHERE period_type = 'Q') AS qtrs
    ON (qtrs.yr = yrs.yr)
    AND (yrs.qtrs_in_yr = 4)

LEFT JOIN
    (SELECT * FROM step4 WHERE period_type = 'M') AS mths
    ON (mths.yr = yrs.yr)
    AND (mths.qtr = qtrs.qtr)
    AND (qtrs.mths_in_qtr = 3)

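I have assumed that every lower range in the source table is covered by a higher range. In other words, every year has a full-year entry, each year may have some individual quarters, and each quarter may have some individual months.

The first three steps are just ceremony to establish what kind of period we are dealing with and to extract a few parameters. What I do in step4 is count how many sub-ranges each higher range has. There is no need to count anything for the month ranges, because those are not subdivided further.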
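
The classification by length can be checked with a few lines of Python. This is a sketch rather than part of the answer: `period_type` is a hypothetical helper that mirrors the thresholds used in step2, and it assumes an inclusive day count. Subtracting the two dates of a full non-leap year gives 364, so 1 is added before comparing against 365.

```python
from datetime import date

def period_type(start, end):
    """Classify a range by its inclusive length in days."""
    days = (end - start).days + 1  # inclusive count: first and last day both included
    if 365 <= days <= 366:
        return "Y"
    if 90 <= days <= 92:
        return "Q"
    if 28 <= days <= 31:
        return "M"
    return "X"  # anything else, for example a week or a single day

print(period_type(date(2005, 1, 1), date(2005, 12, 31)))  # Y
print(period_type(date(2005, 1, 1), date(2005, 3, 31)))   # Q
print(period_type(date(2005, 2, 1), date(2005, 2, 28)))   # M
```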

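In the final SELECT (step 5), I start with a table containing only the full-year rows. I then conditionally join the quarters of that year, but only if the year has all 4 quarters.

If there are no quarters, or fewer than 4, the year row is kept as a single row. If there are 4 quarters, the year row expands into 4 quarter rows. I then apply the same logic again to the months of each quarter. This leaves us with 3 sets of identical columns.

Finally, in the SELECT list, I work from right to left to collapse those three column groups into one, which should leave us with the desired result.

Let me know if there are any problems.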

Answer 1 (score: 0)

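My script only works for months, quarters, and years.

The script can be written more neatly and optimized once the required output is clear. Please post the real table structure and data.

Sample data: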

create table #maintable(PeriodName varchar(20)
,StartDate datetime,EndDate datetime)

insert into #maintable VALUES
('Jan 05'  ,'2005-01-01 00:00:00.000','2005-01-31 00:00:00.000')
,('Q1 05 '  ,'2005-01-01 00:00:00.000','2005-03-31 00:00:00.000')
,('Yr 2005' ,'2005-01-01 00:00:00.000','2005-12-31 00:00:00.000')
,('Feb 05'  ,'2005-02-01 00:00:00.000','2005-02-28 00:00:00.000')
,('Mar 05'  ,'2005-03-01 00:00:00.000','2005-03-31 00:00:00.000')
,('Apr 05'  ,'2005-04-01 00:00:00.000','2005-04-30 00:00:00.000')
,('Q2 05'   ,'2005-04-01 00:00:00.000','2005-06-30 00:00:00.000')
,('May 05'  ,'2005-05-01 00:00:00.000','2005-05-31 00:00:00.000')
,('Jul 05'  ,'2005-07-01 00:00:00.000','2005-07-31 00:00:00.000')
,('Q3 05'   ,'2005-07-01 00:00:00.000','2005-09-30 00:00:00.000')
,('Q4 05'   ,'2005-10-01 00:00:00.000','2005-12-31 00:00:00.000')

Conditions:

DECLARE @StartDate date='2005-01-01'
DECLARE @EndDate date='2005-12-31'


;with CTE as
(
select PeriodName ,StartDate ,EndDate 
,case when DATEDIFF(month,startdate,enddate)=0 then 
case when month(startdate) in (1,2,3) then 'Q1 ' +cast(right(year(@StartDate),2) as char(2))
when month(startdate) in (4,5,6) then 'Q2 ' +cast(right(year(@StartDate),2) as char(2))
when month(startdate) in (7,8,9) then 'Q3 ' +cast(right(year(@StartDate),2) as char(2))
when month(startdate) in (10,11,12) then 'Q4 ' +cast(right(year(@StartDate),2) as char(2))
END
when DATEDIFF(month,startdate,enddate)=2 
then
case when month(startdate)=1
then 'Q1 '+cast(right(year(@StartDate),2) as char(2))
when month(startdate)=4
then 'Q2 '+cast(right(year(@StartDate),2) as char(2))
when month(startdate)=7
then 'Q3 '+cast(right(year(@StartDate),2) as char(2))
when month(startdate)=10
then 'Q4 '+cast(right(year(@StartDate),2) as char(2))
END
when DATEDIFF(month,startdate,enddate)=11 
then 'Yr '+ cast(year(@StartDate) as char(4))
END MonthGroup
,case when DATEDIFF(month,startdate,enddate)=2 
then
'Q'
when DATEDIFF(month,startdate,enddate)=11 
then 'Yr ' + cast(year(@StartDate) as char(4))
END QuarterGroup
,case when DATEDIFF(month,startdate,enddate)=11 
then
'Yr ' + cast(year(@StartDate) as char(4))

END YrGroup
from #maintable
where StartDate>=@StartDate and EndDate<=@EndDate
--and PeriodName not in('Feb 05','Q2 05')
)
,GroupCTE AS
(
select MonthGroup,count(*) cnt
from CTE C
group by MonthGroup
having count(*)=4
)
,QuarterCTE AS
(
select QuarterGroup,count(*) Qcnt
from CTE C
where QuarterGroup is not null
group by QuarterGroup
--having count(*)=4

Answer 2 (score: 0)

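This answer only works for month/quarter/year granularity, since that is the extent of the sample data provided. It could easily be modified to take days and weeks into account.

Sample data:

The create/insert logic below loads the sample data provided in the question.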

create table #sample_data
    (
        PeriodName varchar(10)
        , StartDate datetime
        , EndDate datetime
    )

insert into #sample_data
values ('Jan 05', '2005-01-01 00:00:00.000', '2005-01-31 00:00:00.000')
    , ('Q1 05', '2005-01-01 00:00:00.000', '2005-03-31 00:00:00.000')
    , ('Yr 2005', '2005-01-01 00:00:00.000', '2005-12-31 00:00:00.000')
    , ('Feb 05', '2005-02-01 00:00:00.000', '2005-02-28 00:00:00.000')
    , ('Mar 05', '2005-03-01 00:00:00.000', '2005-03-31 00:00:00.000')
    , ('Apr 05', '2005-04-01 00:00:00.000', '2005-04-30 00:00:00.000')
    , ('Q2 05', '2005-04-01 00:00:00.000', '2005-06-30 00:00:00.000')
    , ('May 05', '2005-05-01 00:00:00.000', '2005-05-31 00:00:00.000')
    , ('Jul 05', '2005-07-01 00:00:00.000', '2005-07-31 00:00:00.000')
    , ('Q3 05', '2005-07-01 00:00:00.000', '2005-09-30 00:00:00.000')
    , ('Q4 05', '2005-10-01 00:00:00.000', '2005-12-31 00:00:00.000')

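Answer:

One assumption made in the query below is:

  • If a PeriodName exists at level n, then an encapsulating PeriodName exists at level n+1, regardless of which one is used in the expected output, where n < k and k is the highest level.

Examples of the above assumption:

  • Because Jan 05 exists, there will be a corresponding Q1 05 record.
  • Because Jul 05 exists, there will be a corresponding Q3 05 record.
  • Because Yr 2005 is at the highest level, it has no encapsulating PeriodName.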
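
Before relying on that assumption, it may be worth checking that the data actually satisfies it. The following Python sketch is not part of the answer: `missing_parents` is a hypothetical helper, and the level numbers (1 = month, 2 = quarter, 3 = year) are assigned by hand here rather than derived from the data.

```python
from datetime import date

# (PeriodName, StartDate, EndDate, level); level 1 = month, 2 = quarter, 3 = year
rows = [
    ("Jan 05", date(2005, 1, 1), date(2005, 1, 31), 1),
    ("Q1 05", date(2005, 1, 1), date(2005, 3, 31), 2),
    ("Yr 2005", date(2005, 1, 1), date(2005, 12, 31), 3),
    ("Feb 05", date(2005, 2, 1), date(2005, 2, 28), 1),
    ("Mar 05", date(2005, 3, 1), date(2005, 3, 31), 1),
    ("Apr 05", date(2005, 4, 1), date(2005, 4, 30), 1),
    ("Q2 05", date(2005, 4, 1), date(2005, 6, 30), 2),
    ("May 05", date(2005, 5, 1), date(2005, 5, 31), 1),
    ("Jul 05", date(2005, 7, 1), date(2005, 7, 31), 1),
    ("Q3 05", date(2005, 7, 1), date(2005, 9, 30), 2),
    ("Q4 05", date(2005, 10, 1), date(2005, 12, 31), 2),
]

def missing_parents(rows):
    """Names of records below the top level with no enclosing record one level up."""
    top = max(r[3] for r in rows)
    missing = []
    for name, start, end, level in rows:
        if level == top:
            continue  # the highest level has nothing above it
        has_parent = any(
            p[3] == level + 1 and p[1] <= start and end <= p[2] for p in rows
        )
        if not has_parent:
            missing.append(name)
    return missing

print(missing_parents(rows))  # [] for the sample data, so the assumption holds
```

If the `Q3 05` row were removed, the function would report `Jul 05`, which is the kind of input the query below is not designed to handle.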

The query, with explanatory comments, is below.

; with base as
    (
        --Calculate a couple column values for use in later steps
        select sd.PeriodName
        , sd.StartDate
        , sd.EndDate
        , datediff(d, sd.StartDate, sd.EndDate) + 1 as DayCnt
        , case when datediff(d, sd.StartDate, sd.EndDate) + 1 between 28 and 31 then 1 --month record
               when left(sd.PeriodName, 1) = 'Q' then 2 --quarter record
               when left(sd.PeriodName, 1) = 'Y' then 3 --annual record
          end as PrefOrd
        from #sample_data as sd 
    )
    , check_upward as
    (
        --Determine which level n records, account for the duration of PeriodName at level n + 1 
        select a.PeriodName
        , a.StartDate
        , a.EndDate
        , a.PrefOrd
        , a.DayCnt
        , case when b.DayCnt is null then 1 --highest possible level
                when b.DayCnt = sum(a.DayCnt) over (partition by b.PeriodName) then 1
                else 0
            end as EligibleRecFlag
        from base as a
        left join base as b on a.PrefOrd + 1 = b.PrefOrd
                            and a.StartDate >= b.StartDate
                            and a.EndDate <= b.EndDate  
    )
    , check_downard as
    (
        --Determine if there are eligible records that are fully covered by eligible lower-level records (flag 0 = covered, so excluded from the output)
        select distinct a.PeriodName, case when sum(b.DayCnt) over (partition by a.PeriodName) >= a.DayCnt then 0 else 1 end as EligibleRecFlag
        from check_upward as a
        inner join check_upward as b on a.PrefOrd > b.PrefOrd
                                 and a.StartDate <= b.StartDate
                                 and a.EndDate >= b.EndDate
        where a.EligibleRecFlag = 1
        and b.EligibleRecFlag = 1   
    )
--Final Select statement
select a.PeriodName
, a.StartDate
, a.EndDate
from check_upward as a
left join check_downard as b on a.PeriodName = b.PeriodName
                            and b.EligibleRecFlag = 0
where b.PeriodName is NULL
and a.EligibleRecFlag = 1
order by 2, 3

Output:

+------------+-------------------------+-------------------------+
| PeriodName |        StartDate        |         EndDate         |
+------------+-------------------------+-------------------------+
| Jan 05     | 2005-01-01 00:00:00.000 | 2005-01-31 00:00:00.000 |
| Feb 05     | 2005-02-01 00:00:00.000 | 2005-02-28 00:00:00.000 |
| Mar 05     | 2005-03-01 00:00:00.000 | 2005-03-31 00:00:00.000 |
| Q2 05      | 2005-04-01 00:00:00.000 | 2005-06-30 00:00:00.000 |
| Q3 05      | 2005-07-01 00:00:00.000 | 2005-09-30 00:00:00.000 |
| Q4 05      | 2005-10-01 00:00:00.000 | 2005-12-31 00:00:00.000 |
+------------+-------------------------+-------------------------+