Question

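I have spent a lot of time on the following problem:

Imagine you have N groups, each containing multiple records, and every record has a unique starting point and ending point.

In other words: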

ID | GroupName | StartingPoint | EndingPoint | seq (row_number) | desired_seq
---|-----------|---------------|-------------|------------------|------------
1  | Grp1      | 2014-01-06    | 2014-01-07  | 1                | 1
2  | Grp1      | 2014-01-07    | 2014-01-08  | 2                | 2
3  | Grp2      | 2014-01-08    | 2014-01-09  | 1                | 1
4  | Grp1      | 2014-01-09    | 2014-01-10  | 3                | 1
5  | Grp2      | 2014-01-10    | 2014-01-11  | 2                | 1

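As you can see, the starting point of each successive record is the same as the ending point of the previous record.

Basically, I want the minimum starting point and the maximum ending point of each group, based on the dates. As soon as a record with a different group name appears, it should be treated as the start of a new group and the sequence should reset.

A single row_number() call is not enough for this, because it does not reflect the changes in group name. (I included a seq column in the sample data to show the values that row_number generates.)

Desired result based on the sample data: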

1 | Grp1 | 2014-01-06 | 2014-01-08
2 | Grp2 | 2014-01-08 | 2014-01-09
3 | Grp1 | 2014-01-09 | 2014-01-10
4 | Grp2 | 2014-01-10 | 2014-01-11

What I have tried:

;with cte as(
select *
, row_number() over (partition by GroupName order by startingpoint) as seq
from table1
)
select * 
into #temp2
from cte t1
left join cte t2 on t1.id=t2.id and t1.seq= t2.seq-1

select * 
,(select startingPoint from #temp2 t2 where t1.id=t2.id and t2.seq= (select MIN(seq) from #temp2)) as Oldest
,(select startingPoint from #temp2 t2 where t1.id=t2.id and t2.seq= (select MAX(seq) from #temp2)) as MostRecent
from #temp2 t1

Answer 1

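This is a gaps-and-islands problem with subgroups. The trick is to group by the difference between two ROW_NUMBER() values, one partitioned by GroupName and one unpartitioned.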

WITH t AS (
  SELECT
    GroupName,
    StartingPoint,
    EndingPoint,
    ROW_NUMBER() OVER(PARTITION BY GroupName ORDER BY StartingPoint)
      - ROW_NUMBER() OVER(ORDER BY StartingPoint) AS SubGroupId
  FROM #test
)
SELECT
  ROW_NUMBER() OVER (ORDER BY MIN(StartingPoint)) AS SortOrderId,
  GroupName                                       AS GroupName,
  MIN(StartingPoint)                              AS GroupStartingPoint,
  MAX(EndingPoint)                                AS GroupEndingPoint
FROM t
GROUP BY GroupName, SubGroupId
ORDER BY SortOrderId
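
To make the subtraction easier to follow, here is a small sketch that computes the two row numbers separately for the sample rows from the question. It runs in an in-memory SQLite database through Python's sqlite3 module purely so the example is self-contained; that setup, the table name test, and the column aliases rn_in_group and rn_overall are assumptions for illustration and not part of the original query. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (ID INTEGER, GroupName TEXT, StartingPoint TEXT, EndingPoint TEXT)"
)
# Sample rows copied from the question (ISO dates sort correctly as text).
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (1, "Grp1", "2014-01-06", "2014-01-07"),
        (2, "Grp1", "2014-01-07", "2014-01-08"),
        (3, "Grp2", "2014-01-08", "2014-01-09"),
        (4, "Grp1", "2014-01-09", "2014-01-10"),
        (5, "Grp2", "2014-01-10", "2014-01-11"),
    ],
)

# The same two ROW_NUMBER() calls as in the query above, kept as separate
# columns so the subtraction is visible.
numbered = conn.execute(
    """
    SELECT ID,
           GroupName,
           ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY StartingPoint) AS rn_in_group,
           ROW_NUMBER() OVER (ORDER BY StartingPoint) AS rn_overall
    FROM test
    ORDER BY StartingPoint
    """
).fetchall()

# (ID, GroupName, SubGroupId) for every row.
sub_group_ids = [
    (row_id, name, rn_in_group - rn_overall)
    for row_id, name, rn_in_group, rn_overall in numbered
]
for row in sub_group_ids:
    print(row)
```

Within an unbroken run of one group both counters advance by one per row, so their difference stays constant. When rows of another group intervene, only the overall counter advances, so the next run of the same group gets a different difference. The difference alone is not guaranteed to be unique across groups, which is why the query groups by both GroupName and SubGroupId.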

Answer 2

Not sure, but maybe:

SELECT DISTINCT 
    GroupName, 
    MIN(StartingPoint) OVER (PARTITION BY GroupName ORDER BY Id), 
    MAX(EndingPoint) OVER (PARTITION BY GroupName ORDER BY Id)
FROM table1

Because partitioning does not reduce the number of rows, the duplicate entries it produces are removed by DISTINCT.
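
As a quick check of this suggestion, the sketch below runs the same query against the sample rows from the question. It uses an in-memory SQLite database through Python's sqlite3 module (an assumption made only to keep the example self-contained; it needs SQLite 3.25 or later, and it has not been run on SQL Server).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER, GroupName TEXT, StartingPoint TEXT, EndingPoint TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "Grp1", "2014-01-06", "2014-01-07"),
        (2, "Grp1", "2014-01-07", "2014-01-08"),
        (3, "Grp2", "2014-01-08", "2014-01-09"),
        (4, "Grp1", "2014-01-09", "2014-01-10"),
        (5, "Grp2", "2014-01-10", "2014-01-11"),
    ],
)

# With ORDER BY inside OVER, MIN and MAX are running values within each GroupName.
rows = sorted(
    conn.execute(
        """
        SELECT DISTINCT
            GroupName,
            MIN(StartingPoint) OVER (PARTITION BY GroupName ORDER BY ID),
            MAX(EndingPoint)   OVER (PARTITION BY GroupName ORDER BY ID)
        FROM table1
        """
    ).fetchall()
)
for row in rows:
    print(row)
```

On this data the query returns five distinct rows, one per input record, because the running MAX changes on every row. The two separate Grp1 runs share the same partition, so this approach by itself does not split them the way the desired result requires.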

Answer 3

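This is somewhat easier in SQL Server 2012, which has the lag() function. My approach to these problems is to find where each group starts by flagging every row with 1 or 0, and then to take the cumulative sum of the 1s to get a new group ID.

In SQL Server 2008, you can do this with correlated subqueries (or joins):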

with table1_flag as (
      select t1.*,
             (case when exists (select 1
                                from table1 t2
                                where t2.groupname = t1.groupname and
                                      t2.endingpoint = t1.startingpoint
                               )
                   then 0 else 1   -- 1 marks the first row of a new run
              end) as groupstartflag
      from table1 t1
     ),
     table1_flag_cum as (
      select tf.*,
             (select sum(groupstartflag)
              from table1_flag tf2
              where tf2.groupname = tf.groupname and
                    tf2.startingpoint <= tf.startingpoint
             ) as groupnum
      from table1_flag tf
     )
select groupnum, groupname,
       min(startingpoint) as startingpoint, max(endingpoint) as endingpoint
from table1_flag_cum
group by groupnum, groupname;
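
For the lag() variant mentioned above, a minimal sketch might look like the following. It is an illustration under assumptions rather than a tested SQL Server query: it runs in an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later), the CTE names flagged and numbered are made up for the example, and the start flag is defined as "the previous row by StartingPoint has a different GroupName", which follows the rule stated in the question rather than the endingpoint match used in the 2008 query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID INTEGER, GroupName TEXT, StartingPoint TEXT, EndingPoint TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "Grp1", "2014-01-06", "2014-01-07"),
        (2, "Grp1", "2014-01-07", "2014-01-08"),
        (3, "Grp2", "2014-01-08", "2014-01-09"),
        (4, "Grp1", "2014-01-09", "2014-01-10"),
        (5, "Grp2", "2014-01-10", "2014-01-11"),
    ],
)

islands = conn.execute(
    """
    WITH flagged AS (
        SELECT GroupName, StartingPoint, EndingPoint,
               -- 0 when the previous row has the same GroupName, else 1.
               -- LAG is NULL on the first row, so that row is flagged 1.
               CASE WHEN LAG(GroupName) OVER (ORDER BY StartingPoint) = GroupName
                    THEN 0 ELSE 1 END AS GroupStartFlag
        FROM table1
    ),
    numbered AS (
        SELECT GroupName, StartingPoint, EndingPoint,
               -- Running total of the start flags gives each run its own number.
               SUM(GroupStartFlag) OVER (ORDER BY StartingPoint) AS GroupNum
        FROM flagged
    )
    SELECT GroupNum, GroupName, MIN(StartingPoint), MAX(EndingPoint)
    FROM numbered
    GROUP BY GroupNum, GroupName
    ORDER BY GroupNum
    """
).fetchall()

for row in islands:
    print(row)
```

Because the running sum is taken over all rows rather than per group name, GroupNum here also serves as the sort order of the runs.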

Determining the boundaries of N groups
