Finding gaps in number ranges that span multiple columns

Time: 2012-08-31 02:35:04

Tags: sql-server tsql query-optimization

I'm trying to identify gaps in a set of number ranges (SQL Server). My situation is as follows:

ID   Start   End
1      1      4
2      1      6
3      2      4
4      8     10
5     13     14

Visual
-------------------------------
1-2-3-4
1-2-3-4-5-6
  2-3-4
           - -8-9-10
                    - - -13-14

The result of this could be:

Table
-------------------------------
ID   Start   End   Gap
4      8     10    -1
5     13     14    -2

Ultimately I'd like to have the gap ranges themselves, but I should be able to work those out from the above:

Missing
7
11-12

The solutions I've come up with are either too slow or don't account for overlapping ranges (e.g. ID 2).

CREATE TABLE #Docs (
  [Rank] INT, --DENSE_RANK () OVER(ORDER BY BegProd)
  ControlNumber BIGINT,
  BegProd INT,
  EndProd  INT
)

SELECT
  T1.ControlNumber,
  T1.BegProd,
  T1.EndProd,
  MAX(T2.EndProd) AS [PreviousEndProd],
  [Gap] = T1.BegProd - MAX(T2.EndProd) - 1
FROM #Docs T1
INNER JOIN #Docs T2
  ON T1.[Rank] = T2.[Rank] + 1
  AND T1.EndProd > T2.EndProd
GROUP BY T1.ControlNumber, T1.BegProd, T1.EndProd
HAVING T1.BegProd - MAX(T2.EndProd) > 1

This table has more than 2 million rows, with ranges spanning from 1 to 1 billion.

Edit: Fixed the "Missing" table. The Gap column indicates how many numbers are missing before that start number (the missing #7 is a gap of 1).

1 answer:

Answer 0: (score: 1)

Try this:

create table #docs(id int, start int, [end] int)
insert #docs values(1,1,4),(2,1,6),(3,2,4),(4,8,10),(5,13,14)

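;with a as
(
-- starts not covered by a range that begins earlier (one per merged block);
-- the upper bound is inclusive so ranges that merely touch count as one block
select start, dense_rank() over (order by start) rn
from #docs t where not exists (select 1 from #docs where t.start > start and t.start <= [end])
group by start
), b as
(
-- ends not covered by a range that ends later (one per merged block);
-- the lower bound is inclusive for the same reason
select [end], dense_rank() over (order by [end]) rn
from #docs t where not exists (select 1 from #docs where t.[end] >= start and t.[end] < [end])
group by [end]
)
-- pair each block's start with the previous block's end; the numbers in between are missing
select 
case when a.[start]= b.[end]+2 then cast(a.start-1 as varchar(21)) 
else cast(b.[end]+1 as varchar(10)) +'-' +  cast(a.start - 1 as varchar(10)) end missing
from a join b on a.rn - 1 = b.rn
and a.[start] <> b.[end] + 1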

Result:

Missing
7
11-12
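
Since the question mentions 2 million rows, a single-pass alternative may be worth trying. The following is only a sketch, not part of the answer above: it assumes SQL Server 2012 or later (for the ROWS window frame), and the temp table name #DocsDemo and the output column names PrevMaxEnd, GapStart, GapEnd and GapSize are made up for illustration. The idea is to carry a running maximum of [End] over all earlier rows, so an overlapping range such as ID 2 is taken into account, and to report a gap whenever a row starts more than one past that maximum.

```sql
-- Hypothetical demo table holding the sample data from the question
create table #DocsDemo (ID int, [Start] int, [End] int);
insert into #DocsDemo values (1,1,4),(2,1,6),(3,2,4),(4,8,10),(5,13,14);

with Running as
(
    select ID, [Start], [End],
           -- highest [End] among all rows that sort before this one
           max([End]) over (order by [Start], [End]
                            rows between unbounded preceding and 1 preceding) as PrevMaxEnd
    from #DocsDemo
)
select ID, [Start], [End],
       PrevMaxEnd + 1           as GapStart,  -- first missing number
       [Start] - 1              as GapEnd,    -- last missing number
       [Start] - PrevMaxEnd - 1 as GapSize    -- how many numbers are missing
from Running
-- the first row has PrevMaxEnd = NULL, so the comparison is unknown and the row drops out
where [Start] > PrevMaxEnd + 1;

drop table #DocsDemo;
```

For the sample data this should return ID 4 with a gap of 7 to 7 (size 1) and ID 5 with a gap of 11 to 12 (size 2). Note that GapSize is positive here, whereas the Gap column in the question shows -1 and -2. Whether this is faster on the real table would need measuring; an index on ([Start], [End]) may help the sort.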