Skip NULLs when counting with RANK()

Time: 2018-08-13 17:32:18

Tags: sql sql-server tsql

Given a set of rows in which one field is sometimes null and sometimes not:

SELECT 
   Date, TheThing
FROM MyData
ORDER BY Date


Date                     TheThing
-----------------------  --------
2016-03-09 08:17:29.867  a
2016-03-09 08:18:33.327  a
2016-03-09 14:32:01.240  NULL
2016-10-21 19:53:49.983  NULL
2016-11-12 03:25:21.753  b
2016-11-24 07:43:24.483  NULL
2016-11-28 16:06:23.090  b
2016-11-28 16:09:07.200  c
2016-12-10 11:21:55.807  c

I want a ranking column that counts only the non-null values:

Date                     TheThing  DesiredTotal
-----------------------  --------  ------------
2016-03-09 08:17:29.867  a         1
2016-03-09 08:18:33.327  a         2
2016-03-09 14:32:01.240  NULL      2 <---notice it's still 2 (good)
2016-10-21 19:53:49.983  NULL      2 <---notice it's still 2 (good)
2016-11-12 03:25:21.753  b         3
2016-11-24 07:43:24.483  NULL      3 <---notice it's still 3 (good)
2016-11-28 16:06:23.090  b         4
2016-11-28 16:09:07.200  c         5
2016-12-10 11:21:55.807  c         6

I tried the obvious approach:

SELECT 
   Date, TheThing, 
   RANK() OVER(ORDER BY Date) AS Total
FROM MyData
ORDER BY Date

But RANK() counts the nulls:

Date                     TheThing  Total
-----------------------  --------  -----
2016-03-09 08:17:29.867  a         1
2016-03-09 08:18:33.327  a         2
2016-03-09 14:32:01.240  NULL      3 <--- notice it is 3 (bad)
2016-10-21 19:53:49.983  NULL      4 <--- notice it is 4 (bad)
2016-11-12 03:25:21.753  b         5 <--- and all the rest are wrong (bad)
2016-11-24 07:43:24.483  NULL      6
2016-11-28 16:06:23.090  b         7
2016-11-28 16:09:07.200  c         8
2016-12-10 11:21:55.807  c         9

How can I tell RANK() (or DENSE_RANK()) not to count nulls?

Have you tried using PARTITION?

Why yes! And it's even worse:

SELECT 
   Date, TheThing, 
   RANK() OVER(PARTITION BY(CASE WHEN TheThing IS NOT NULL THEN 1 ELSE 0 END) ORDER BY Date) AS Total
FROM MyData
ORDER BY Date

But now RANK() numbers the nulls in a partition of their own:

Date                     TheThing  Total
-----------------------  --------  -----
2016-03-09 08:17:29.867  a         1
2016-03-09 08:18:33.327  a         2
2016-03-09 14:32:01.240  NULL      1 <--- reset to 1?
2016-10-21 19:53:49.983  NULL      2 <--- why go up?
2016-11-12 03:25:21.753  b         3 
2016-11-24 07:43:24.483  NULL      3 <--- didn't reset?
2016-11-28 16:06:23.090  b         4 
2016-11-28 16:09:07.200  c         5
2016-12-10 11:21:55.807  c         6

Now I'm just typing things at random, flailing wildly.

SELECT 
   Date, TheThing, 
   RANK() OVER(PARTITION BY(CASE WHEN TheThing IS NOT NULL THEN 1 ELSE NULL END) ORDER BY Date) AS Total
FROM MyData
ORDER BY Date

SELECT 
   Date, TheThing, 
   DENSE_RANK() OVER(PARTITION BY(CASE WHEN TheThing IS NOT NULL THEN 1 ELSE NULL END) ORDER BY Date) AS Total
FROM MyData
ORDER BY Date

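Edit: Even with all the answers, it took many iterations to find all the edge cases I didn't want. In the end, what I conceptually wanted was OVER() applied to a count. I had no idea that OVER could be used with anything other than RANK (and DENSE_RANK).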

http://sqlfiddle.com/#!18/c6d87/1


6 answers:

Answer 0: (score: 4)

I think you are looking for a cumulative count:

SELECT Date, TheThing, 
       COUNT(TheThing) OVER (ORDER BY Date) AS Total
FROM MyData
ORDER BY Date;
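
To try the query above without touching a real table, here is a minimal self-contained sketch. The table variable @MyData and its column types are assumptions standing in for the asker's MyData table. It relies on two things: COUNT(expression) ignores NULLs, and an aggregate with OVER (ORDER BY ...) is available in SQL Server 2012 and later.

```sql
-- @MyData is a hypothetical stand-in for the asker's MyData table.
DECLARE @MyData TABLE ([Date] datetime, TheThing varchar(10));

-- ISO 8601 literals (with the T separator) parse the same way
-- regardless of the session's language or DATEFORMAT setting.
INSERT INTO @MyData ([Date], TheThing) VALUES
('2016-03-09T08:17:29.867', 'a'),
('2016-03-09T08:18:33.327', 'a'),
('2016-03-09T14:32:01.240', NULL),
('2016-10-21T19:53:49.983', NULL),
('2016-11-12T03:25:21.753', 'b'),
('2016-11-24T07:43:24.483', NULL),
('2016-11-28T16:06:23.090', 'b'),
('2016-11-28T16:09:07.200', 'c'),
('2016-12-10T11:21:55.807', 'c');

-- COUNT(TheThing) skips NULLs, so the running count only moves
-- forward on rows where TheThing has a value.
SELECT [Date], TheThing,
       COUNT(TheThing) OVER (ORDER BY [Date]) AS Total
FROM @MyData
ORDER BY [Date];

-- Total, top to bottom: 1, 2, 2, 2, 3, 3, 4, 5, 6
```

This matches the DesiredTotal column in the question. SQL Server may print a "Null value is eliminated by an aggregate" warning, which is harmless here.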

Answer 1: (score: 3)

Try this:

declare @tbl table (dt datetime, col int);
insert into @tbl values
('2016-03-09 08:17:29.867', 1),
('2016-03-09 08:18:33.327', 1),
('2016-03-09 14:32:01.240', NULL),
('2016-10-21 19:53:49.983', NULL),
('2016-11-12 03:25:21.753', 1),
('2016-11-24 07:43:24.483', NULL),
('2016-11-28 16:06:23.090', 1),
('2016-11-28 16:09:07.200', 1),
('2016-12-10 11:21:55.807', 1);

select dt,
       col,
       sum(case when col is null then 0 else 1 end) over (order by dt) rnk
from @tbl

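The idea is really simple: if you assign 1 to non-null values and 0 to rows where the column is null, the cumulative sum ordered by date behaves exactly like a rank that excludes nulls.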
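
One detail that may matter if the date column can contain duplicates: when a window aggregate has ORDER BY but no explicit frame, SQL Server uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with the same date as peers. The sketch below spells out both frames on a small made-up table (@t and its values are illustrative, not the asker's data) so the difference is visible.

```sql
-- @t is a hypothetical table with one tie on dt.
DECLARE @t TABLE (dt datetime, col int);

INSERT INTO @t (dt, col) VALUES
('2016-03-09T08:00:00', 1),
('2016-03-09T08:00:00', 1),    -- same dt as the row above (a tie)
('2016-03-09T09:00:00', NULL),
('2016-03-09T10:00:00', 1);

SELECT dt, col,
       -- Default frame, written out: tied rows are summed together.
       SUM(CASE WHEN col IS NULL THEN 0 ELSE 1 END)
           OVER (ORDER BY dt
                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_range,
       -- ROWS frame: each physical row advances the sum on its own.
       SUM(CASE WHEN col IS NULL THEN 0 ELSE 1 END)
           OVER (ORDER BY dt
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_rows
FROM @t
ORDER BY dt;

-- by_range: 2, 2, 2, 3
-- by_rows:  the two tied rows get 1 and 2 (which row gets which is
--           not deterministic), then 2, 3
```

With unique dates, as in the question's sample, both frames give the same result.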

Another method is to combine RANK with ROW_NUMBER, which respects ties in the Date column and works exactly like RANK with respect to NULLs:

select dt,
       col,
       case when col is not null then 
           rank() over (order by dt)
       else 
           rank() over (order by dt) - row_number() over (partition by rnDiff order by dt)
       end rnk
from (
    select dt,
           col,
           row_number() over (order by dt) -
               row_number() over (partition by coalesce(col, 0) order by dt) rnDiff
    from @tbl
) a
order by dt

Answer 2: (score: 1)

My lizard brain brought me here... sum() vs rank()

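-- Note: both expressions assume TheThing is a positive number:
-- sign() yields 0 or -1 for zero or negative values, and
-- TheThing/TheThing divides by zero when TheThing = 0.
Select *
       ,NewCol = sum(sign(TheThing)) over (Order by Date)
       ,OrEven = sum(TheThing/TheThing) over (Order by Date)
 From  MyData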


Answer 3: (score: 1)

What about subtracting the running count of NULLs from rank()?

SELECT date,
       thething,
       rank() OVER (ORDER BY date)
       -
       sum(CASE
             WHEN thething IS NULL THEN
               1
             ELSE
               0
           END) OVER (ORDER BY date) desiredtotal
FROM mydata;


This should also preserve the duplicates and gaps that rank() produces, and it needs no subquery.
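
To make the point about duplicates and gaps concrete, here is a small sketch that puts this expression next to a plain running COUNT on data containing a tie. The table variable @d and its rows are made up for illustration; they are not the asker's data.

```sql
-- @d is a hypothetical table: two rows share the first date.
DECLARE @d TABLE ([date] datetime, thething varchar(10));

INSERT INTO @d ([date], thething) VALUES
('2016-03-09T08:00:00', 'a'),
('2016-03-09T08:00:00', 'b'),   -- tie with the row above
('2016-03-09T09:00:00', NULL),
('2016-03-09T10:00:00', 'c');

SELECT [date], thething,
       -- Running count of non-null values (peers are counted together).
       COUNT(thething) OVER (ORDER BY [date]) AS running_count,
       -- rank() minus the running number of NULLs seen so far.
       RANK() OVER (ORDER BY [date])
       - SUM(CASE WHEN thething IS NULL THEN 1 ELSE 0 END)
             OVER (ORDER BY [date]) AS rank_minus_nulls
FROM @d
ORDER BY [date];

-- running_count:    2, 2, 2, 3
-- rank_minus_nulls: 1, 1, 2, 3
```

The tied rows share rank 1 and the next non-null row gets 3, so the gap that rank() leaves after a tie survives, whereas the running count gives the tied rows 2. Which behavior is wanted depends on how ties in the date should be treated.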

Answer 4: (score: 0)

I would use a subquery:

SELECT [Date], TheThing,
       (SELECT COUNT(*)
        FROM MyData m
        WHERE m.[Date] <= m1.[Date] AND m.TheThing IS NOT NULL
       ) AS DesiredTotal
FROM MyData m1;

In a similar way, you could also try apply:

SELECT *
FROM MyData m1 CROSS APPLY
    (SELECT COUNT(*) AS DesiredTotal
     FROM MyData m
     WHERE m.[Date] <= m1.[Date] AND m.TheThing IS NOT NULL
    ) m2;

Answer 5: (score: 0)

I use a CTE to get the right dates first, then apply the ranking to the modified dates:

CREATE TABLE #tmp(dt datetime, TheThing int)

INSERT INTO #tmp VALUES('2016-03-09 08:17:29.867',  1)
INSERT INTO #tmp VALUES('2016-03-09 08:18:33.327',  1)
INSERT INTO #tmp VALUES('2016-03-09 14:32:01.240',  NULL)
INSERT INTO #tmp VALUES('2016-10-21 19:53:49.983',  NULL)
INSERT INTO #tmp VALUES('2016-11-12 03:25:21.753',  1)
INSERT INTO #tmp VALUES('2016-11-24 07:43:24.483',  NULL)
INSERT INTO #tmp VALUES('2016-11-28 16:06:23.090',  1)
INSERT INTO #tmp VALUES('2016-11-28 16:09:07.200',  1)
INSERT INTO #tmp VALUES('2016-12-10 11:21:55.807',  1)


;WITH CTE AS (
    SELECT
        -- For NULL rows, use the latest earlier date that has a value
        CASE WHEN SubTbl.TheThing IS NULL
             THEN (SELECT MAX(OrigTbl.dt)
                   FROM #tmp OrigTbl
                   WHERE OrigTbl.dt < SubTbl.dt
                     AND OrigTbl.TheThing IS NOT NULL)
             ELSE SubTbl.dt
        END AS dtMod,
        SubTbl.dt, SubTbl.TheThing
    FROM #tmp SubTbl)
SELECT dt, TheThing, DENSE_RANK() OVER (ORDER BY dtMod) AS Total FROM CTE