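Using ranking functions to find repeated occurrences of an event

Time: 2012-06-25 10:12:48

Tags: sql sql-server sql-server-2008 tsql ranking-functions

Please help me write the following query; I have been struggling with it for a while. Say I have a simple table containing a month number and a flag indicating whether any failure event occurred in that month (Success = 0 means the month had a failure).

The script below generates the sample data: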

WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

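Given the definition of a "repeated failure":

If an event failure occurs in at least 4 months of any 6-month period, then the last of those failing months is a "repeated failure".

My query should return the following output: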

Month   Success RepeatedFailure
1       0
2       0
3       0
4       1
5       1
6       0       R1
7       0       R2
8       1
9       0
10      1
11      0       R3
12      1
13      0
14      1
15      0
16      1
17      0
18      0       R1

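Where:

  • R1 - 1st repeated failure, in month 6 (4 failures in the last 6 months).
  • R2 - 2nd repeated failure, in month 7 (4 failures in the last 6 months).
  • R3 - 3rd repeated failure, in month 11 (4 failures in the last 6 months).

R1 - 1st repeated failure again, in month 18, because the numbering should restart from the beginning when a new repeated failure occurs for the first time within the last 6 reporting periods.

Repeated failures are numbered consecutively because, depending on the number, I have to apply the appropriate multiplier:

  • 1st repeated failure - X2
  • 2nd repeated failure - X4
  • 3rd and any subsequent repeated failure - X5.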
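The multiplier rule above could be expressed as a CASE expression. This is only a minimal sketch: the `Numbered` CTE, its `RepeatNo` column, and the `Multiplier` alias are hypothetical names, the four rows are hard-coded from the expected output, and "X2", "X4", "X5" are assumed to mean factors of 2, 4 and 5.

```sql
-- Hypothetical input: one row per repeated failure with its number
-- within the last six reporting periods (taken from the expected output).
;WITH Numbered(Month, RepeatNo) AS
(
    SELECT  6, 1 UNION ALL
    SELECT  7, 2 UNION ALL
    SELECT 11, 3 UNION ALL
    SELECT 18, 1
)
SELECT
    Month,
    RepeatNo,
    CASE
        WHEN RepeatNo = 1  THEN 2   -- 1st repeated failure
        WHEN RepeatNo = 2  THEN 4   -- 2nd repeated failure
        WHEN RepeatNo >= 3 THEN 5   -- 3rd and any subsequent one
    END AS Multiplier
FROM Numbered
ORDER BY Month;
```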

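2 answers:

Answer 0: (score: 2)

I'm sure this can be improved, but it works. We essentially make two passes: the first establishes which months are repeated failures, and the second establishes the ordinal of each repeated failure. Note that Intermediate2 could definitely be eliminated; I only separated it out for clarity. All of the code is a single statement, with my explanations interleaved: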

;WITH DATA(Month, Success) AS
-- assuming your data  as defined (with my edit)
,Intermediate AS 
(
SELECT
    Month,
    Success,
    -- next column for illustration only
    (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS SuccessesInLastSixMonths,
    -- next column for illustration only
    6 - (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS FailuresInLastSixMonths,
    CASE WHEN 
            (6 - (SELECT SUM(Success) 
                    FROM DATA hist 
                    WHERE curr.Month - hist.Month BETWEEN 0 AND 5)) 
            >= 4 
            THEN 1
            ELSE 0 
    END AS IsRepeatedFailure
FROM DATA curr 
-- No real data until month 6
WHERE curr.Month > 5
)

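At this point we have determined, for each month, whether it is a repeated failure, by counting the failures in the six months up to and including it.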

,Intermediate2 AS
(
SELECT 
    Month,
    Success,
    IsRepeatedFailure,
    (SELECT SUM(IsRepeatedFailure) 
        FROM Intermediate hist 
        WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS RepeatedFailuresInLastSixMonths
FROM Intermediate curr
)

Now we have counted the number of repeated failures in the six months up to and including the current one.

SELECT
    Month,
    Success,
    CASE IsRepeatedFailure 
        WHEN 1 THEN 'R' + CONVERT(varchar, RepeatedFailuresInLastSixMonths) 
        ELSE '' END
    AS RepeatedFailureText
FROM Intermediate2

So, if this month is a repeated failure, we can say which ordinal that repeated failure has.

Result:

Month       Success     RepeatedFailureText
----------- ----------- -------------------------------
6           0           R1
7           0           R2
8           1           
9           0           
10          1           
11          0           R3
12          1           
13          0           
14          1           
15          0           
16          1           
17          0           
18          0           R1

(13 row(s) affected)

Performance considerations depend on how much data you actually have.

Answer 1: (score: 2)
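One thing the query above does not check is that the flagged month itself failed; it only compares the six-month failure count against 4. A sketch of a variant that adds that check and returns all 18 rows might look like the following. It is an assumption-laden illustration, not the answer's method: the `Flagged` CTE is a made-up name, the sample data is loaded with a VALUES constructor, and failures are counted directly (rather than as 6 minus successes), which means a month earlier than 6 could be flagged if the first four months all failed.

```sql
;WITH DATA(Month, Success) AS
(
    SELECT Month, Success
    FROM (VALUES ( 1,0),( 2,0),( 3,0),( 4,1),( 5,1),( 6,0),
                 ( 7,0),( 8,1),( 9,0),(10,1),(11,0),(12,1),
                 (13,0),(14,1),(15,0),(16,1),(17,0),(18,0)
         ) AS v(Month, Success)
),
Flagged AS   -- hypothetical name: marks each month as repeated failure or not
(
    SELECT
        curr.Month,
        curr.Success,
        CASE
            WHEN curr.Success = 0   -- the month itself must be a failure
             AND (SELECT COUNT(*)
                  FROM DATA hist
                  WHERE hist.Success = 0
                    AND hist.Month BETWEEN curr.Month - 5 AND curr.Month) >= 4
            THEN 1
            ELSE 0
        END AS IsRepeatedFailure
    FROM DATA curr
)
SELECT
    curr.Month,
    curr.Success,
    CASE
        WHEN curr.IsRepeatedFailure = 1
        THEN 'R' + CONVERT(varchar(10),
                 -- number = repeated failures in the trailing six months
                 (SELECT SUM(hist.IsRepeatedFailure)
                  FROM Flagged hist
                  WHERE hist.Month BETWEEN curr.Month - 5 AND curr.Month))
        ELSE ''
    END AS RepeatedFailure
FROM Flagged curr
ORDER BY curr.Month;
```

With the sample data from the question, this should label months 6, 7, 11 and 18 as R1, R2, R3 and R1 and leave the other months blank.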

;WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

SELECT
    DATA.Month,
    DATA.Success,
    ISNULL(CONVERT(varchar(10), b.result), '') +
    ISNULL(CONVERT(varchar(10), b.num), '') AS RepeatedFailure
FROM
(
    -- number the flagged months in month order
    SELECT *, ROW_NUMBER() OVER (ORDER BY Month) AS num
    FROM
    (
        -- flag a month with 'R' when the six months ending at it
        -- contain at most 2 successes (i.e. at least 4 failures)
        SELECT *,
            (CASE WHEN (SELECT SUM(Success)
                        FROM DATA
                        WHERE Month > (o.Month - 6) AND Month <= (o.Month)) <= 2
                   AND o.Month >= 6
                  THEN 'R' ELSE '' END) AS result
        FROM DATA o
    ) a
    WHERE result = 'R'
) b
RIGHT JOIN DATA ON DATA.Month = b.Month
ORDER BY DATA.Month
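Note that ROW_NUMBER() OVER (ORDER BY Month) numbers the flagged months consecutively across the whole data set, so it does not restart after a gap the way the question asks. The sketch below contrasts that running number with a count over the trailing six months. The `Flagged` CTE and the `RunningNo` and `WindowedNo` aliases are hypothetical, and the four flagged months are hard-coded from the sample data.

```sql
-- Hypothetical input: the months flagged as repeated failures.
;WITH Flagged(Month) AS
(
    SELECT  6 UNION ALL
    SELECT  7 UNION ALL
    SELECT 11 UNION ALL
    SELECT 18
)
SELECT
    f.Month,
    -- running number over all flagged months (never restarts)
    ROW_NUMBER() OVER (ORDER BY f.Month) AS RunningNo,
    -- flagged months within the six months ending at f.Month
    w.WindowedNo
FROM Flagged f
CROSS APPLY
(
    SELECT COUNT(*) AS WindowedNo
    FROM Flagged hist
    WHERE hist.Month BETWEEN f.Month - 5 AND f.Month
) w
ORDER BY f.Month;
```

For month 18 the running number is 4 while the windowed count is 1, which is the restart described in the question.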