SQL Server query optimization: too many inner self-joins

Time: 2015-11-17 11:07:25

Tags: sql sql-server sql-server-2012 query-optimization

I am currently trying to improve a SQL query on SQL Server.

My working table looks like this:

CAT_HISTORY

DATE        ID          CATEGORY
----------- ----------- -----------
20121201    A           1
20121201    A           1
20121201    B           1
20121201    C           2
20131201    A           2
20131201    B           4
20131201    C           3
20141201    A           3
20141201    B           2
20141201    B           2
20141201    C           1
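
To try the queries in this thread, the table above could be recreated roughly as follows. This is only a sketch: the column types are assumptions (the question shows the values but not the schema), and DATE is stored as a yyyymmdd integer because the query below compares it against literals such as 20121201.

```sql
-- Assumed schema: the question does not show the real column types.
CREATE TABLE CAT_HISTORY (
    [DATE]   INT         NOT NULL,  -- yyyymmdd stored as an integer (assumption)
    ID       VARCHAR(10) NOT NULL,
    CATEGORY INT         NOT NULL
);

-- Same rows as the listing above, including the two duplicated rows.
INSERT INTO CAT_HISTORY ([DATE], ID, CATEGORY) VALUES
(20121201, 'A', 1),
(20121201, 'A', 1),  -- duplicate of the previous row
(20121201, 'B', 1),
(20121201, 'C', 2),
(20131201, 'A', 2),
(20131201, 'B', 4),
(20131201, 'C', 3),
(20141201, 'A', 3),
(20141201, 'B', 2),
(20141201, 'B', 2),  -- duplicate of the previous row
(20141201, 'C', 1);
```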

My goal is to retrieve the history of each ID's category. So far, I do it like this:

 SELECT   A.DATE
         ,COUNT(DISTINCT A.ID) AS NB_CLIENTS
         ,A.CATEGORY           AS STARTING_CAT
         ,B.CATEGORY           AS ENDING_CAT

FROM CAT_HISTORY A
INNER JOIN CAT_HISTORY B 
ON (
     A.ID= B.ID
 AND
 ( 
    ( 
          A.DATE = 20121201
      AND B.DATE = 20131201 
    )
  OR  
    (
          A.DATE  = 20131201
      AND B.DATE  = 20141201
    )
 )
)

  WHERE A.DATE>= 20121201 AND B.DATE<= 20141201
  GROUP BY A.DATE, A.CATEGORY,B.CATEGORY
  ORDER BY A.DATE, A.CATEGORY,B.CATEGORY

The result is:

DATE_KEY   STARTING_CAT ENDING_CAT     NB_CLIENTS 
-----------  -----------  -----------  -----------
20121201     1            2            1
20121201     1            4            1
20121201     2            3            1
20131201     2            3            1
20131201     4            2            1
20131201     2            3            1

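The problem is that I have many more dates (about 15 different ones), I add an OR branch for each of them, and I have a lot of users. As a result, the query sometimes takes 15 minutes to return.

I believe I am being brutal with my inner join, and that there is probably a more elegant and efficient way to get the expected result.

My final goal is to build a Sankey diagram showing how users move from one category to another over time, so I need the number of users who moved from one category to another between consecutive dates.

Using Gordon Linoff's answer works well, but it counts the duplicates: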

SELECT DISTINCT DATE, CATEGORY,NEXT_CATEGORY, COUNT(*) AS NB_CLIENTS
FROM (  
        SELECT DISTINCT CH.*, LEAD(CATEGORY) OVER (PARTITION BY CH.ID ORDER BY DATE) AS NEXT_CATEGORY
        FROM CAT_HISTORY CH 
        ) CH
        WHERE  NEXT_CATEGORY IS NOT NULL
        GROUP BY DATE, CATEGORY,NEXT_CATEGORY

Example, expected:

DATE_KEY     STARTING_CAT ENDING_CAT   NB_CLIENTS 
-----------  -----------  -----------  -----------
20121201     1            2            1
20121201     1            4            1
20121201     2            3            1
20131201     2            3            1
20131201     4            2            1
20131201     2            3            1

With your solution:

DATE_KEY     STARTING_CAT ENDING_CAT   NB_CLIENTS 
-----------  -----------  -----------  -----------
20121201     1            1            1
20121201     1            2            1
20121201     1            4            1
20121201     2            3            1
20131201     2            3            1
20131201     4            2            1
20131201     2            3            1
20141201     2            2            1

Last edit:

I managed to find a workaround:

 SELECT DISTINCT DATE, CATEGORY,NEXT_CATEGORY, COUNT(*) AS NB_CLIENTS
    FROM (  
            SELECT DISTINCT CH.*, LEAD(CATEGORY) OVER (PARTITION BY CH.ID ORDER BY DATE) AS NEXT_CATEGORY
            FROM (SELECT DISTINCT * FROM CAT_HISTORY) CH 
            ) CH
            WHERE  NEXT_CATEGORY IS NOT NULL
            GROUP BY DATE, CATEGORY,NEXT_CATEGORY
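
The extra rows appear because LEAD() treats a duplicated (DATE, ID, CATEGORY) row as the "next" row of the same ID, so removing duplicates before the window function runs is what fixes the counts. The same workaround might be written with CTEs, as in the sketch below. It assumes the table and column names from the question and at most one category per ID and date once duplicates are removed; the CTE names DEDUP and WITH_NEXT are made up for illustration.

```sql
WITH DEDUP AS (
    -- Remove exact duplicate rows before computing the next category.
    SELECT DISTINCT [DATE], ID, CATEGORY
    FROM CAT_HISTORY
),
WITH_NEXT AS (
    -- For each ID, look up the category on its next date.
    SELECT [DATE], ID, CATEGORY,
           LEAD(CATEGORY) OVER (PARTITION BY ID ORDER BY [DATE]) AS NEXT_CATEGORY
    FROM DEDUP
)
SELECT [DATE], CATEGORY, NEXT_CATEGORY, COUNT(*) AS NB_CLIENTS
FROM WITH_NEXT
WHERE NEXT_CATEGORY IS NOT NULL   -- the last date of each ID has no successor
GROUP BY [DATE], CATEGORY, NEXT_CATEGORY
ORDER BY [DATE], CATEGORY, NEXT_CATEGORY;
```

The outer SELECT DISTINCT of the original workaround is not needed here, since GROUP BY already returns one row per group.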

2 answers:

Answer 0: (score: 0)

If you want to look at pairwise changes, use lead() rather than fixed dates. In SQL Server 2012+, you can do the following:

select date, category, next_category, count(*)
from (select ch.*,
             lead(category) over (partition by id order by date) as next_category
      from cat_history ch
     ) ch
group by date, category, next_category;

In earlier versions of SQL Server, you can use similar logic with a correlated subquery or apply.
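
As a rough illustration of the apply variant (not necessarily what the answerer had in mind), the next category of each row can be looked up with CROSS APPLY and TOP (1). The sketch assumes the CAT_HISTORY table and column names from the question; the aliases ch2 and nxt are made up for the example.

```sql
SELECT ch.[DATE],
       ch.CATEGORY,
       nxt.CATEGORY          AS NEXT_CATEGORY,
       COUNT(DISTINCT ch.ID) AS NB_CLIENTS   -- DISTINCT collapses duplicated source rows
FROM CAT_HISTORY ch
CROSS APPLY (
    -- Earliest row of the same ID on a strictly later date.
    SELECT TOP (1) ch2.CATEGORY
    FROM CAT_HISTORY ch2
    WHERE ch2.ID = ch.ID
      AND ch2.[DATE] > ch.[DATE]
    ORDER BY ch2.[DATE]
) nxt
GROUP BY ch.[DATE], ch.CATEGORY, nxt.CATEGORY
ORDER BY ch.[DATE], ch.CATEGORY, nxt.CATEGORY;
```

CROSS APPLY drops rows that have no later date, which plays the same role as filtering on NEXT_CATEGORY IS NOT NULL; OUTER APPLY would keep them with a NULL next category. Because the lookup requires a strictly later date, a duplicated row is never paired with its own copy.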

Answer 1: (score: 0)

Please check this; I have replaced the date field with datefield.

declare @t table(datefield date , id varchar(10) , category int )

insert into @t values
(cast( '20121201' as date) , 'A', 1),
(cast( '20121201' as date) , 'B', 1),
(cast( '20121201' as date) , 'C', 2),
(cast( '20131201' as date) , 'A', 2),
(cast( '20131201' as date) , 'B', 4),
(cast( '20131201' as date) , 'C', 3),
(cast( '20141201' as date) , 'A', 3),
(cast( '20141201' as date) , 'B', 2),
(cast( '20141201' as date) , 'C', 1)

SELECT   A.datefield
         ,COUNT(DISTINCT A.ID) AS NB_CLIENTS
         ,A.CATEGORY           AS STARTING_CAT
         ,isnull(B.CATEGORY ,0)       AS ENDING_CAT
FROM @T A
left JOIN @T B 
ON 
    ( 
        A.ID= B.ID   AND 
        ( b.datefield =  dateadd( yy, 1 , a.datefield ) ) 
    )
 -- WHERE A.datefield>= '20121201' AND ( B.datefield<= '20141201' or B.datefield is null)
  GROUP BY A.datefield, A.CATEGORY,B.CATEGORY
  ORDER BY A.datefield, A.CATEGORY,B.CATEGORY
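
The join above relies on consecutive snapshots being exactly one year apart. If that does not hold, one possible variation is to number the distinct dates with DENSE_RANK() and join each date to the one ranked right after it. This is a sketch only: it uses the CAT_HISTORY table from the question rather than @t, and the names RANKED and DATE_RANK are made up for illustration.

```sql
WITH RANKED AS (
    -- Number the distinct dates 1, 2, 3, ... and drop duplicated rows.
    SELECT DISTINCT [DATE], ID, CATEGORY,
           DENSE_RANK() OVER (ORDER BY [DATE]) AS DATE_RANK
    FROM CAT_HISTORY
)
SELECT A.[DATE],
       COUNT(DISTINCT A.ID) AS NB_CLIENTS,
       A.CATEGORY           AS STARTING_CAT,
       B.CATEGORY           AS ENDING_CAT
FROM RANKED A
INNER JOIN RANKED B
    ON  B.ID = A.ID
    AND B.DATE_RANK = A.DATE_RANK + 1   -- the very next snapshot date
GROUP BY A.[DATE], A.CATEGORY, B.CATEGORY
ORDER BY A.[DATE], A.CATEGORY, B.CATEGORY;
```

Unlike lead(), this pairs only adjacent snapshot dates, so an ID that is missing from one snapshot produces no transition across the gap.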