使用子查询提高CTE的性能

时间:2018-05-03 00:23:12

标签: tsql subquery pivot common-table-expression

我有一个具有这种结构的表:

WorkerID    Value           GroupID Sequence    Validity
1           '20%'           1       1           2018-01-01
1           '10%'           1       1           2017-06-01
1           'Yes'           1       2           2017-06-01
1           '2018-01-01'    2       1           2017-06-01
1           '17.2'          2       2           2017-06-01
2           '10%'           1       1           2017-06-01
2           'No'            1       2           2017-06-01
2           '2016-03-01'    2       1           2017-06-01
2           '15.9'          2       2           2017-06-01

创建此结构,以便客户端可以为worker创建自定义数据。例如,Group 1可以是“Salary”,Sequence是属于Group的一个值,如“加班补偿”。列ValueVARCHAR(150)字段,正确的验证和对话在应用程序的其他部分完成。

Validity列的存在主要是出于历史原因。

现在我想为不同的工作人员展示网格中的信息,其中每行应该是一个工人(显示具有最新有效性的工人):

Worker  1_1     1_2     2_1         2_2
1       20%     Yes     2018-01-01  17.2
2       10%     No      2016-03-01  15.9

为了实现这一目标,我创建了一个如下所示的CTE:

WITH CTE_worker_grid
    AS
    (
    SELECT
        worker,

        /* 1 */
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 1
                AND w.Sequence = 1
                ORDER BY w.Validity DESC
        ) AS 1_1,
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 1
                AND w.Sequence = 2
                ORDER BY w.Validity DESC
        ) AS 1_2,

        /* 2 */
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 2
                AND w.Sequence = 1
                ORDER BY w.Validity DESC
        ) AS 2_1,
        (
            SELECT top 1 w.Value
                FROM worker_values AS w
                WHERE w.GroupID = 2
                AND w.Sequence = 2
                ORDER BY w.Validity DESC
        ) AS 2_2
    )
GO

这会产生正确的结果,但它非常慢,因为它为超过18,000名工作人员创建了这个网格,每个Groups几乎有30 Sequences和最多20 Group

如何才能加速如此规模的CTE过程?应该使用CTE吗?可以更改或重新分解子查询以加快执行速度吗?

1 个答案:

答案 0 :(得分:1)

使用PIVOT!

+----------+---------+---------+------------+---------+
| WorkerId | 001_001 | 001_002 |  002_001   | 002_002 |
+----------+---------+---------+------------+---------+
|        1 | 20%     | Yes     | 2018-01-01 |    17.2 |
|        2 | 10%     | No      | 2016-03-01 |    15.9 |
+----------+---------+---------+------------+---------+

SQL小提琴:http://sqlfiddle.com/#!18/6e768/1

CREATE TABLE WorkerAttributes
    (
    WorkerID INT NOT NULL
    , [Value] VARCHAR(50) NOT NULL
    , GroupID INT NOT NULL
    , [Sequence] INT NOT NULL
    , Validity DATE NOT NULL
    )

INSERT INTO WorkerAttributes
    (WorkerID, Value, GroupID, Sequence, Validity)
VALUES
    (1, '20%', 1, 1, '2018-01-01')
    , (1, '10%', 1, 1, '2017-06-01')
    , (1, 'Yes', 1, 2, '2017-06-01')
    , (1, '2018-01-01', 2, 1, '2017-06-01')
    , (1, '17.2', 2, 2, '2017-06-01')
    , (2, '10%', 1, 1, '2017-06-01')
    , (2, 'No', 1, 2, '2017-06-01')
    , (2, '2016-03-01', 2, 1, '2017-06-01')
    , (2, '15.9', 2, 2, '2017-06-01')


;WITH CTE_WA_RANK
AS
(
SELECT
    ROW_NUMBER() OVER (PARTITION BY WorkerID, GroupID, [Sequence] ORDER BY Validity DESC) AS VersionNumber
    , WA.WorkerID
    , WA.GroupID
    , WA.[Sequence]
    , WA.[Value]
FROM
    WorkerAttributes AS WA
),
CTE_WA
AS
(
SELECT
    WA_RANK.WorkerID
    , RIGHT('000' + CAST(WA_RANK.GroupID AS VARCHAR(3)), 3)
        + '_'
        + RIGHT('000' + CAST(WA_RANK.[Sequence] AS VARCHAR(3)), 3) AS SMART_KEY
    , WA_RANK.[Value]
FROM
    CTE_WA_RANK AS WA_RANK
WHERE
    WA_RANK.VersionNumber = 1
)
SELECT
    WorkerId
    , [001_001] AS [001_001]
    , [001_002] AS [001_002]
    , [002_001] AS [002_001]
    , [002_002] AS [002_002]
FROM
(
SELECT
    CTE_WA.WorkerId
    , CTE_WA.SMART_KEY
    , CTE_WA.[Value]
FROM
    CTE_WA
) AS WA
PIVOT
(
MAX([Value])
FOR
    SMART_KEY IN 
        (
        [001_001]
        , [001_002]
        , [002_001]
        , [002_002]
        )
) AS PVT