Slow SQL query with joins and subqueries

Time: 2016-01-15 09:21:49

Tags: sql sql-server tsql

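I am writing a website user behavior analytics tool as a hobby project. It tracks the links users click and the pages they eventually land on from those links. It distinguishes user sessions by a unique UIN identifier stored with each click.

I am writing a milestone and click data report, but the query is very slow. I have not found a way to improve performance so that it runs reasonably fast (under 5 seconds of execution time), so I would be very grateful if someone could help.

The following part of the query is very fast. It runs in about 0.05 seconds: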

declare @startDate date = '2013-01-01'
declare @endDate date = '2016-01-14'
declare @user int = 4
declare @country int = 224

select
    p.PageId,
    p.Name,

    -- count of successful page landings
    SUM(CASE WHEN m.MileStoneTypeId = 1 AND m.UserId = @user
        THEN 1
        ELSE 0
        END) AS [Successful landings],

    -- count of failed page landings
    SUM(CASE WHEN m.MileStoneTypeId = 2 AND m.UserId = @user
        THEN 1
        ELSE 0
        END) AS [Failed landings],

    -- count of unfinished page landings
    SUM(CASE WHEN m.MileStoneTypeId = 3 AND m.UserId = @user
        THEN 1
        ELSE 0
        END) AS [Unfinished landings]

from
    Page as p
inner join
    Milestone as m
        ON p.PageId = m.CampaignId 
        AND m.UserId = @user
        AND m.Created >= @startDate
        AND m.Created < @endDate
where
    p.PageCountryId = @country
group by
    p.PageId,
    p.Name

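Here is the full query, which executes very slowly. It runs in 45-60 seconds. The difference is that I am also trying to collect the number of clicks generated for each page's milestones: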

declare @startDate date = '2013-01-01'
declare @endDate date = '2016-01-14'
declare @user int = 4
declare @country int = 224

select
    p.PageId,
    p.Name,

    -- Unique clicks
    (SELECT 
        COUNT(DISTINCT click.UIN)
     FROM 
        Click as click 
     WHERE 
        click.PageId = p.PageId AND
        click.Created >= @startDate AND
        click.Created < @endDate AND
        click.UserId = @user
    ) as [Unique clicks],

    -- Total clicks
    (SELECT 
        COUNT(click.UIN)
     FROM 
        Click as click 
     WHERE 
        click.PageId = p.PageId AND
        click.Created >= @startDate AND
        click.Created < @endDate AND
        click.UserId = @user
     ) as [Total clicks],

    -- count of successful page landings
    SUM(CASE WHEN m.MileStoneTypeId = 1 AND m.UserId = @user
        THEN 1
        ELSE 0
        END) AS [Successful landings],

    -- count of failed page landings
    SUM(CASE WHEN m.MileStoneTypeId = 2 AND m.UserId = @user
        THEN 1
        ELSE 0
        END) AS [Failed landings],

    -- count of unfinished page landings
    SUM(CASE WHEN m.MileStoneTypeId = 3 AND m.UserId = @user
        THEN 1
        ELSE 0
        END) AS [Unfinished landings]

from
    Page as p
inner join
    Milestone as m
        ON p.PageId = m.CampaignId 
        AND m.UserId = @user
        AND m.Created >= @startDate
        AND m.Created < @endDate
where
    p.PageCountryId = @country
group by
    p.PageId,
    p.Name

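Running the click count queries as standalone queries is fairly fast. Each query (DISTINCT and non-distinct) runs in about 1 second.

This one is "fast" as a standalone query: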

-- Unique clicks
(SELECT 
    COUNT(DISTINCT click.UIN)
 FROM 
    Click as click 
 WHERE 
    click.PageId = p.PageId AND
    click.Created >= @startDate AND
    click.Created < @endDate AND
    click.UserId = @user
) as [Unique clicks],

This one is also "fast" as a standalone query:

-- Total clicks
(SELECT 
    COUNT(click.UIN)
 FROM 
    Click as click 
 WHERE 
    click.PageId = p.PageId AND
    click.Created >= @startDate AND
    click.Created < @endDate AND
    click.UserId = @user
 ) as [Total clicks],

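The problem appears when I try to combine everything into one large query. For some reason the standalone queries run very fast, but the combined query is very slow.

The Click table has a column "UIN" whose value is assigned to each user when they arrive at the website. When they click a link, a row with the user ID and the UIN is inserted into the Click table. The UIN distinguishes user sessions, so UserId 4 with UIN abcdef123 can have multiple identical rows. The UIN is used to count unique clicks and total clicks within user sessions.

The Page table has about 1,000 rows. The Milestone table has about 200,000 rows and the Click table has about 10,000,000 rows.

Any idea how to improve the performance of the full query that includes the unique and total click counts?

Here are the table contents and the target output

Data from the Page table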

+--------+-----------------------+-----------+
| PageId |         Name          | CountryId |
+--------+-----------------------+-----------+
|   3095 | Registration          |        77 |
|   3110 | Customer registration |        77 |
|   5174 | View user details     |        77 |
+--------+-----------------------+-----------+

Data from the User table

+--------+------+
| UserId | Name |
+--------+------+
|      1 | Dan  |
|      2 | Mike |
|      3 | John |
+--------+------+

Data from the Click table

+---------+--------------------------------------+--------+-------------------------+--------+
| ClickId |                 Uin                  | UserId |         Created         | PageId |
+---------+--------------------------------------+--------+-------------------------+--------+
| 1296600 | B420D0F4-20BE-49BE-AAC9-47DD858B68DD |   4301 | 2016-01-14 12:08:03:723 |   8603 |
| 1296599 | DA5877BA-8FF5-4671-8DF9-CCCBF1555BA1 |   4418 | 2016-01-14 12:07:46:930 |   2009 |
| 1296598 | C6790CB9-6DA6-4A8B-84AA-7D2D3A4B5787 |   4276 | 2016-01-14 12:07:43:563 |   8678 |
+---------+--------------------------------------+--------+-------------------------+--------+

Data from the Milestone table

+-------------+-----------------+------------+--------+-------------------------+--------+
| MilestoneId | MilestoneTypeId | CampaignId | UserId |         Created         | PageId |
+-------------+-----------------+------------+--------+-------------------------+--------+
|           1 |               1 |       1001 |      4 | 2014-02-06 13:18:04:487 |     52 |
|           2 |               1 |       1001 |      4 | 2014-02-06 13:41:01:257 |   9642 |
|           3 |               1 |       1001 |      4 | 2014-02-07 09:52:29:373 |   2393 |
+-------------+-----------------+------------+--------+-------------------------+--------+

Here is the output I want to achieve:

+---------+-----------------------+---------------+--------------+----------------------+-----------------+---------------------+
| Page Id |       Page Name       | Unique clicks | Total clicks | Successful Landings  | Failed Landings | Unfinished Landings |
+---------+-----------------------+---------------+--------------+----------------------+-----------------+---------------------+
|    3095 | Registration          |           102 |          116 |                    2 |               0 |                   0 |
|    3110 | Customer registration |             3 |            6 |                    1 |               1 |                   0 |
|    5174 | View user details     |            13 |           13 |                    0 |               1 |                   0 |
|    5178 | Edit content page     |            11 |           11 |                    1 |               0 |                   0 |
|    6217 | Add new vehicle       |            18 |           18 |                    2 |               0 |                   0 |
+---------+-----------------------+---------------+--------------+----------------------+-----------------+---------------------+

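4 answers:

Answer 0: (score: 1)

It is slow because you run the "click" select twice for every row in the query.

Try joining it the same way you join the Milestone table and add a group by user clause.

UPD. Could you please provide the table structure and data in the form of the following example?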

declare @Page as table ( 
  PageId int, 
  etc
)
insert into @page (PageId, etc) values (3095, etc)
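
One caveat when joining Click the same way as Milestone: if both detail tables are joined to Page in the same query, every click row is paired with every milestone row for that page, so plain counts get multiplied. The sketch below illustrates this on a toy dataset. It uses Python's built-in sqlite3 module purely as a runnable harness, so it is SQLite rather than SQL Server, and the three simplified tables and their rows are assumptions made up for the illustration.

```python
import sqlite3

# Toy schema: a simplified stand-in for the Page, Click and Milestone tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Page      (PageId INTEGER, Name TEXT);
    CREATE TABLE Click     (ClickId INTEGER, Uin TEXT, PageId INTEGER);
    CREATE TABLE Milestone (MilestoneId INTEGER, MilestoneTypeId INTEGER, PageId INTEGER);

    INSERT INTO Page VALUES (3095, 'Registration');
    -- three clicks coming from two sessions (A and B)
    INSERT INTO Click VALUES (1, 'A', 3095), (2, 'A', 3095), (3, 'B', 3095);
    -- two successful landings
    INSERT INTO Milestone VALUES (1, 1, 3095), (2, 1, 3095);
""")

# Joining both detail tables directly yields 3 x 2 = 6 rows for the page,
# so the total click count and the landing count are both inflated.
naive = conn.execute("""
    SELECT p.PageId,
           COUNT(DISTINCT c.Uin) AS unique_clicks,
           COUNT(c.ClickId)      AS total_clicks,
           SUM(CASE WHEN m.MilestoneTypeId = 1 THEN 1 ELSE 0 END) AS successful
    FROM Page AS p
    JOIN Click     AS c ON c.PageId = p.PageId
    JOIN Milestone AS m ON m.PageId = p.PageId
    GROUP BY p.PageId
""").fetchone()

# Aggregating each detail table to one row per page first avoids the fan-out.
pre_aggregated = conn.execute("""
    SELECT p.PageId, c.unique_clicks, c.total_clicks, m.successful
    FROM Page AS p
    JOIN (SELECT PageId,
                 COUNT(DISTINCT Uin) AS unique_clicks,
                 COUNT(*)            AS total_clicks
          FROM Click GROUP BY PageId) AS c ON c.PageId = p.PageId
    JOIN (SELECT PageId,
                 SUM(CASE WHEN MilestoneTypeId = 1 THEN 1 ELSE 0 END) AS successful
          FROM Milestone GROUP BY PageId) AS m ON m.PageId = p.PageId
""").fetchone()

print(naive)           # (3095, 2, 6, 6)
print(pre_aggregated)  # (3095, 2, 3, 2)
```

COUNT(DISTINCT ...) happens to survive the fan-out, which is why the inflated totals are easy to miss.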

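Answer 1: (score: 1)

Clickstream data can be hard to work with, usually because of the volume of records generated. But in this case I think the problem is caused by the correlated subqueries in the SELECT clause. If you are not familiar with them: a correlated subquery is any subquery that references the outer query. These hurt performance because the SQL engine can be forced to evaluate the subquery once for every row returned. That defeats the set-based nature of SQL.

I made a few changes to your sample data. As provided, it returned no records against which I could validate my result set. I have updated the values in the join fields to fix this:

Sample data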
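
To make the distinction concrete, here is a minimal sketch that runs the same report twice: once with correlated subqueries in the SELECT list, and once by joining to a derived table that is aggregated a single time. It uses Python's built-in sqlite3 module only as a harness, so it runs on SQLite and says nothing about SQL Server timings. The trimmed-down tables and their rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Page  (PageId INTEGER, Name TEXT);
    CREATE TABLE Click (ClickId INTEGER, Uin TEXT, UserId INTEGER, PageId INTEGER);

    INSERT INTO Page VALUES
        (3095, 'Registration'),
        (3110, 'Customer registration'),
        (5174, 'View user details');
    INSERT INTO Click VALUES
        (1, 'A', 4, 3095), (2, 'A', 4, 3095), (3, 'B', 4, 3095),
        (4, 'C', 4, 3110),
        (5, 'D', 9, 3110);   -- a different user, must not be counted
""")

# Correlated form: each subquery references p.PageId from the outer query.
correlated = conn.execute("""
    SELECT p.PageId,
           (SELECT COUNT(DISTINCT c.Uin) FROM Click AS c
             WHERE c.PageId = p.PageId AND c.UserId = :user) AS unique_clicks,
           (SELECT COUNT(c.Uin) FROM Click AS c
             WHERE c.PageId = p.PageId AND c.UserId = :user) AS total_clicks
    FROM Page AS p
    ORDER BY p.PageId
""", {"user": 4}).fetchall()

# Set-based form: aggregate Click once per page, then join the result.
# LEFT JOIN plus COALESCE keeps pages that have no clicks, reporting 0.
set_based = conn.execute("""
    SELECT p.PageId,
           COALESCE(c.unique_clicks, 0),
           COALESCE(c.total_clicks, 0)
    FROM Page AS p
    LEFT JOIN (SELECT PageId,
                      COUNT(DISTINCT Uin) AS unique_clicks,
                      COUNT(Uin)          AS total_clicks
               FROM Click
               WHERE UserId = :user
               GROUP BY PageId) AS c ON c.PageId = p.PageId
    ORDER BY p.PageId
""", {"user": 4}).fetchall()

print(correlated)  # [(3095, 2, 3), (3110, 1, 1), (5174, 0, 0)]
print(set_based)   # same rows
```

Both forms return the same rows here; the difference is that the second one scans Click once instead of being written as a per-row lookup.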

DECLARE @Page TABLE
    (
        PageId        INT,
        Name        VARCHAR(50),
        CountryId    INT
    )
;

DECLARE @User TABLE
    (
        UserId        INT,
        Name        VARCHAR(50)
    )
;

DECLARE @Clicks TABLE
    (
        ClickId        INT,
        Uin            UNIQUEIDENTIFIER,
        UserId        INT,
        Created        DATETIME,
        PageId        INT
    )
;

DECLARE @Milestone TABLE
    (
        MiestoneId        INT,
        MilestoneTypeId    INT,
        CampaignId        INT,
        UserId            INT,
        Created            DATETIME,
        PageId            INT
    )
;




INSERT INTO @Page 
    (
        PageId,
        Name,
        CountryId
    )
VALUES
    (3095, 'Registration', 77),
    (3110, 'Customer registration', 77),
    (5174, 'View user details', 77)
;

INSERT INTO @User 
    (
        UserId,
        Name
    )
VALUES
    (4301, 'Dan'),
    (2, 'Mike'),
    (3, 'John')
;

INSERT INTO @Clicks 
    (
        ClickId,
        Uin,
        UserId,
        Created,
        PageId
    )
VALUES
    (1296600, 'B420D0F4-20BE-49BE-AAC9-47DD858B68DD', 4301, '2016-01-14 12:08:03:723', 3095),
    (1296600, 'B420D0F4-20BE-49BE-AAC9-47DD858B68DD', 4301, '2016-01-14 12:08:03:723', 3095),
    (1296599, 'DA5877BA-8FF5-4671-8DF9-CCCBF1555BA1', 4301, '2016-01-14 12:07:46:930', 3110),
    (1296598, 'C6790CB9-6DA6-4A8B-84AA-7D2D3A4B5787', 4301, '2016-01-14 12:07:43:563', 5174)
;

INSERT INTO @Milestone 
    (
        MiestoneId,
        MilestoneTypeId,
        CampaignId,
        UserId,
        Created,
        PageId
    )
VALUES
    (1, 1, 1001, 4301, '2014-01-06 13:18:04:487', 3095),
    (2, 1, 1001, 4301, '2014-01-06 13:41:01:257', 3110),
    (3, 3, 1001, 4301, '2014-01-07 09:52:29:373', 5174)
;

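As you found in your original query, you cannot join Milestone directly to Click, because each table has a different granularity. In my query I use CTEs to return the totals from each table. The main body of my query joins the results.

Example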

DECLARE @StartDate  date = '2013-01-01';
DECLARE @EndDate    date = '2016-01-15';
DECLARE @UserId     int = 4301;
DECLARE @CountryId  int = 77;


WITH Click AS
    (
        SELECT
            UserId,
            PageId,
            COUNT(DISTINCT Uin)       AS [Distinct Clicks],
            COUNT(ClickId)            AS [Total Clicks]
        FROM
            @Clicks
        WHERE
            UserId = @UserId
            AND Created BETWEEN @StartDate AND @EndDate
        GROUP BY
            UserId,
            PageId
    ),
    Milestone AS
    (
        SELECT
            UserId,
            PageId,
            SUM(CASE WHEN MileStoneTypeId = 1 THEN 1 ELSE 0 END) AS [Successful Landings],
            SUM(CASE WHEN MileStoneTypeId = 2 THEN 1 ELSE 0 END) AS [Failed Landings],
            SUM(CASE WHEN MileStoneTypeId = 3 THEN 1 ELSE 0 END) AS [Unfinished Landings]
        FROM
            @Milestone
        WHERE
            UserId = @UserId
            AND Created BETWEEN @StartDate AND @EndDate
        GROUP BY
            UserId,
            PageId
    )
SELECT
    p.PageId,
    p.Name,
    c.[Distinct Clicks],
    c.[Total Clicks],
    ms.[Successful Landings],
    ms.[Failed Landings],
    ms.[Unfinished Landings]
FROM
    @Page AS p
        INNER JOIN Click AS c            ON  c.PageId    = p.PageId
        INNER JOIN Milestone AS ms       ON  ms.PageId    = c.PageId
                                         AND ms.UserId    = c.UserId
WHERE
    p.CountryId = @CountryId
;
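
One detail to watch when adapting the query above: BETWEEN includes both bounds, whereas the question filters with a half-open range (Created >= start AND Created < end). The sketch below shows the difference for a row that falls exactly on the end boundary. It runs on SQLite through Python's built-in sqlite3 module, with ISO-8601 text standing in for DATETIME values; the three rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Click (ClickId INTEGER, Created TEXT);
    INSERT INTO Click VALUES
        (1, '2016-01-13 23:59:59'),
        (2, '2016-01-14 00:00:00'),   -- exactly on the end boundary
        (3, '2016-01-14 12:08:03');
""")

# Start and end of the reporting window, as full timestamps.
bounds = ("2013-01-01 00:00:00", "2016-01-14 00:00:00")

# BETWEEN is inclusive on both ends, so the boundary row is counted.
inclusive = conn.execute(
    "SELECT COUNT(*) FROM Click WHERE Created BETWEEN ? AND ?", bounds
).fetchone()[0]

# The half-open range excludes the end bound, so the boundary row is not counted.
half_open = conn.execute(
    "SELECT COUNT(*) FROM Click WHERE Created >= ? AND Created < ?", bounds
).fetchone()[0]

print(inclusive, half_open)  # 2 1
```

With adjacent reporting windows, the half-open form avoids counting a boundary row in two windows.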

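Answer 2: (score: 0)

Don't use count(distinct); it sorts and then counts, and that sort can cost you a lot of time. You can select the distinct values from the table first and then count them.

Like this: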

select count(1) from (select distinct column from table) as t;
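
The two forms are not always interchangeable: COUNT(DISTINCT col) ignores NULLs, while SELECT DISTINCT keeps one NULL row that the outer count then includes. A minimal sketch, run on SQLite through Python's built-in sqlite3 module, with a made-up Click table. It only checks the results and makes no claim about which form is faster on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Click (ClickId INTEGER, Uin TEXT);
    INSERT INTO Click VALUES (1, 'A'), (2, 'A'), (3, 'B'), (4, NULL);
""")

# COUNT(DISTINCT ...) skips NULL values: A and B.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT Uin) FROM Click"
).fetchone()[0]

# SELECT DISTINCT returns A, B and one NULL row; COUNT(1) counts all three.
derived = conn.execute(
    "SELECT COUNT(1) FROM (SELECT DISTINCT Uin FROM Click) AS d"
).fetchone()[0]

# Filtering NULLs in the inner query makes the two forms agree.
derived_not_null = conn.execute(
    "SELECT COUNT(1) FROM (SELECT DISTINCT Uin FROM Click WHERE Uin IS NOT NULL) AS d"
).fetchone()[0]

print(count_distinct, derived, derived_not_null)  # 2 3 2
```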

If you want to see where most of the cost goes, you can use the following mode

     set showplan_all on

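to inspect the query plan, or you can simply click Display Estimated Execution Plan in Microsoft SQL Server Management Studio.

Hope this helps :)

Answer 3: (score: 0)

You should turn the "clicks" into functions and call those functions from the query. Using the "clicks" as subqueries will run slowly because they are executed many times, once for every row.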