Query that joins millions of rows is very slow; please help me optimize it

Asked: 2012-12-03 19:45:36

Tags: mysql

Here is my query:

SELECT SQL_BUFFER_RESULT SQL_BIG_RESULT users.id, users.email, 
        COUNT(av.user_id) AS article_views_count,
        COUNT(af.id) AS article_favorites_count,
        COUNT(lc.user_id) AS link_clicks_count,
        COUNT(ai.user_id) AS ad_impressions_count,
        COUNT(ac.user_id) AS ad_clicks_count
          FROM users
            LEFT JOIN article_views AS av     ON (av.user_id = users.id AND av.created_at >= '2012-11-28 00:00:00' AND av.created_at <= '2012-11-30 23:59:59')
            LEFT JOIN article_favorites AS af ON (af.user_id = users.id AND af.created_at >= '2012-11-28 00:00:00' AND af.created_at <= '2012-11-30 23:59:59')
            LEFT JOIN link_clicks AS lc       ON (lc.user_id = users.id AND lc.created_at >= '2012-11-28 00:00:00' AND lc.created_at <= '2012-11-30 23:59:59')
            LEFT JOIN ad_impressions AS ai    ON (ai.user_id = users.id AND ai.created_at >= '2012-11-28 00:00:00' AND ai.created_at <= '2012-11-30 23:59:59')
            LEFT JOIN ad_clicks AS ac         ON (ac.user_id = users.id AND ac.created_at >= '2012-11-28 00:00:00' AND ac.created_at <= '2012-11-30 23:59:59')
          GROUP BY users.id
          HAVING (article_views_count + article_favorites_count + link_clicks_count + ad_impressions_count + ad_clicks_count) > 0

Some statistics for context:

  1. users: 1,474,348 rows
  2. article_views: 32,603,637 rows
  3. article_favorites: 10,199 rows
  4. link_clicks: 4,258,901 rows
  5. ad_impressions: 66,758,573 rows
  6. ad_clicks: 324,125 rows
  7. Every joined table has a composite index on (user_id, created_at), in that order.

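We are running MySQL 5, and every table uses the MyISAM engine.

Here is the EXPLAIN output for the query: https://gist.github.com/4197482

The goal is to return only the users who had any activity (article view, favorite, link click, ad impression, or ad click) during that period.

Any ideas for optimizing this bad boy?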

2 answers:

Answer 0: (score: 1)

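Your query looks like an analytical query: it runs aggregate functions with a GROUP BY clause over a large amount of data.

To improve the performance of this kind of query, you can materialize the result of the joins in a table (MySQL has no native materialized views) using: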

CREATE TABLE my_view AS SELECT ... FROM ... JOIN ...

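With this in place, subsequent queries will be far more efficient, because MySQL only has to compute the aggregates.

Then you just need a strategy for refreshing the table (for example, based on timestamps).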
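
As a rough illustration of the refresh idea, here is one possible variant that stores per-day counts instead of the raw join result. It is only a sketch under assumptions: the table name `user_activity_daily`, its column names, the index name, and the `INT UNSIGNED` type for `user_id` are all hypothetical and not taken from the question.

```sql
-- Hypothetical summary table (name, columns and types are assumptions):
-- one row per user per day, holding pre-aggregated counts.
CREATE TABLE user_activity_daily (
  user_id                 INT UNSIGNED NOT NULL,
  activity_date           DATE         NOT NULL,
  article_views_count     INT UNSIGNED NOT NULL DEFAULT 0,
  article_favorites_count INT UNSIGNED NOT NULL DEFAULT 0,
  link_clicks_count       INT UNSIGNED NOT NULL DEFAULT 0,
  ad_impressions_count    INT UNSIGNED NOT NULL DEFAULT 0,
  ad_clicks_count         INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, activity_date),
  KEY idx_activity_date (activity_date)
);

-- Refresh a single day from one source table. The same statement would be
-- repeated for the other four source tables, each writing its own column.
INSERT INTO user_activity_daily (user_id, activity_date, article_views_count)
SELECT user_id, DATE(created_at), COUNT(*)
FROM article_views
WHERE created_at >= '2012-11-30 00:00:00'
  AND created_at <  '2012-12-01 00:00:00'
  AND user_id IS NOT NULL
GROUP BY user_id, DATE(created_at)
ON DUPLICATE KEY UPDATE article_views_count = VALUES(article_views_count);

-- Report for a period: rows exist only for days with activity,
-- so no HAVING filter is needed.
SELECT u.id, u.email,
       SUM(d.article_views_count)     AS article_views_count,
       SUM(d.article_favorites_count) AS article_favorites_count,
       SUM(d.link_clicks_count)       AS link_clicks_count,
       SUM(d.ad_impressions_count)    AS ad_impressions_count,
       SUM(d.ad_clicks_count)         AS ad_clicks_count
FROM user_activity_daily AS d
JOIN users AS u ON u.id = d.user_id
WHERE d.activity_date BETWEEN '2012-11-28' AND '2012-11-30'
GROUP BY u.id, u.email;
```

Note that the refresh statement filters on `created_at` alone, which the existing (user_id, created_at) indexes do not serve well; an additional index that leads with `created_at` would probably be needed for the refresh to be cheap.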

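Another solution is to import your data into a DBMS that is very efficient for this kind of query: a column-oriented database. For example, InfiniDB is an open-source DBMS based on MySQL, with a storage engine optimized for analytical queries.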

Answer 1: (score: 0)

Try splitting the query into one INNER JOIN query per table and combining them with UNION. Something like:

SELECT users.id, users.email, COUNT(av.user_id) AS article_views_count
FROM users
JOIN article_views AS av ON (av.user_id = users.id AND av.created_at >= '2012-11-28 00:00:00' AND av.created_at <= '2012-11-30 23:59:59')
GROUP BY users.id, users.email

UNION

....
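
One way this idea might be fleshed out while still returning a separate count per activity type is sketched below. It is not the answerer's exact query: the derived-table alias `t` and its column names are made up for illustration. Each table is aggregated on its own, the partial results are stacked with UNION ALL (padding the other counters with 0), and the outer query sums them per user. It also avoids a side effect of the original query, where chaining five one-to-many LEFT JOINs pairs every matching row of one table with every matching row of the others for the same user, so both the intermediate result and each COUNT can grow multiplicatively.

```sql
SELECT u.id, u.email,
       SUM(t.article_views)     AS article_views_count,
       SUM(t.article_favorites) AS article_favorites_count,
       SUM(t.link_clicks)       AS link_clicks_count,
       SUM(t.ad_impressions)    AS ad_impressions_count,
       SUM(t.ad_clicks)         AS ad_clicks_count
FROM (
    -- One aggregate per source table; only users with rows in the period appear.
    SELECT user_id, COUNT(*) AS article_views, 0 AS article_favorites,
           0 AS link_clicks, 0 AS ad_impressions, 0 AS ad_clicks
    FROM article_views
    WHERE created_at >= '2012-11-28 00:00:00' AND created_at <= '2012-11-30 23:59:59'
    GROUP BY user_id
    UNION ALL
    SELECT user_id, 0, COUNT(*), 0, 0, 0
    FROM article_favorites
    WHERE created_at >= '2012-11-28 00:00:00' AND created_at <= '2012-11-30 23:59:59'
    GROUP BY user_id
    UNION ALL
    SELECT user_id, 0, 0, COUNT(*), 0, 0
    FROM link_clicks
    WHERE created_at >= '2012-11-28 00:00:00' AND created_at <= '2012-11-30 23:59:59'
    GROUP BY user_id
    UNION ALL
    SELECT user_id, 0, 0, 0, COUNT(*), 0
    FROM ad_impressions
    WHERE created_at >= '2012-11-28 00:00:00' AND created_at <= '2012-11-30 23:59:59'
    GROUP BY user_id
    UNION ALL
    SELECT user_id, 0, 0, 0, 0, COUNT(*)
    FROM ad_clicks
    WHERE created_at >= '2012-11-28 00:00:00' AND created_at <= '2012-11-30 23:59:59'
    GROUP BY user_id
) AS t
JOIN users AS u ON u.id = t.user_id   -- inner join: users without activity drop out
GROUP BY u.id, u.email;
```

UNION ALL is used rather than UNION because the partial rows must not be de-duplicated before they are summed. Whether this is actually faster on the tables above would need to be checked with EXPLAIN, since the date-only filters may not use the (user_id, created_at) indexes efficiently.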