Aggregation in a MySQL subquery seems slower than it should be

Time: 2015-03-12 17:57:04

Tags: mysql sql performance aggregation

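I have been working on refactoring a pile of application logic and SQL that has been plaguing our system.

I managed to get rid of most of the application-layer logic and do everything in a single SQL query, but the query seems to lag somewhat and I am not sure why.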

SELECT
  st.id                    ownerId,
  st.display_name          ownerLabel,
  COALESCE((score.mean_score / score.num_responses) * 100, 0)         meanScore,
  COALESCE((score.top_box_percentage / score.num_responses) * 100, 0) topBoxPercentage,
  COALESCE(score.num_responses, 0)      sampleSize
FROM question q
  CROSS JOIN store st
  LEFT JOIN
  (SELECT
     COUNT(ch.id)                  num_responses,
     SUM(ans.mean_score_weight)    mean_score,
     SUM(ans.is_top_box)           top_box_percentage,
     q.id                          question_id,
     q.category_id                 category_id,
     st.id                         store_id
   FROM choice ch
     INNER JOIN response r ON r.id = ch.response_id
     INNER JOIN answer ans ON ans.id = ch.answer_id
     INNER JOIN store st ON st.id = r.store_id
     INNER JOIN question q ON q.id = ans.question_id
   WHERE r.survey_id = 96  AND r.created_at BETWEEN '2015-01-01' AND '2015-03-01' AND q.is_scorable AND ans.is_scorable
   GROUP BY q.id, st.id
    ) score ON score.question_id = q.id AND score.store_id = st.id
WHERE q.survey_id = 96 AND q.is_scorable 
GROUP BY q.id, st.id;

The EXPLAIN plan for this query is as follows:

+----+-------------+------------+--------+------------------------------------------------+------------------------+---------+-----------------------+------+----------+----------------------------------------------+
| id | select_type | table      | type   | possible_keys                                  | key                    | key_len | ref                   | rows | filtered | Extra                                        |
+----+-------------+------------+--------+------------------------------------------------+------------------------+---------+-----------------------+------+----------+----------------------------------------------+
|  1 | PRIMARY     | q          | ref    | question_FI_6                                  | question_FI_6          | 4       | const                 |   77 |   100.00 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | st         | ALL    | NULL                                           | NULL                   | NULL    | NULL                  |  339 |   100.00 | Using join buffer                            |
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                           | NULL                   | NULL    | NULL                  | 3505 |   100.00 |                                              |
|  2 | DERIVED     | r          | ref    | PRIMARY,response_FI_3,response_FI_4            | response_FI_3          | 5       |                       | 5179 |   100.00 | Using where; Using temporary; Using filesort |
|  2 | DERIVED     | st         | eq_ref | PRIMARY                                        | PRIMARY                | 4       | titan.r.store_id      |    1 |   100.00 | Using index                                  |
|  2 | DERIVED     | ch         | ref    | unique_response_answer,choice_FI_1,choice_FI_3 | unique_response_answer | 4       | titan.r.id            |   35 |   100.00 | Using index                                  |
|  2 | DERIVED     | ans        | eq_ref | PRIMARY,answer_FI_1                            | PRIMARY                | 4       | titan.ch.answer_id    |    1 |   100.00 | Using where                                  |
|  2 | DERIVED     | q          | eq_ref | PRIMARY                                        | PRIMARY                | 4       | titan.ans.question_id |    1 |   100.00 | Using where                                  |
+----+-------------+------------+--------+------------------------------------------------+------------------------+---------+-----------------------+------+----------+----------------------------------------------+

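It looks to me like the query is slow because of the filesort plus temporary table on response. My experience with MySQL is fairly limited, so I am not sure how to fix this. Any help would be greatly appreciated.

Indexes on response: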

+----------+------------+--------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name                       | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+--------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| response |          0 | PRIMARY                        |            1 | id             | A         |       53911 |     NULL | NULL   |      | BTREE      |         |               |
| response |          1 | response_FI_3                  |            1 | survey_id      | A         |         104 |     NULL | NULL   | YES  | BTREE      |         |               |
| response |          1 | response_FI_4                  |            1 | store_id       | A         |         523 |     NULL | NULL   | YES  | BTREE      |         |               |
| response |          1 | fk_response_competition_id_idx |            1 | competition_id | A         |          13 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+--------------------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

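While writing this up, it occurred to me that I could leverage the key already being used (response_FI_3 = r.survey_id) to eliminate the filesort, which yields much better results, but I still think more can be done to improve this query.

Thanks for any input.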

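2 answers:

Answer 0 (score: 1)

If you see Using filesort, it means your query is using a column in the WHERE clause that has no index. From what I can see, the created_at column is the likely culprit: you use it in the WHERE clause, but it is not indexed. You have a similar problem on the question table, but without the list of indexes on that table I can't tell you where it is.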

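Answer 1 (score: 0)

The LEFT may be causing the problem. Can you get rid of it? Notice that the optimizer (see the EXPLAIN) is not able to start with the subquery.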

The filter survey_id = 96 combined with the range on r.created_at needs a 'composite index': INDEX(survey_id, created_at). Please provide SHOW CREATE TABLE.

You don't really have a CROSS JOIN, so let me suggest a rewrite:
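A minimal sketch of how such a composite index could be added, assuming created_at is a column on response (as the question's query implies) and that the table can be altered. The index name idx_response_survey_created is made up for illustration. The equality column (survey_id) goes first and the range column (created_at) second, so one index can serve both predicates.

```sql
-- Hypothetical index name; pick one that fits your naming scheme.
-- Column order: equality predicate (survey_id) first, range predicate (created_at) second.
ALTER TABLE response
  ADD INDEX idx_response_survey_created (survey_id, created_at);

-- Re-check the plan for the response filter afterwards.
EXPLAIN
SELECT COUNT(*)
FROM response r
WHERE r.survey_id = 96
  AND r.created_at BETWEEN '2015-01-01' AND '2015-03-01';
```

If the new index is chosen, the key column of the EXPLAIN row for r should show it instead of response_FI_3. Whether that removes the temporary table and filesort depends on the GROUP BY, so it is worth comparing timings before and after.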

SELECT ...
    FROM ( SELECT ... ) AS score
    JOIN store AS st  ON score.store_id = st.id
    JOIN question q   ON score.question_id = q.id
    WHERE q.survey_id = 96 AND q.is_scorable
    GROUP BY q.id, st.id;

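BETWEEN '2015-01-01' AND '2015-03-01': if that column is a "DATE", the range (incorrectly?) includes March 1. If it is a "DATETIME", it includes one extra midnight (2015-03-01 00:00:00).

Have you tried running the subquery by itself? Is it OK, or should we investigate it? Please provide SHOW CREATE TABLE for each table.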
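One way to avoid that ambiguity is a half-open range, which behaves the same for DATE and DATETIME columns. This sketch assumes the intent is to cover January and February 2015 only; if March 1 is meant to be included, the upper bound would need adjusting.

```sql
-- Half-open range: includes 2015-01-01 00:00:00, excludes everything from 2015-03-01 onward.
SELECT COUNT(*)
FROM response r
WHERE r.survey_id = 96
  AND r.created_at >= '2015-01-01'
  AND r.created_at <  '2015-03-01';
```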
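To check the subquery on its own, it can be lifted out of the question's query unchanged and run under EXPLAIN, then timed without EXPLAIN. The SHOW CREATE TABLE statements below cover the five tables named in the question.

```sql
-- The derived table from the question, run in isolation.
EXPLAIN
SELECT
  COUNT(ch.id)               num_responses,
  SUM(ans.mean_score_weight) mean_score,
  SUM(ans.is_top_box)        top_box_percentage,
  q.id                       question_id,
  q.category_id              category_id,
  st.id                      store_id
FROM choice ch
  INNER JOIN response r ON r.id = ch.response_id
  INNER JOIN answer ans ON ans.id = ch.answer_id
  INNER JOIN store st   ON st.id = r.store_id
  INNER JOIN question q ON q.id = ans.question_id
WHERE r.survey_id = 96
  AND r.created_at BETWEEN '2015-01-01' AND '2015-03-01'
  AND q.is_scorable
  AND ans.is_scorable
GROUP BY q.id, st.id;

-- Table definitions, including all indexes and column types.
SHOW CREATE TABLE choice;
SHOW CREATE TABLE response;
SHOW CREATE TABLE answer;
SHOW CREATE TABLE store;
SHOW CREATE TABLE question;
```

If the subquery alone accounts for most of the run time, the outer joins are probably not the main cost and the indexes on response and choice are the place to look first.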
