Question

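I've run into a MySQL problem that I can't seem to solve. To run GROUP BY queries quickly for reporting, I have denormalized several tables into the following one (it is maintained by triggers on the other tables, and I have made my peace with that):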

DROP TABLE IF EXISTS stats;
CREATE TABLE stats (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `datetime` datetime NOT NULL,
  `datetime_hour` datetime NOT NULL,
  `datetime_day` datetime NOT NULL,
  `step_id` int(11) NOT NULL,
  `check_id` int(11) NOT NULL,
  `probe_id` int(11) NOT NULL,

  `execution_step_id` int(11) NOT NULL,

  `value_of_interest` int(11) DEFAULT NULL,
  `internal` tinyint(1) NOT NULL DEFAULT '0',

  PRIMARY KEY (`id`),
  UNIQUE KEY `index_stats_on_execution_step_id` (`execution_step_id`),

  CONSTRAINT `stats_step_id_fk` FOREIGN KEY (`step_id`) REFERENCES `steps` (`id`) ON DELETE CASCADE,
  CONSTRAINT `stats_check_id_fk` FOREIGN KEY (`check_id`) REFERENCES `checks` (`id`) ON DELETE CASCADE,
  CONSTRAINT `stats_probe_id_fk` FOREIGN KEY (`probe_id`) REFERENCES `probes` (`id`) ON DELETE CASCADE,
  CONSTRAINT `stats_execution_step_id_fk` FOREIGN KEY (`execution_step_id`) REFERENCES `execution_steps` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

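No matter which indexes I put on the table, EXPLAIN for the query below still ends up showing Using where; Using temporary; Using filesort, or some combination of these (all of which make the query run with unacceptable performance):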

SELECT
  datetime_day,
  step_id,
  CAST(AVG(value_of_interest) AS UNSIGNED) AS value_of_interest
FROM
  stats
WHERE
  check_id = 78
  AND probe_id = 1
  AND (datetime_day >= '2014-03-28 15:58:00' AND datetime_day <= '2014-10-28 15:58:00')
  AND (internal = 0)
GROUP BY
  datetime_day, step_id
ORDER BY
  datetime_day, step_id

Which indexes do I need in the table definition, and/or how do I need to change the query, so that it runs with a reasonable execution plan?

Environment specs:

Fedora release 19 (Schrödinger’s Cat)
mysql Ver 15.1 Distrib 5.5.34-MariaDB, for Linux (x86_64) using readline 5.1
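6 GB RAM, 30M rows

Thank you very much for your help!

PS: First-time poster, so apologies for any violation of best practices. I'm happy to learn...

Edit

One of the answers suggested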

ALTER TABLE `stats` ADD INDEX newindex (check_id, probe_id, internal, datetime_day, step_id);

which improved the situation a little. I had already tried this index before and got the following result:

+------+-------------+---------------------------+-------+---------------+----------+---------+------+--------+------------------------------------+
| id   | select_type | table                     | type  | possible_keys | key      | key_len | ref  | rows   | Extra                              |
+------+-------------+---------------------------+-------+---------------+----------+---------+------+--------+------------------------------------+
|    1 | SIMPLE      | stats                     | range | newindex      | newindex | 17      | NULL | 605682 | Using index condition; Using where |
+------+-------------+---------------------------+-------+---------------+----------+---------+------+--------+------------------------------------+

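But shouldn't there be a way to execute the query with a "loose/tight index scan", as described in the linked article? I can't seem to get it to work, but I'm not sure I understood the article correctly.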

Answer 1

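You have 600K rows to scan, so it will not run instantly.

Why do you need CAST(AVG(value_of_interest) AS UNSIGNED)? Could it be avoided by cleansing the data before inserting it?

This index would make the plan "Using index", which would make the query faster. However, adding it seems silly if this is not your only query.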

INDEX newindex (check_id, probe_id, internal, datetime_day, step_id, value_of_interest)
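
As a rough sketch of how this could be tried out, the covering index can be added and the plan re-checked with EXPLAIN. The index name `stats_covering` is a placeholder of my own; the column list is the one above and the query is the one from the question.

```sql
-- Hypothetical index name; columns as suggested above.
-- value_of_interest is appended so the query can be answered from the index alone.
ALTER TABLE stats
  ADD INDEX stats_covering
    (check_id, probe_id, internal, datetime_day, step_id, value_of_interest);

-- Re-check the plan. The hope is that Extra now reports "Using index"
-- without "Using temporary" or "Using filesort".
EXPLAIN
SELECT datetime_day,
       step_id,
       CAST(AVG(value_of_interest) AS UNSIGNED) AS value_of_interest
FROM stats
WHERE check_id = 78
  AND probe_id = 1
  AND internal = 0
  AND datetime_day >= '2014-03-28 15:58:00'
  AND datetime_day <= '2014-10-28 15:58:00'
GROUP BY datetime_day, step_id
ORDER BY datetime_day, step_id;
```

Building an index on a 30M-row table takes a while and enlarges the table on disk, so it is worth testing on a copy first.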

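Is there a reason for the odd start/end time (15:58:00)?

The "real" solution for summarizing data warehouse tables is to build and maintain "summary tables". For the query in question, such a table would have check_id, probe_id, internal, step_id, datetime_hour, SUM(value_of_interest), COUNT(*). The first five would be the PRIMARY KEY. You would add another row to the table every hour. Reports (hour, day, week, month) would get the AVG by computing SUM(sums) / SUM(counts).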
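
A minimal sketch of such a summary table, under some assumptions: the names `stats_hourly`, `sum_value` and `cnt` are illustrative, the hourly load is run by some scheduler not shown here, and the date boundaries are examples only. It counts non-NULL values with COUNT(value_of_interest) rather than COUNT(*), because value_of_interest is nullable and AVG ignores NULLs; that is a deviation from the description above.

```sql
-- Hypothetical summary table: one row per key combination per hour.
CREATE TABLE stats_hourly (
  check_id      INT NOT NULL,
  probe_id      INT NOT NULL,
  internal      TINYINT(1) NOT NULL,
  step_id       INT NOT NULL,
  datetime_hour DATETIME NOT NULL,
  sum_value     BIGINT NOT NULL,        -- SUM(value_of_interest) for the hour
  cnt           INT UNSIGNED NOT NULL,  -- number of non-NULL values in the hour
  PRIMARY KEY (check_id, probe_id, internal, step_id, datetime_hour)
) ENGINE=InnoDB;

-- Hourly load for the hour that just ended (example hour shown).
-- Filtering stats by datetime_hour is only cheap if that column is indexed.
INSERT INTO stats_hourly
SELECT check_id, probe_id, internal, step_id, datetime_hour,
       COALESCE(SUM(value_of_interest), 0),
       COUNT(value_of_interest)
FROM stats
WHERE datetime_hour = '2014-10-28 15:00:00'
GROUP BY check_id, probe_id, internal, step_id, datetime_hour;

-- Daily report: the average is rebuilt as SUM(sums) / SUM(counts).
-- NULLIF avoids dividing by zero when a day has no non-NULL values.
SELECT DATE(datetime_hour) AS datetime_day,
       step_id,
       CAST(SUM(sum_value) / NULLIF(SUM(cnt), 0) AS UNSIGNED) AS value_of_interest
FROM stats_hourly
WHERE check_id = 78
  AND probe_id = 1
  AND internal = 0
  AND datetime_hour >= '2014-03-29 00:00:00'
  AND datetime_hour <  '2014-10-29 00:00:00'
GROUP BY DATE(datetime_hour), step_id
ORDER BY datetime_day, step_id;
```

Summing the hourly sums and counts gives the same average as AVG over the raw rows, whereas averaging the hourly averages would not. The report may still need a temporary table for the GROUP BY, but it reads at most 24 rows per step per day instead of every raw measurement.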

More discussion in my Summary Table blog.

Answer 2

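The ORDER BY clause is notorious for slowing queries down. That said, an index that more closely matches your filter criteria and grouping clause will help.

I would suggest a composite index (on multiple fields) such as

(check_id, probe_id, internal, datetime_day, step_id)

This way your WHERE clause is optimized by the leading columns, and the last two columns match the GROUP BY / ORDER BY clauses so those are optimized as well.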

Optimizing a 'GROUP BY' query, eliminating 'Using where; Using temporary; Using filesort'

2 answers: