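How can I optimize this MySQL query to find the maximum number of simultaneous calls?

Asked: 2014-05-14 16:41:22

Tags: mysql

I'm trying to calculate the maximum number of simultaneous calls. My query, which I believe is accurate, takes far too long given ~250,000 rows. The cdrs table looks like this: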

+---------------+-----------------------+------+-----+---------+----------------+
| Field         | Type                  | Null | Key | Default | Extra          |
+---------------+-----------------------+------+-----+---------+----------------+
| id            | bigint(20) unsigned   | NO   | PRI | NULL    | auto_increment |
| CallType      | varchar(32)           | NO   |     | NULL    |                |
| StartTime     | datetime              | NO   | MUL | NULL    |                |
| StopTime      | datetime              | NO   |     | NULL    |                |
| CallDuration  | float(10,5)           | NO   |     | NULL    |                |
| BillDuration  | mediumint(8) unsigned | NO   |     | NULL    |                |
| CallMinimum   | tinyint(3) unsigned   | NO   |     | NULL    |                |
| CallIncrement | tinyint(3) unsigned   | NO   |     | NULL    |                |
| BasePrice     | float(12,9)           | NO   |     | NULL    |                |
| CallPrice     | float(12,9)           | NO   |     | NULL    |                |
| TransactionId | varchar(20)           | NO   |     | NULL    |                |
| CustomerIP    | varchar(15)           | NO   |     | NULL    |                |
| ANI           | varchar(20)           | NO   |     | NULL    |                |
| ANIState      | varchar(10)           | NO   |     | NULL    |                |
| DNIS          | varchar(20)           | NO   |     | NULL    |                |
| LRN           | varchar(20)           | NO   |     | NULL    |                |
| DNISState     | varchar(10)           | NO   |     | NULL    |                |
| DNISLATA      | varchar(10)           | NO   |     | NULL    |                |
| DNISOCN       | varchar(10)           | NO   |     | NULL    |                |
| OrigTier      | varchar(10)           | NO   |     | NULL    |                |
| TermRateDeck  | varchar(20)           | NO   |     | NULL    |                |
+---------------+-----------------------+------+-----+---------+----------------+

I have the following indexes:

+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| cdrs  |          0 | PRIMARY         |            1 | id          | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
| cdrs  |          1 | id              |            1 | id          | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
| cdrs  |          1 | call_time_index |            1 | StartTime   | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
| cdrs  |          1 | call_time_index |            2 | StopTime    | A         |      269622 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

The query I'm running is:

SELECT MAX(cnt) AS max_channels FROM
  (SELECT cl1.StartTime, COUNT(*) AS cnt
    FROM cdrs cl1
    INNER JOIN cdrs cl2
    ON cl1.StartTime
    BETWEEN cl2.StartTime  AND cl2.StopTime
    GROUP BY cl1.id)
  AS counts;

It seems I may have to chunk this data by day and store the results in a separate table such as simultaneous_calls.

2 Answers:

Answer 0: (score: 2)
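A minimal sketch of what that per-day chunking might look like. The table name simultaneous_calls comes from the question; its columns (call_date, max_channels) are assumptions made for illustration. The bound on cl2 assumes that no call lasts longer than 24 hours, so that only calls started on the same or the previous day can overlap.

```sql
-- Hypothetical summary table: one row per day (column names are assumed).
CREATE TABLE simultaneous_calls (
  call_date    DATE         NOT NULL PRIMARY KEY,
  max_channels INT UNSIGNED NOT NULL
);

-- Compute the peak for a single day and store it.
INSERT INTO simultaneous_calls (call_date, max_channels)
SELECT '2014-05-14', COALESCE(MAX(cnt), 0)
  FROM (SELECT COUNT(*) AS cnt
          FROM cdrs cl1
          JOIN cdrs cl2
            ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
         -- Only calls that started on the day being summarized...
         WHERE cl1.StartTime >= '2014-05-14' AND cl1.StartTime < '2014-05-15'
         -- ...compared against calls that could still be active then
         -- (assumption: no call runs longer than 24 hours).
           AND cl2.StartTime >= '2014-05-13' AND cl2.StartTime < '2014-05-15'
         GROUP BY cl1.id) AS counts;
```

Restricting both sides of the self-join to a narrow StartTime range keeps each daily run small, and the daily maxima can then be read from the summary table.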

I'm sure you want to know not only the maximum number of simultaneous calls, but also when it occurred.

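I would create a table containing a timestamp for every minute:

-- DATETIME cannot be UNSIGNED or AUTO_INCREMENT; a plain DATETIME primary key is enough.
CREATE TABLE times (ts DATETIME NOT NULL PRIMARY KEY);
INSERT INTO times (ts) VALUES ('2014-05-14 00:00:00');
-- ... and so on, until there are 1440 rows, one for each minute of the day ...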
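Writing 1440 INSERT statements by hand is tedious, so here is one possible way to generate the rows on MySQL 5.6, which has no recursive CTEs. The helper table digits is hypothetical and not part of the original answer; it assumes times has a single DATETIME column named ts and is still empty.

```sql
-- Hypothetical helper table holding the digits 0..9.
CREATE TABLE digits (d TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO digits (d) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Cross-join the digits to build the numbers 0..9999, keep 0..1439,
-- and add each one as a minute offset to midnight of the chosen day.
INSERT INTO times (ts)
SELECT '2014-05-14 00:00:00'
       + INTERVAL (a.d + 10 * b.d + 100 * c.d + 1000 * e.d) MINUTE
  FROM digits a
 CROSS JOIN digits b
 CROSS JOIN digits c
 CROSS JOIN digits e
 WHERE a.d + 10 * b.d + 100 * c.d + 1000 * e.d < 1440;
```

Afterwards, SELECT COUNT(*) FROM times should report 1440 rows.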

Then join it to the calls table.

SELECT ts, COUNT(*) AS count FROM times 
JOIN cdrs ON times.ts BETWEEN cdrs.starttime AND cdrs.stoptime 
GROUP BY ts ORDER BY count DESC LIMIT 1;

Here are my test results (MySQL 5.6.17 on a Linux VM running on a MacBook Pro):

+---------------------+----------+
| ts                  | count(*) |
+---------------------+----------+
| 2014-05-14 10:59:00 |     1001 |
+---------------------+----------+
1 row in set (1 min 3.90 sec)

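This achieves several goals:

  • Reduces the number of rows examined by two orders of magnitude.
  • Reduces the execution time from 3 hours to about 1 minute.
  • Also returns the actual timestamp at which the highest count was found.

Here is the EXPLAIN for my query: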

explain select ts, count(*) from times join cdrs on times.ts between cdrs.starttime and cdrs.stoptime group by ts order by count(*) desc limit 1;
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                          |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+
|  1 | SIMPLE      | times | index | PRIMARY       | PRIMARY | 5       | NULL |   1440 | Using index; Using temporary; Using filesort   |
|  1 | SIMPLE      | cdrs  | ALL   | starttime     | NULL    | NULL    | NULL | 260727 | Range checked for each record (index map: 0x4) |
+----+-------------+-------+-------+---------------+---------+---------+------+--------+------------------------------------------------+

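Note the numbers in the rows column, and compare them with the EXPLAIN for your original query. You can estimate the total number of rows examined by multiplying those row counts together (although it gets more complicated if your query is anything other than SIMPLE).

Answer 1: (score: 1)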
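As a rough worked example of that multiplication: the figures 1440 and 260727 come from the EXPLAIN output above. The self-join figure is an assumption, since the EXPLAIN for the original query is not shown; it supposes both sides of the self-join scan about 260727 rows.

```sql
SELECT 1440   * 260727 AS est_rows_times_join,  -- times x cdrs
       260727 * 260727 AS est_rows_self_join,   -- cdrs x cdrs (assumed)
       260727 / 1440   AS reduction_factor;     -- about 181
```

A factor of roughly 181 is consistent with the "two orders of magnitude" claim.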

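The inline view isn't strictly necessary. (You spend a lot of time when you run EXPLAIN on a query that contains an inline view: EXPLAIN materializes the inline view, that is, it runs the inline view query and populates the derived table, and only then gives an EXPLAIN of the outer query.)

Note that this query returns an equivalent result: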

SELECT COUNT(*) AS max_channels
  FROM cdrs cl1
  JOIN cdrs cl2
    ON cl1.StartTime BETWEEN cl2.StartTime  AND cl2.StopTime
 GROUP BY cl1.id
 ORDER BY max_channels DESC
 LIMIT 1

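Although it still has to do all of the work, and probably doesn't perform any better, the EXPLAIN should run much faster. (We'd expect to see "Using temporary; Using filesort" in the Extra column.)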


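The number of rows in the result set will be the number of rows in the table (~250,000), and those rows need to be sorted, so that will take some time. The bigger problem (my gut tells me) is the join operation.

I wonder whether the EXPLAIN (or the performance) would be any different if you swapped cl1 and cl2 in the predicate, i.e.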

ON cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime

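I'm only thinking about that because I'd be tempted to try a correlated subquery. That's ~250,000 executions of the subquery, and it's not likely to be any faster...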

SELECT ( SELECT COUNT(*) 
           FROM cdrs cl2
          WHERE cl2.StartTime BETWEEN cl1.StartTime AND cl1.StopTime
       ) AS max_channels
     , cl1.StartTime
  FROM cdrs cl1
 ORDER BY max_channels DESC
 LIMIT 11

You could run an EXPLAIN on that; we'd still see "Using temporary; Using filesort", and it would also show "DEPENDENT SUBQUERY"...


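Obviously, adding a predicate on the cl1 table to cut down the number of rows to be returned (for example, checking only the past 15 days) should speed things up, but it doesn't get you the answer you want.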

WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
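For illustration, this is a sketch of where that predicate would sit in the join form of the query from earlier in this answer. It only reports the peak among calls that started in the last 15 days, not over the whole table.

```sql
SELECT COUNT(*) AS max_channels
  FROM cdrs cl1
  JOIN cdrs cl2
    ON cl1.StartTime BETWEEN cl2.StartTime AND cl2.StopTime
 -- Limit the driving side of the join to recent calls only.
 WHERE cl1.StartTime > NOW() - INTERVAL 15 DAY
 GROUP BY cl1.id
 ORDER BY max_channels DESC
 LIMIT 1;
```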

(None of my musings here are a definitive answer to your question, or a solution to the performance problem; they're just musings.)