MySQL: composite index for a query on large tables

Asked: 2011-10-20 20:12:06

Tags: mysql query-optimization

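The following query runs against user_chars (about 20 million records) and user_data (about 10 million records). It is too slow, and I would like to know whether a better composite index could improve it.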

Any ideas on what the best composite index would be?

SELECT username, title, status  
FROM (  
    SELECT username, title, status  
    FROM user_chars w, user_data r  
    WHERE w.user_id = r.user_id  
    AND (status < '300' OR is_admin = '1')    
    AND (  
        (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
        OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
        OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
        OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)  
        ...  
    )  
    GROUP BY w.user_id  
    HAVING COUNT(*) >= 3  
) data  
WHERE username != '0'  
AND title != '0'

Here are the tables:

CREATE TABLE user_data (
  user_id int(10) unsigned NOT NULL AUTO_INCREMENT,
  username decimal(17,14) DEFAULT NULL,
  title decimal(17,14) DEFAULT NULL,
  status smallint(6) unsigned NOT NULL,
  is_admin tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (user_id),
  KEY username (username),
  KEY title (title),
  KEY status (status),
  KEY is_admin (is_admin),
  KEY chars_avg_index (user_id,username,title,status)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;


CREATE TABLE user_chars (
  user_id int(10) unsigned NOT NULL,
  rating_id char(32) DEFAULT NULL,
  rating tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (user_id),
  KEY rating_id (rating_id),
  KEY rating (rating),
  KEY chars_index (user_id,rating_id,rating)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

Edit: added the EXPLAIN output

+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys                              | key             | key_len | ref       | rows  | Extra                                                     |
+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                       | NULL            | NULL    | NULL      |  3668 | Using where                                               |
|  2 | DERIVED     | w          | range  | user_id,rating_id,rating,chars_index       | chars_index     | 98      | NULL      | 13215 | Using where; Using index; Using temporary; Using filesort |
|  2 | DERIVED     | r          | eq_ref | PRIMARY,status,is_admin,chars_avg_index    | PRIMARY         | 4       | w.user_id |     1 | Using where                                               |
+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+

3 answers:

Answer 0 (score: 2)

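Looking at the EXPLAIN output for this query, it appears that MySQL applies the inner query's WHERE clause to user_chars before joining with user_data. An index on (rating_id, rating), without user_id, added to user_chars should therefore help the inner query's WHERE clause: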

ALTER TABLE user_chars ADD INDEX (rating_id, rating);
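
One way to check whether the optimizer actually picks up the new index is to re-run EXPLAIN on the inner query after adding it. The statements below are only a rough sketch: they include just the four rating conditions shown in the question (the ones elided by ... are left out), and the plan you get will depend on your data.

```sql
-- List the indexes on user_chars, including the auto-generated name
-- of the index that was just added.
SHOW INDEX FROM user_chars;

-- Re-check the plan of the inner query.
EXPLAIN
SELECT username, title, status
FROM user_chars w, user_data r
WHERE w.user_id = r.user_id
  AND (status < 300 OR is_admin = 1)
  AND (
        (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
     OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
     OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
     OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
  )
GROUP BY w.user_id
HAVING COUNT(*) >= 3;
```

If the key column for table w now names the new index instead of chars_index, the optimizer has chosen it.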

Edit: this behavior depends on the number of rows in each table, so posting your EXPLAIN output would be helpful :]

Edit 2: I would also rewrite the query as follows:

SELECT username, title, status  
FROM user_chars w, user_data r  
WHERE w.user_id = r.user_id  
AND (status < '300' OR is_admin = '1')    
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
    ...
)  
AND username != '0'  
AND title != '0'
GROUP BY w.user_id  
HAVING COUNT(*) >= 3  

Answer 1 (score: 1)

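That is an interesting execution plan. I'm afraid I can't offer any particularly specific advice, mainly because I haven't managed to come up with simple test data that convinces my MySQL server to use the same plan.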

I do have a few general suggestions:

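  • You don't need the nested query - you can get the same effect with HAVING COUNT(*) >= 3 AND username != '0' AND title != '0'. Alternatively, try moving the username and title conditions into the inner WHERE clause.

  • My tests suggest that MySQL is not smart enough to use index merge and/or range optimization for the status < '300' OR is_admin = '1' condition, even when I create an index on (is_admin, status). It may be a good idea to create a single column that encodes both values, preferably so that only a single range comparison on it is needed.

  • You might also consider dropping any indexes you don't need, unless other queries rely on them. Unused indexes just take up space, slow down INSERTs, and can confuse the query planner.

  • If you haven't done so recently, run ANALYZE TABLE on the tables and see whether the execution plan changes.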
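
The single-column idea and the ANALYZE TABLE step could look roughly like this. It is only a sketch: the column effective_status and the index effective_status_idx are hypothetical names, not part of the original schema, and the application (or a trigger) would have to keep the column in sync whenever status or is_admin changes.

```sql
-- Hypothetical column: admins are mapped to 0 so they always pass the
-- "< 300" test; everyone else keeps their real status.
ALTER TABLE user_data
  ADD COLUMN effective_status SMALLINT UNSIGNED NOT NULL DEFAULT 0;

UPDATE user_data
SET effective_status = IF(is_admin = 1, 0, status);

-- Hypothetical index name.
ALTER TABLE user_data
  ADD INDEX effective_status_idx (effective_status);

-- (status < 300 OR is_admin = 1) then becomes a single range comparison:
SELECT COUNT(*) FROM user_data WHERE effective_status < 300;

-- Refresh the index statistics, then re-check the plan with EXPLAIN.
ANALYZE TABLE user_chars, user_data;
```

Mapping admins to 0 works here because status is unsigned, so 0 is always below the 300 threshold.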

Answer 2 (score: 0)

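Unfortunately, the current structure of the user_data table prevents any index from being used efficiently.

Basically, the overall condition on the data fetched from user_data is: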

WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')

This condition should be applied before the aggregation; otherwise the aggregation processes superfluous rows.

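Indexes work best when you search for something equal to something else and the conditions are joined with AND; your case is exactly the opposite. To optimize the query, you could introduce a denormalized column that stores the result of (username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')) and index it. Until then, we will work with what we have.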
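
Such a denormalized column might be set up along these lines. This is a sketch under assumptions: is_eligible and is_eligible_idx are made-up names, the flag has to be maintained on every write that touches the underlying columns, and how much a two-valued index helps depends on how many rows end up flagged.

```sql
-- Hypothetical flag column plus an index on it.
ALTER TABLE user_data
  ADD COLUMN is_eligible TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX is_eligible_idx (is_eligible);

-- Precompute the combined condition. username and title are nullable,
-- so the expression can evaluate to NULL; IFNULL turns that into 0.
UPDATE user_data
SET is_eligible = IFNULL(
      username != 0 AND title != 0 AND (status < 300 OR is_admin = 1),
      0);

-- The user_data part of the WHERE clause then reduces to an equality test:
SELECT COUNT(*) FROM user_data WHERE is_eligible = 1;
```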

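You join the result with user_chars using several ORs, but all of them operate on rating_id and rating. Because the rating column is more selective (it has more distinct values), it is better to put it on the left side of the composite index: (rating, rating_id). With that index in place you no longer need the indexes on (rating) and (rating_id, rating), so just drop them.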
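
As a sketch, the index change could be applied like this. The name rating_rating_id is made up for illustration, and the name of any (rating_id, rating) index depends on how it was created, so look it up before dropping anything.

```sql
-- Look up the actual names of the existing indexes first.
SHOW INDEX FROM user_chars;

-- Hypothetical name for the new composite index.
ALTER TABLE user_chars ADD INDEX rating_rating_id (rating, rating_id);

-- KEY rating (rating) from the question's schema is now redundant,
-- because (rating) is a left prefix of the new index.
ALTER TABLE user_chars DROP INDEX rating;

-- If an index on (rating_id, rating) was added earlier, drop it with
-- DROP INDEX as well, using the name reported by SHOW INDEX.
```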

Now, I'm not sure whether MySQL can perform this optimization on its own, so you will need to compare how the following queries execute:

SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3

and the second:

SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100 -- overall min and max of the ranges below; adjust to cover the conditions elided by ... in your query
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3

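The latter query may execute faster because it contains an explicit hint toward using our index. In addition, both queries select only user_id values instead of wasting memory on extra columns during aggregation. You can now join the result of whichever query is fastest to the user_data table: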

SELECT username, title, status
FROM (
    SELECT user_id
    FROM user_data JOIN user_chars USING (user_id)
    WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
    AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
    AND rating BETWEEN 30 AND 100
    AND (
        (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
        OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
        OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
        OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
    )
    GROUP BY user_id
    HAVING COUNT(*) >= 3
) as user_ids JOIN user_data USING (user_id);