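INNER JOIN takes too long

Date: 2019-02-25 17:33:42

Tags: mysql algorithm inner-join

I asked the original question on Stack Overflow before. Sorry, that was not the best way to approach this problem.

The problem is that I have a query that takes at least 5 seconds to complete even with an INNER JOIN, and I would like to know whether there is a faster way to do this. This is the answer I got: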

` q = "SELECT DISTINCT e2.eventId FROM event_tags e1 INNER JOIN event_tags e2 " \
        "ON BINARY e2.tagName=e1.tagName AND e2.eventId != e1.eventId " \
        "WHERE e1.eventId = {} ORDER BY RAND() LIMIT {}".format(eventId, '10')`
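
As a side note, the query above interpolates eventId into the SQL text with str.format. A minimal sketch of the same statement using driver-side placeholders follows. It assumes a DB-API driver with the %s paramstyle (PyMySQL and mysqlclient work this way); the function name build_related_events_query is hypothetical and not part of the original code.

```python
def build_related_events_query(event_id, limit=10):
    """Return (sql, params) for the related-events lookup.

    Assumption: the driver uses the DB-API 'format' paramstyle (%s),
    so values are passed separately instead of through str.format().
    """
    sql = (
        "SELECT DISTINCT e2.eventId "
        "FROM event_tags e1 INNER JOIN event_tags e2 "
        "ON BINARY e2.tagName = e1.tagName AND e2.eventId != e1.eventId "
        "WHERE e1.eventId = %s ORDER BY RAND() LIMIT %s"
    )
    # Coerce to int so a non-numeric value fails here, not inside the query.
    return sql, (int(event_id), int(limit))


sql, params = build_related_events_query(487)
# With an open cursor from the driver (not shown): cursor.execute(sql, params)
```

This does not change the query plan; it only keeps the values out of the SQL string.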

My tags table looks like this:

mysql> describe event_tags;

+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| tagId   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| tagName | text             | NO   |     | NULL    |                |
| eventId | int(10) unsigned | NO   | PRI | NULL    |                |
+---------+------------------+------+-----+---------+----------------+

3 rows in set (0.00 sec)

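I also have a lot of tags, and the number will only keep growing. When I count the rows in the tags table, I get 504,402 tagIds, and the same number of tag names. How can I make the lookup faster?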

Here is some sample data from the event_tags table:

mysql> select * from event_tags limit 40;
+-------+-------------------------------------------+---------+
| tagId | tagName                                   | eventId |
+-------+-------------------------------------------+---------+
|   261 | Justin Timberlake (Rescheduled from 11/9) |      38 |
|   264 | Rogers Arena                              |      38 |
|   267 | Pop                                       |      38 |
|   271 | Rock                                      |      38 |
|   285 | Justin Timberlake (Rescheduled from 11/8) |      41 |
|   288 | Rogers Arena                              |      41 |
|   291 | Pop                                       |      41 |
|   294 | Rock                                      |      41 |
|   595 | Yogesh Soman                              |      84 |
|   599 | Geetanjali Kulkarni                       |      84 |
|   602 | Bhagyashree Shankpal                      |      84 |
|   606 | Lalit Prabhakar                           |      84 |
|   611 | Sameer Sanjay Vidwans                     |      84 |
|   617 | Drama                                     |      84 |
|   647 | Shrihari Abhyankar                        |      89 |
|   651 | Deepali Borkar                            |      89 |
|   654 | Akash Kamble                              |      89 |
|   657 | Sharavi Kulkarni                          |      89 |
|   660 | Sharav Wadhawekar                         |      89 |
|   667 | Nipun Dharmadhikari                       |      89 |
|   670 | Drama                                     |      89 |
|   689 | Frank Grillo                              |      94 |
|   692 | Jamie Bell                                |      94 |
|   695 | Margaret Qualley                          |      94 |
|   700 | James Badge Dale                          |      94 |
|   704 | Tim Sutton                                |      94 |
|   710 | Drama                                     |      94 |
|   734 | Bruce Dern                                |     101 |
|   739 | Anthony Michael Hall                      |     101 |
|   745 | Sean Astin                                |     101 |
|   749 | Aly Michalka                              |     101 |
|   754 | Victoria Smurfit                          |     101 |
|   759 | Carl Bessai                               |     101 |
|   762 | Drama                                     |     101 |
|   783 | Sarah Clarke                              |     106 |
|   785 | Xander Berkeley                           |     106 |
|   787 | Kristen Gutoskie                          |     106 |
|   790 | Mackenzie Astin                           |     106 |
|   794 | Bobby Campo                               |     106 |
|   798 | Adam Cushman                              |     106 |
+-------+-------------------------------------------+---------+
40 rows in set (0.00 sec)

Here is the CREATE statement for the table:

CREATE TABLE IF NOT EXISTS event_tags(
    tagId INT UNSIGNED NOT NULL AUTO_INCREMENT,
    tagName VARCHAR(40) NOT NULL,
    eventId INT UNSIGNED NOT NULL,
    PRIMARY KEY(tagId, eventId)
);

Here is the EXPLAIN output for the query:

mysql> EXPLAIN SELECT DISTINCT e2.eventId FROM event_tags e1 INNER JOIN event_tags e2 ON BINARY e2.tagName=e1.tagName AND e2.eventId != e1.eventId WHERE e1.eventId = 487 ORDER BY RAND() LIMIT 10
    -> ;
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
|  1 | SIMPLE      | e1    | ALL  | NULL          | NULL | NULL    | NULL | 34275 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | e2    | ALL  | NULL          | NULL | NULL    | NULL | 34275 | Using where; Using join buffer               |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.03 sec)
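
A rough way to read this plan: both rows show type ALL with no key, meaning a full scan of each alias, and the product of the two rows values gives an approximate upper bound on the row combinations MySQL may have to examine. A back-of-the-envelope sketch in Python, using only the numbers from the EXPLAIN output:

```python
# Row estimates copied from the "rows" column of the EXPLAIN output.
rows_e1 = 34275
rows_e2 = 34275

# Product of the per-table estimates: a rough upper bound on the
# row combinations examined, before any WHERE filtering is applied.
combos = rows_e1 * rows_e2
print(f"{combos:,}")  # 1,174,775,625
```

These are optimizer estimates, not measured counts, so the figure only indicates the order of magnitude.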

Update: I created an index on the table:

CREATE INDEX tagsNdx ON event_tags (eventId, tagName(255));

It now looks like this:

mysql> show index from event_tags;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| event_tags |          0 | PRIMARY  |            1 | tagId       | A         |      455408 |     NULL | NULL   |      | BTREE      |         |               |
| event_tags |          0 | PRIMARY  |            2 | eventId     | A         |      455408 |     NULL | NULL   |      | BTREE      |         |               |
| event_tags |          1 | tagsNdx  |            1 | eventId     | A         |         186 |     NULL | NULL   |      | BTREE      |         |               |
| event_tags |          1 | tagsNdx  |            2 | tagName     | A         |         186 |      255 | NULL   |      | BTREE      |         |               |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

But it is still slow.

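1 answer:

Answer 0 (score: 0)

Here are some possible optimizations:

  1. Remove the eventId column from the primary key (this step is optional; you can elaborate further if you need it).
  2. Create an index on the columns (eventId, tagName).
  3. Run the command: ANALYZE TABLE event_tags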
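
The three steps above could be written out roughly as follows. This is a sketch under assumptions, not a tested migration: the index name idx_event_tag and the helper optimization_statements are hypothetical, and tagName is assumed to be VARCHAR(40) as in the CREATE statement (if the column is actually TEXT, MySQL requires a prefix length such as tagName(255) in the index).

```python
def optimization_statements(table="event_tags"):
    """Return MySQL statements for the three suggested steps, in order."""
    return [
        # Step 1 (optional): make tagId alone the primary key.
        f"ALTER TABLE {table} DROP PRIMARY KEY, ADD PRIMARY KEY (tagId)",
        # Step 2: composite index on (eventId, tagName); the name is hypothetical.
        f"CREATE INDEX idx_event_tag ON {table} (eventId, tagName)",
        # Step 3: refresh the index statistics used by the optimizer.
        f"ANALYZE TABLE {table}",
    ]


for stmt in optimization_statements():
    print(stmt + ";")
```

Re-running EXPLAIN on the original query afterwards would show whether the new index is actually picked up.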