Why does adding an INNER JOIN make this query so slow?

Time: 2014-09-10 11:03:23

Tags: mysql sql

I have a database with the following three tables:

The matches table has 200,000 matches...

CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

The heroes table has ~100 heroes...

CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

The matches_heroes table has 2,000,000 relations (10 random heroes per match)...

CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8

The following query takes 1 second, which seems slow for something so simple:

SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5

Removing only the WHERE clause doesn't help, but if I also take out the INNER JOIN, like this:

SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches

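...it takes only 0.05 seconds. It seems the INNER JOIN is very expensive. I don't have much experience with joins. Is this normal, or am I doing something wrong?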

Update #1: Here is the EXPLAIN result.

id  select_type  table          type   possible_keys                     key     key_len  ref                                rows  Extra  
1   SIMPLE       matches_heroes ref    match_id,hero_id,match_id_hero_id hero_id 2        const                              34742
1   SIMPLE       matches        eq_ref PRIMARY                           PRIMARY 8        mydatabase.matches_heroes.match_id 1     Using index

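Update #2: After hearing you all out, I think the query is working correctly and is as fast as it is going to get. If you disagree, please let me know. Thanks for your help. I really appreciate it.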

2 answers:

Answer 0: (score: 1)

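Use COUNT(matches.match_id) instead of COUNT(*): with joins it is better to avoid *, because it can cause extra work. Counting a column that takes part in the join is the best way to make sure you are not requesting any additional operations. (This is not actually a problem with MySQL's INNER JOIN; my mistake.)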

Also, you should verify that all of your keys are defragmented and that there is enough free memory for the indexes to be loaded into memory.

Update 1:


Try adding a composite index on match_id,hero_id, as it should give better performance.

ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
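
One way to check whether the new index actually gets used is to re-run EXPLAIN on the original query and look at the key column. This is only a sketch: the optimizer is free to keep using the single-column hero_id index, and the EXPLAIN output in the question does list match_id_hero_id under possible_keys while showing hero_id as the chosen key.

```sql
-- Re-run EXPLAIN on the query from the question after adding the index.
-- Compare the "key" and "rows" columns with the output taken before the change.
EXPLAIN
SELECT COUNT(*) AS match_count
FROM matches
INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE matches_heroes.hero_id = 5;
```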


Update 2:


I was not satisfied with the accepted answer, that MySQL is this slow with just 2 million records, so I ran a benchmark on my Ubuntu PC (i7 processor, standard HDD).

-- pre-requirements

CREATE TABLE seq_numbers (
    number INT NOT NULL
) ENGINE = MYISAM;


DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
    BEGIN
        DECLARE i INT;
        SET i = MinVal;
        START TRANSACTION;
        WHILE i <= MaxVal DO
            INSERT INTO seq_numbers VALUES (i);
            SET i = i + 1;
        END WHILE;
        COMMIT;
    END$$
DELIMITER ;

CALL InsertSeq(1,200000)
;

ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;

--  create tables

-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;

CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;

CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100;

-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000;

-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000;

-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes;


-- get random hero_id
SET @randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);


-- test 1 

SELECT SQL_NO_CACHE @randHeroId,COUNT(*) AS match_count
FROM matches as a 
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s


-- test 2: adding some complexity 
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);

SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a 
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s

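Conclusion: the test result is about 20x faster, and my server load before the test was about 80%, since it is not a dedicated MySQL server and had other CPU-intensive tasks running. So if you run the whole script (from above) and get slower results, it may be because: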

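  1. You are on shared hosting and the load is too high. In this case there is not much you can do: complain to your current host, pay for a better host/VM, or try another host.
  2. Your configured key_buffer_size (for MyISAM) or innodb_buffer_pool_size (for InnoDB) is too small; the optimal size would be over 150 MB.
  3. You do not have enough available RAM; you need about 100-150 MB of RAM for the indexes to be loaded into memory. Solution: free up some RAM or buy more.
  4. Note that because the test script generates fresh data, the index fragmentation problem is ruled out. Hope this helps, and ask if you run into any problems while testing.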
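
To compare points 2 and 3 against an actual server, something like the following could be used. It is a rough sketch that assumes the three tables live in the currently selected database; the sizes reported by information_schema are estimates, and for InnoDB the buffer pool has to hold data pages as well as secondary indexes.

```sql
-- Approximate size of data and indexes for the three tables, in MB.
SELECT table_name,
       engine,
       ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
       ROUND(index_length / 1024 / 1024, 1) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('matches', 'heroes', 'matches_heroes');

-- Configured cache sizes, reported in bytes.
SHOW VARIABLES LIKE 'key_buffer_size';         -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; -- InnoDB data and index cache
```

If index_mb for the tables is larger than the relevant cache, the indexes cannot all stay in memory at once.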


    Observation:


    SELECT SQL_NO_CACHE COUNT(*) AS match_count 
    FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id 
    WHERE hero_id = 5
    

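    is equivalent to the following, as long as every matches_heroes row has a matching row in matches (which the foreign key in the question's schema guarantees):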

    SELECT SQL_NO_CACHE COUNT(*) AS match_count 
    FROM matches_heroes 
    WHERE hero_id = 5
    

    So you don't need the join if the count is all you need, but I'm guessing this is just an example.
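
The two queries only return the same count when every row in matches_heroes points at exactly one existing row in matches. The InnoDB foreign key in the question enforces that, but MyISAM parses and ignores FOREIGN KEY clauses, so on the benchmark tables it may be worth checking. A minimal sketch of such a check (the alias names and the orphan_rows column name are just illustrative):

```sql
-- Count matches_heroes rows whose match_id has no row in matches.
-- A result of 0 means the join cannot drop any rows from the count.
SELECT COUNT(*) AS orphan_rows
FROM matches_heroes AS mh
LEFT JOIN matches AS m ON m.match_id = mh.match_id
WHERE m.match_id IS NULL;
```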

Answer 1: (score: 1)

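So you are saying that reading a table of 200,000 records is faster than reading a table of 2,000,000 records, finding the desired records, and then using all of them to look up the matching records in the 200,000-record table?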

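Does that surprise you? It is simply a lot more work for the DBMS. (It is even possible that the DBMS decides not to use the hero_id index when it considers a full table scan to be faster.)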

So in my opinion there is nothing wrong with what is happening here.
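
To see how much the index choice matters on a given server, MySQL's index hints can be used to time both access paths. This is only a sketch: it assumes the index is named hero_id, as in the question's schema, and with IGNORE INDEX the optimizer may still pick another index that contains hero_id (such as a composite one) rather than scanning the table itself.

```sql
-- Ask the optimizer to use the hero_id index.
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes FORCE INDEX (hero_id)
WHERE hero_id = 5;

-- Ask the optimizer not to use the hero_id index.
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes IGNORE INDEX (hero_id)
WHERE hero_id = 5;
```

Prefixing either statement with EXPLAIN shows which access path was actually chosen.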