Optimizing an InnoDB table and a problematic query

Date: 2013-06-19 12:24:54

Tags: mysql clustered-index

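I have a large InnoDB table that currently contains about 20 million rows, with roughly 20,000 new rows inserted every day. The rows are messages from different topics.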

CREATE TABLE IF NOT EXISTS `Messages` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `TopicID` bigint(20) unsigned NOT NULL,
  `DATESTAMP` int(11) DEFAULT NULL,
  `TIMESTAMP` int(10) unsigned NOT NULL,
  `Message` mediumtext NOT NULL,
  `Checksum` varchar(50) DEFAULT NULL,
  `Nickname` varchar(80) NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `TopicID` (`TopicID`,`Checksum`),
  KEY `DATESTAMP` (`DATESTAMP`),
  KEY `Nickname` (`Nickname`),
  KEY `TIMESTAMP` (`TIMESTAMP`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=25195126 ;

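Note: Checksum stores an MD5 checksum that prevents the same message from being inserted twice into the same topic (nickname + timestamp + topic + last 20 characters of the message).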
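A minimal sketch of how such a checksum could be computed and enforced in MySQL. The `|` separator, the sample values, the use of the topic ID as the "topic" part, and `INSERT IGNORE` are all illustrative assumptions, not the original application code.

```sql
-- Sample values (hypothetical) for one incoming message.
SET @nickname = 'alice';
SET @ts       = 1371644694;   -- unix timestamp
SET @topic_id = 42;
SET @message  = 'Hello, this is the body of the forum message.';

-- MD5() returns a 32-character hex string; RIGHT() keeps the last 20 characters.
SET @checksum = MD5(CONCAT_WS('|', @nickname, @ts, @topic_id, RIGHT(@message, 20)));

-- The UNIQUE KEY (TopicID, Checksum) rejects a second copy of the same message
-- in the same topic; INSERT IGNORE skips that row instead of raising an error.
INSERT IGNORE INTO Messages (TopicID, DATESTAMP, `TIMESTAMP`, Message, Checksum, Nickname)
VALUES (@topic_id, 2013619, @ts, @message, @checksum, @nickname);
```

Running the `INSERT IGNORE` a second time with the same values should leave the table unchanged, since the `(TopicID, Checksum)` pair already exists.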

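The site I'm building has a news feed where users can choose to see the latest messages from different nicknames on different forums. The query looks like this: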

SELECT
Messages.ID AS MessageID,
Messages.Message,
Messages.TIMESTAMP,
Messages.Nickname,
Topics.ID AS TopicID,
Topics.Title AS TopicTitle,
Forums.Title AS ForumTitle

FROM Messages   

JOIN FollowedNicknames ON FollowedNicknames.UserID = 'MYUSERID'
JOIN Forums ON Forums.ID = FollowedNicknames.ForumID
JOIN Subforums ON Subforums.ForumID = Forums.ID
JOIN Topics ON Topics.SubforumID = Subforums.ID

WHERE 

Messages.Nickname = FollowedNicknames.Nickname AND 
Messages.TopicID = Topics.ID AND Messages.DATESTAMP = '2013619'
ORDER BY Messages.TIMESTAMP DESC

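TIMESTAMP contains a unix timestamp. DATESTAMP is just a date generated from the unix timestamp, so that rows can be looked up with the '=' operator, which should be faster than a range scan on the unix timestamp.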
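For comparison, the two ways of selecting one day's messages might look like the sketch below. It assumes DATESTAMP was derived from TIMESTAMP in the same time zone that the session uses; otherwise the two predicates would not select the same rows.

```sql
-- Equality lookup on the precomputed day column (same literal as in the query above).
SELECT ID
FROM Messages
WHERE DATESTAMP = 2013619;

-- The same day expressed as a half-open range on the unix timestamp column.
-- UNIX_TIMESTAMP() interprets the datetime literal in the session time zone.
SELECT ID
FROM Messages
WHERE `TIMESTAMP` >= UNIX_TIMESTAMP('2013-06-19 00:00:00')
  AND `TIMESTAMP` <  UNIX_TIMESTAMP('2013-06-20 00:00:00');
```

Both predicates can use their single-column index (`DATESTAMP` and `TIMESTAMP` respectively); prefixing each statement with `EXPLAIN` shows which access type MySQL actually picks.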

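The problem is that this query takes about 13 seconds (or more) when uncached. That is of course unacceptable for the intended use. Adding DATESTAMP seemed to speed things up, but not by much.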

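At this point I really don't know what else to do. I've read about composite primary keys, but I'm still not sure whether they would help and how to implement them correctly in this particular case.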

I know that using BIGINT is probably overkill, but does it matter that much?

EXPLAIN:

+----+-------------+-----------------------+--------+---------------------------------------+------------+---------+-----------------------------------------------+------+----------------------------------------------+
| id | select_type | table                 | type   | possible_keys                         | key        | key_len | ref                                           | rows | Extra                                        |
+----+-------------+-----------------------+--------+---------------------------------------+------------+---------+-----------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | FollowedNicknames     | ALL    | UserID,ForumID,Nickname               | NULL       | NULL    | NULL                                          |    8 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | Forums                | eq_ref | PRIMARY                               | PRIMARY    | 8       | database.FollowedNicknames.ForumiID           |    1 | NULL                                         |
|  1 | SIMPLE      | Messages              | ref    | TopicID,DATETIME,Nickname             | Nickname   | 242     | database.FollowedNicknames.Nickname           |   15 | Using where                                  |
|  1 | SIMPLE      | Topics                | eq_ref | PRIMARY,SubforumID                    | PRIMARY    | 8       | database.Messages.TopicID                     |    1 | NULL                                         |
|  1 | SIMPLE      | Subforums             | eq_ref | PRIMARY,ForumID                       | PRIMARY    | 8       | database.Topics.SubforumID                    |    1 | Using where                                  |
+----+-------------+-----------------------+--------+---------------------------------------+------------+---------+-----------------------------------------------+------+----------------------------------------------+

1 answer:

Answer 0 (score: 0)

You shouldn't JOIN on the VARCHAR Nickname; you should use the user's ID to join these tables. This is definitely slowing the query down and is probably the biggest problem. It would also be easier to understand if you wrote all the JOIN conditions explicitly in the JOIN clauses rather than in the WHERE clause:

SELECT
Messages.ID AS MessageID,
Messages.Message,
Messages.TIMESTAMP,
Messages.Nickname,
Topics.ID AS TopicID,
Topics.Title AS TopicTitle,
Forums.Title AS ForumTitle

FROM Messages

JOIN FollowedNicknames ON Messages.Nickname = FollowedNicknames.Nickname AND FollowedNicknames.UserID = 'MYUSERID'
JOIN Forums ON Forums.ID = FollowedNicknames.ForumID
JOIN Subforums ON Subforums.ForumID = Forums.ID
JOIN Topics ON Messages.TopicID = Topics.ID AND Topics.SubforumID = Subforums.ID

WHERE Messages.DATESTAMP = '2013619'
ORDER BY Messages.TIMESTAMP DESC
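The advice to join on an integer ID rather than the VARCHAR Nickname could look roughly like the sketch below. The `Nicknames` table and the `NicknameID` columns are hypothetical names that do not appear in the question, and the sketch assumes a nickname identifies the same author across forums; the backfill of the new columns is left out.

```sql
-- Hypothetical lookup table: one row per distinct nickname.
CREATE TABLE IF NOT EXISTS Nicknames (
  ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
  Nickname VARCHAR(80) NOT NULL,
  PRIMARY KEY (ID),
  UNIQUE KEY Nickname (Nickname)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Hypothetical integer columns that take over the role of the VARCHAR join columns.
ALTER TABLE Messages
  ADD COLUMN NicknameID INT UNSIGNED NULL,
  ADD KEY NicknameID (NicknameID);
ALTER TABLE FollowedNicknames
  ADD COLUMN NicknameID INT UNSIGNED NULL,
  ADD KEY NicknameID (NicknameID);

-- The join then compares 4-byte integers instead of strings of up to 80 characters.
SELECT Messages.ID AS MessageID, Messages.Message, Messages.TIMESTAMP
FROM Messages
JOIN FollowedNicknames
  ON FollowedNicknames.NicknameID = Messages.NicknameID
 AND FollowedNicknames.UserID = 'MYUSERID'
WHERE Messages.DATESTAMP = '2013619'
ORDER BY Messages.TIMESTAMP DESC;
```

The remaining joins to Forums, Subforums and Topics would stay as in the full query; they are omitted here only to keep the sketch short.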

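I would use DATE rather than INT as the data type of the DATESTAMP column. The Checksum column should probably use latin1_general_ci as its collation. I would use INT UNSIGNED for the ID columns as long as their values stay below 2,000,000,000, because INT UNSIGNED can store values up to about 4,000,000,000 (4,294,967,295 exactly). InnoDB is affected more by the primary key than MyISAM is, since every InnoDB secondary index entry also stores the primary key value, and this can make a significant difference.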
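A hedged sketch of what these column changes might look like as DDL. The `MessageDate` column name is hypothetical, and adding a new column is an assumption on my part: the existing integer values such as 2013619 are presumably not in a form MySQL can convert to DATE directly. On a table of 20 million rows each statement may take a long time, so it would be worth trying on a copy first.

```sql
-- DATE instead of INT: add a new column and backfill it from the unix timestamp.
-- FROM_UNIXTIME() converts using the session time zone.
ALTER TABLE Messages
  ADD COLUMN MessageDate DATE NULL,
  ADD KEY MessageDate (MessageDate);
UPDATE Messages SET MessageDate = DATE(FROM_UNIXTIME(`TIMESTAMP`));

-- Checksum holds only hex digits, so a single-byte charset is enough;
-- this shrinks the UNIQUE KEY (TopicID, Checksum) compared with utf8.
ALTER TABLE Messages
  MODIFY Checksum VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_general_ci DEFAULT NULL;

-- INT UNSIGNED (4 bytes, max 4294967295) instead of BIGINT UNSIGNED (8 bytes).
ALTER TABLE Messages
  MODIFY ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY TopicID INT UNSIGNED NOT NULL;
```

If TopicID is narrowed, the column it is joined against (Topics.ID) should be changed to the same type so that the join keeps comparing identical types.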