Question

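Hello, I need help optimizing a query against a large database table with more than 1 million records. The query currently takes 27-30 seconds to execute.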

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
owner_user.first_name AS ownerFirstName,
owner_user.last_name AS ownerLastName,
CONCAT(owner_user.last_name,
        owner_user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2
FROM candidate
    LEFT JOIN user AS owner_user
        ON candidate.owner = owner_user.user_id
    LEFT JOIN saved_list_entry
        ON saved_list_entry.data_item_type = 100
        AND saved_list_entry.data_item_id = candidate.candidate_id
WHERE is_active = 1
GROUP BY candidate.candidate_id
ORDER BY dateModifiedSort DESC
LIMIT 0, 15

Is there any way to reduce the execution time of this query? I have also added indexes to the tables, but they do not seem to help.

Answer 1

You are using the query pattern

     SELECT a vast bunch of stuff
       FROM a complex assembly of JOIN operations
      ORDER BY some variable DESC
      LIMIT 0,small number

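This is inherently inefficient: to satisfy your query, the MySQL server has to build a huge result set, sort the whole thing, and then take the first 15 rows and discard the rest.

To make it more efficient, you need to sort less. Here is one way. It looks like you want the fifteen most recently modified candidates. This query retrieves the IDs of those candidates very cheaply, because it can use your index on date_modified.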

                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15

You can then use it as a subquery in your main query. Add a clause like this:

  WHERE candidate.candidate_id IN (
                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15)

at the appropriate place in your query.
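One caveat with the IN form: the MySQL releases I know of reject a LIMIT inside an IN subquery (error 1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"). A minimal sketch of a workaround is to join against the limited subquery as a derived table instead. The alias recent is my own invention, the column list is shortened for brevity, and I am assuming is_active is a column of candidate.

```sql
-- Sketch: pick the 15 newest candidate IDs first, then join back for the details.
-- "recent" is a hypothetical alias; is_active is assumed to live on candidate.
SELECT c.candidate_id  AS candidateID,
       c.first_name    AS firstName,
       c.last_name     AS lastName,
       c.date_modified AS dateModifiedSort,
       u.first_name    AS ownerFirstName,
       u.last_name     AS ownerLastName
  FROM (
        SELECT candidate_id
          FROM candidate
         WHERE is_active = 1
         ORDER BY date_modified DESC
         LIMIT 0, 15
       ) AS recent
  JOIN candidate AS c ON c.candidate_id = recent.candidate_id
  LEFT JOIN user AS u ON c.owner = u.user_id
 ORDER BY c.date_modified DESC;
```

The outer ORDER BY only has to sort 15 rows, so the expensive sort over the full joined result goes away.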

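Also note that you are using a nonstandard and potentially harmful MySQL-specific extension to GROUP BY. Your query works, but if a candidate has more than one owner, it returns only one of them, chosen unpredictably.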

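Finally, you appear to have put single-column indexes on many columns of your large table. This is a notorious SQL antipattern: every one of those indexes slows down INSERT and UPDATE operations, and most of them probably do not speed up any query. For this query, certainly, the only useful indexes are the one on date_modified and the primary key.

Many complex queries are best served by specific multi-column indexes. A bunch of single-column indexes does not help such queries.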
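As an illustration, here is a sketch of one compound index that could serve the ID lookup for this particular query, assuming is_active and date_modified are both columns of candidate. The index name is made up. EXPLAIN is the way to check whether the optimizer actually picks it on your data.

```sql
-- Hypothetical index name. Column order: equality filter first, then the sort column.
CREATE INDEX idx_candidate_active_modified
    ON candidate (is_active, date_modified);

-- Inspect the plan: ideally the "key" column names the new index
-- and the "Extra" column does not mention "Using filesort".
EXPLAIN
SELECT candidate_id
  FROM candidate
 WHERE is_active = 1
 ORDER BY date_modified DESC
 LIMIT 15;
```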

Answer 2

I have changed the table aliases in the query below. Using it should solve your problem.

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
user.first_name AS ownerFirstName,
user.last_name AS ownerLastName,
CONCAT(user.last_name,
        user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2
FROM candidate
    LEFT JOIN user
        ON candidate.owner = user.user_id
    LEFT JOIN saved_list_entry
        ON saved_list_entry.data_item_type = 100
        AND saved_list_entry.data_item_id = candidate.candidate_id
WHERE is_active = 1
GROUP BY candidate.candidate_id
ORDER BY dateModifiedSort DESC
LIMIT 0, 15

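Create indexes for the join conditions with the following statements:

create index index_user on user(user_id);

create index index_saved_list_entry on saved_list_entry(data_item_type, data_item_id);

-- dateModifiedSort is only a select alias; the index must name the real column
create index index_candidate on candidate(is_active, candidate_id, date_modified);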

Answer 3

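First, I suspect a candidate ID only ever appears as a single entry, so why you are doing a GROUP BY is beyond me. It can be removed, which should improve things a little.

Second, you are doing a left join to the "saved_list_entry" table but not actually pulling any columns from it, so it can probably be removed entirely.

Third, considering the GROUP BY no longer applies, I would suggest updating the indexes to: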

table             index
CANDIDATE         ( is_active, date_modified, candidate_id, owner )
user              ( user_id )
saved_list_entry  ( data_item_id, data_item_type )

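Since your ORDER BY is on date_modified descending, having that column in the second position, right after is_active (your WHERE condition), lets the engine get to your first 15 rows quickly. However, your SQL_CALC_FOUND_ROWS will still have to run through all the other qualifying rows, although the result set will already be pre-sorted by the index to match.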

SELECT SQL_CALC_FOUND_ROWS
      c.candidate_id AS candidateID,
      c.candidate_id AS exportID,
      c.is_hot AS isHot,
      c.date_modified AS dateModifiedSort,
      c.date_created AS dateCreatedSort,
      c.first_name AS firstName,
      c.last_name AS lastName,
      c.city AS city,
      c.state AS state,
      c.key_skills AS keySkills,
      u.first_name AS ownerFirstName,
      u.last_name AS ownerLastName,
      CONCAT(u.last_name, u.first_name) AS ownerSort,
      DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
      DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
      c.email2 AS email2 
   FROM
      candidate c
         LEFT JOIN user u
            ON c.owner = u.user_id
         LEFT JOIN saved_list_entry s
            ON c.candidate_id = s.data_item_id
            AND s.data_item_type = 100
   WHERE
      c.is_active = 1 
   GROUP BY 
      c.candidate_id 
   ORDER BY    
      c.date_modified DESC 
   LIMIT 
      0, 15
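If that full pass for SQL_CALC_FOUND_ROWS turns out to dominate the run time, one alternative is to drop the modifier and count separately. SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17 in any case. This is only a sketch: it assumes the total is needed just for pagination, that candidate_id is unique, and that is_active belongs to candidate. The alias totalRows is my own.

```sql
-- Page query without SQL_CALC_FOUND_ROWS: it can stop after 15 index entries.
SELECT c.candidate_id, c.first_name, c.last_name, c.date_modified
  FROM candidate AS c
 WHERE c.is_active = 1
 ORDER BY c.date_modified DESC
 LIMIT 0, 15;

-- Separate total for the pager; an index that leads with is_active can answer this.
SELECT COUNT(*) AS totalRows
  FROM candidate
 WHERE is_active = 1;
```

The count still touches every active row in the index, but it no longer forces the page query itself to do so.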

Answer 4

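Get rid of saved_list_entry; it adds nothing.
Delay the join to user. That lets you get rid of the GROUP BY, which adds some time and possibly inflates the value of FOUND_ROWS().

Something like: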

SELECT  c2.*,
        ou.first_name AS ownerFirstName,
        ou.last_name AS ownerLastName,
        CONCAT(ou.last_name, ou.first_name) AS ownerSort
    FROM  
      ( SELECT  SQL_CALC_FOUND_ROWS
                c.candidate_id AS candidateID, c.candidate_id AS exportID,
                c.is_hot AS isHot, c.date_modified AS dateModifiedSort,
                c.date_created AS dateCreatedSort, c.first_name AS firstName,
                c.last_name AS lastName, c.city AS city, c.state AS state,
                c.key_skills AS keySkills,
                DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
                DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
                c.email2 AS email2,
                c.owner  -- must be selected here so the outer join can use c2.owner
            FROM  candidate AS c
            WHERE  is_active = 1
            GROUP BY  c.candidate_id
            ORDER BY  c.date_modified DESC  -- note change here
            LIMIT  0 , 15 
      ) AS c2
    LEFT JOIN  user AS ou  ON c2.owner = ou.user_id;

(I messed up the column order, but you can fix that.)

Index needed:

candidate:  INDEX(is_active, candidate_id, date_modified)

Query optimization for a large database
