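Query optimization for a large database

Time: 2016-05-09 14:23:18

Tags: mysql sql query-optimization

Hello, I need help optimizing a query against a large database table with more than 1 million records. The query currently takes 27-30 seconds to execute.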

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
owner_user.first_name AS ownerFirstName,
owner_user.last_name AS ownerLastName,
CONCAT(owner_user.last_name,
        owner_user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2
FROM
candidate
    LEFT JOIN
user AS owner_user ON candidate.owner = owner_user.user_id
    LEFT JOIN
saved_list_entry ON saved_list_entry.data_item_type = 100
    AND saved_list_entry.data_item_id = candidate.candidate_id
WHERE
is_active = 1
GROUP BY candidate.candidate_id
ORDER BY dateModifiedSort DESC
LIMIT 0 , 15

Is there any way to reduce the execution time of this query? I have also added indexes to the table, but they do not seem to help.

Indexes

4 Answers:

Answer 0: (Score: 1)

You are using the query pattern

     SELECT a vast bunch of stuff
       FROM a complex assembly of JOIN operations
      ORDER BY some variable DESC
      LIMIT 0,small number

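This is inherently inefficient: to satisfy your query, the MySQL server has to construct a huge result set, then sort the whole thing, then take the first 15 rows and discard the rest.

To make it more efficient, you need to sort less. Here is one way to do that. It looks like you want the fifteen most recently modified candidates. This query retrieves the IDs of those candidates very cheaply, because it can use one of your indexes (the one on date_modified).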

                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15

You can then use it as a subquery in your main query. Add a clause like this:

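  WHERE candidate.candidate_id IN (
          -- MySQL rejects LIMIT directly inside an IN subquery,
          -- so the limited query is wrapped in a derived table.
          SELECT candidate_id FROM (
                   SELECT candidate_id
                     FROM candidate
                    ORDER BY date_modified DESC
                    LIMIT 0, 15) AS recent_ids)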

at the appropriate place in your query.
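As a rough sketch of how the pieces might fit together: MySQL does not accept LIMIT directly inside an IN (...) subquery, so this version joins to the limited subquery as a derived table instead. The alias recent is made up for illustration, the column list is shortened, the saved_list_entry join is left out because no column is read from it, and the is_active filter is moved into the subquery on the assumption that only active candidates should be among the fifteen.

```sql
SELECT
    candidate.candidate_id  AS candidateID,
    candidate.first_name    AS firstName,
    candidate.last_name     AS lastName,
    candidate.date_modified AS dateModifiedSort,
    owner_user.first_name   AS ownerFirstName,
    owner_user.last_name    AS ownerLastName
FROM (
    -- Cheap step: fetch only the 15 ids, which an index on date_modified can serve.
    SELECT candidate_id
    FROM candidate
    WHERE is_active = 1
    ORDER BY date_modified DESC
    LIMIT 0, 15
) AS recent
JOIN candidate
    ON candidate.candidate_id = recent.candidate_id
LEFT JOIN user AS owner_user
    ON candidate.owner = owner_user.user_id
-- Re-apply the ordering: the derived table's order is not guaranteed after the join.
ORDER BY candidate.date_modified DESC;
```

The expensive joins and expressions then run for 15 rows rather than for the whole table.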

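Also note that you are using a nonstandard and potentially harmful MySQL-specific extension to GROUP BY. Your query works, but if a candidate has more than one owner, it returns only one of them, chosen arbitrarily.

Finally, you appear to have placed single-column indexes on many columns of your large table. This is a notorious SQL antipattern: all those indexes slow down INSERT and UPDATE operations, and most of them probably do not speed up any query. Certainly, for this query, the only useful indexes are the one on date_modified and the primary key.

Many complex queries are best served by specific multi-column indexes. A bunch of single-column indexes does not help such queries.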

Answer 1: (Score: 1)

I have changed the table alias in the query below. Use it; this should solve your problem.

SELECT SQL_CALC_FOUND_ROWS
candidate.candidate_id AS candidateID,
candidate.candidate_id AS exportID,
candidate.is_hot AS isHot,
candidate.date_modified AS dateModifiedSort,
candidate.date_created AS dateCreatedSort,
candidate.first_name AS firstName,
candidate.last_name AS lastName,
candidate.city AS city,
candidate.state AS state,
candidate.key_skills AS keySkills,
user.first_name AS ownerFirstName,
user.last_name AS ownerLastName,
CONCAT(user.last_name,
        user.first_name) AS ownerSort,
DATE_FORMAT(candidate.date_created, '%m-%d-%y') AS dateCreated,
DATE_FORMAT(candidate.date_modified, '%m-%d-%y') AS dateModified,
candidate.email2 AS email2
FROM
candidate
    LEFT JOIN
user ON candidate.owner = user.user_id
    LEFT JOIN
saved_list_entry ON saved_list_entry.data_item_type = 100
    AND saved_list_entry.data_item_id = candidate.candidate_id
WHERE
is_active = 1
GROUP BY candidate.candidate_id
ORDER BY dateModifiedSort DESC
LIMIT 0 , 15

Use the following statements to create indexes for the join conditions:

CREATE INDEX index_user ON user (user_id);

CREATE INDEX index_saved_list_entry ON saved_list_entry (data_item_type, data_item_id);

CREATE INDEX index_candidate ON candidate (is_active, candidate_id, date_modified);
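To see whether MySQL actually picks these indexes up, one option is to prefix the query with EXPLAIN. The following is only a trimmed-down sketch with a shortened column list, not the full query:

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT candidate.candidate_id, candidate.date_modified
FROM candidate
LEFT JOIN user ON candidate.owner = user.user_id
WHERE candidate.is_active = 1
ORDER BY candidate.date_modified DESC
LIMIT 0, 15;
```

In the output, the key column shows which index was chosen for each table, and "Using filesort" in the Extra column means the rows are still being sorted after retrieval.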

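Answer 2: (Score: 1)

First, I suspect each candidate ID only ever has a single entry, so why you are doing a GROUP BY is beyond me. It can be removed, which should improve things a bit.

Second, you are doing a LEFT JOIN to the "saved_list_entry" table but are not actually pulling any columns from it, so it can probably be removed entirely.

Third, considering that GROUP BY no longer applies, I would suggest updating your indexes to: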

table             index
CANDIDATE         ( is_active, date_modified, candidate_id, owner )
user              ( user_id )
saved_list_entry  ( data_item_id, data_item_type )

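Since your ORDER BY is on the date modified in descending order, having that column in the second position of the index, right after is_active (the WHERE condition), lets the engine get to your first 15 rows quickly. Your SQL_CALC_FOUND_ROWS will still need to walk through all the other qualifying rows, but the result set will already be pre-sorted by the index to match.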

SELECT SQL_CALC_FOUND_ROWS
      c.candidate_id AS candidateID,
      c.candidate_id AS exportID,
      c.is_hot AS isHot,
      c.date_modified AS dateModifiedSort,
      c.date_created AS dateCreatedSort,
      c.first_name AS firstName,
      c.last_name AS lastName,
      c.city AS city,
      c.state AS state,
      c.key_skills AS keySkills,
      u.first_name AS ownerFirstName,
      u.last_name AS ownerLastName,
      CONCAT(u.last_name, u.first_name) AS ownerSort,
      DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
      DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
      c.email2 AS email2 
   FROM
      candidate c
         LEFT JOIN user u
            ON c.owner = u.user_id
         LEFT JOIN saved_list_entry s
            ON c.candidate_id = s.data_item_id
            AND s.data_item_type = 100
   WHERE
      c.is_active = 1 
   GROUP BY 
      c.candidate_id 
   ORDER BY    
      c.date_modified DESC 
   LIMIT 
      0, 15
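The suggested indexes could be created with statements along these lines, followed by a sketch of the query with the first two points applied. The index names are made up for illustration, user (user_id) is skipped on the assumption that it is already the primary key, the column list is shortened, and dropping GROUP BY assumes candidate_id is unique in candidate.

```sql
-- is_active first (equality filter), then date_modified (sort order).
CREATE INDEX idx_candidate_active_modified
    ON candidate (is_active, date_modified, candidate_id, owner);

CREATE INDEX idx_saved_list_entry_item
    ON saved_list_entry (data_item_id, data_item_type);

-- Same query without GROUP BY and without the unused saved_list_entry join.
SELECT SQL_CALC_FOUND_ROWS
      c.candidate_id AS candidateID,
      c.date_modified AS dateModifiedSort,
      u.first_name AS ownerFirstName,
      u.last_name AS ownerLastName
   FROM
      candidate c
         LEFT JOIN user u
            ON c.owner = u.user_id
   WHERE
      c.is_active = 1
   ORDER BY
      c.date_modified DESC
   LIMIT
      0, 15;
```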

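Answer 3: (Score: 1)

  1. Get rid of saved_list_entry; it contributes nothing.

  2. Delay the join to user. This lets you get rid of the GROUP BY, which adds some time and may inflate the value of FOUND_ROWS().

  3. Something like: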

    SELECT  c2.*,
            ou.first_name AS ownerFirstName,
            ou.last_name AS ownerLastName,
            CONCAT(ou.last_name, ou.first_name) AS ownerSort
        FROM  
          ( SELECT  SQL_CALC_FOUND_ROWS
                    c.candidate_id AS candidateID, c.candidate_id AS exportID,
                    c.is_hot AS isHot, c.date_modified AS dateModifiedSort,
                    c.date_created AS dateCreatedSort, c.first_name AS firstName,
                    c.last_name AS lastName, c.city AS city, c.state AS state,
                    c.key_skills AS keySkills,
                    DATE_FORMAT(c.date_created, '%m-%d-%y') AS dateCreated,
                    DATE_FORMAT(c.date_modified, '%m-%d-%y') AS dateModified,
                    c.email2 AS email2,
                    c.owner AS owner  -- needed by the outer join on c2.owner
                FROM  candidate AS c
                WHERE  is_active = 1
                GROUP BY  c.candidate_id
                ORDER BY  c.date_modified DESC  -- note change here
                LIMIT  0 , 15 
          ) AS c2
        LEFT JOIN  user AS ou  ON c2.owner = ou.user_id;
    

    (I messed up the column order, but you can fix that.)

    Indexes needed:

    candidate:  INDEX(is_active, candidate_id, date_modified)
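If the total row count is still needed for paging, one alternative to SQL_CALC_FOUND_ROWS is a separate count query; SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17. A minimal sketch, with a shortened column list and the alias totalRows made up for illustration, assuming candidate_id is unique so no GROUP BY is needed:

```sql
-- Page of rows: with a suitable index this can stop after 15 entries.
SELECT c.candidate_id, c.date_modified
FROM candidate AS c
WHERE c.is_active = 1
ORDER BY c.date_modified DESC
LIMIT 0, 15;

-- Total for the pager, run as a second statement.
SELECT COUNT(*) AS totalRows
FROM candidate AS c
WHERE c.is_active = 1;
```

Splitting the work this way keeps the paged query from having to visit every qualifying row just to produce a count.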