Optimizing a query with a huge NOT IN statement

Time: 2012-05-09 18:30:23

Tags: mysql query-optimization

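I am trying to find source sites that exist only before a certain timestamp. This query performs very poorly for the job. Any idea how to optimize it, or which index might improve it?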

select distinct sourcesite 
  from contentmeta 
  where timestamp <= '2011-03-15'
  and sourcesite not in (
    select distinct sourcesite 
      from contentmeta 
      where timestamp>'2011-03-15'
  );

There is an index on sourcesite and timestamp, but the query still takes a long time:

mysql> EXPLAIN select distinct sourcesite from contentmeta where timestamp <= '2011-03-15' and sourcesite not in (select distinct sourcesite from contentmeta where timestamp>'2011-03-15');
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
| id | select_type        | table       | type           | possible_keys | key      | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | contentmeta | index          | NULL          | sitetime | 14      | NULL | 725697 | Using where; Using index                        |
|  2 | DEPENDENT SUBQUERY | contentmeta | index_subquery | sitetime      | sitetime | 5       | func |     48 | Using index; Using where; Full scan on NULL key |
+----+--------------------+-------------+----------------+---------------+----------+---------+------+--------+-------------------------------------------------+

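3 answers:

Answer 0: (score: 3)

The subquery does not need DISTINCT, and the outer query does not need its WHERE clause either, because the NOT IN already does the filtering.

Try: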

select distinct sourcesite
from contentmeta
where sourcesite not in (
    select sourcesite
    from contentmeta
    where timestamp > '2011-03-15'
);
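
One way to sanity-check that dropping the outer WHERE clause and the inner DISTINCT leaves the result unchanged is to run both queries against a small sample table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table layout and the sample rows are assumptions made for illustration, and it assumes that neither column is ever NULL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contentmeta (sourcesite TEXT NOT NULL, timestamp TEXT NOT NULL)"
)
# Hypothetical sample rows; ISO dates compare correctly as strings.
conn.executemany(
    "INSERT INTO contentmeta VALUES (?, ?)",
    [
        ("old.example", "2011-01-10"),   # only on or before the cutoff
        ("old.example", "2011-03-15"),
        ("both.example", "2011-02-01"),  # before and after: must be excluded
        ("both.example", "2011-06-01"),
        ("new.example", "2011-09-09"),   # only after: must be excluded
    ],
)

# The query from the question.
ORIGINAL = """
    SELECT DISTINCT sourcesite FROM contentmeta
    WHERE timestamp <= '2011-03-15'
      AND sourcesite NOT IN (
        SELECT DISTINCT sourcesite FROM contentmeta
        WHERE timestamp > '2011-03-15')
"""
# The simplified query from this answer.
SIMPLIFIED = """
    SELECT DISTINCT sourcesite FROM contentmeta
    WHERE sourcesite NOT IN (
        SELECT sourcesite FROM contentmeta
        WHERE timestamp > '2011-03-15')
"""

original_sites = sorted(row[0] for row in conn.execute(ORIGINAL))
simplified_sites = sorted(row[0] for row in conn.execute(SIMPLIFIED))
print(original_sites, simplified_sites)
```

On this sample both queries return the same single site. A check like this only confirms equivalence; it says nothing about how MySQL will execute either form.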

Answer 1: (score: 3)

This should work:

SELECT DISTINCT c1.sourcesite
FROM contentmeta c1
LEFT JOIN contentmeta c2
  ON c2.sourcesite = c1.sourcesite
  AND c2.timestamp > '2011-03-15'
WHERE c1.timestamp <= '2011-03-15'
  AND c2.sourcesite IS NULL

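For best results, put a multi-column index on contentmeta (sourcesite, timestamp).

In MySQL, joins generally perform better than subqueries, because derived tables cannot use indexes.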
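
As a minimal sketch of the setup described here, the snippet below creates a two-column index and runs the LEFT JOIN anti-join against it. It uses Python's built-in sqlite3 module in place of MySQL. The index name idx_site_time and the sample rows are hypothetical, and SQLite's query planner is not MySQL's, so this illustrates the query's logic rather than its performance.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contentmeta (sourcesite TEXT NOT NULL, timestamp TEXT NOT NULL)"
)
# Composite index: sourcesite first, then timestamp. The name is hypothetical.
conn.execute("CREATE INDEX idx_site_time ON contentmeta (sourcesite, timestamp)")
conn.executemany(
    "INSERT INTO contentmeta VALUES (?, ?)",
    [
        ("old.example", "2011-01-10"),   # only before the cutoff
        ("both.example", "2011-02-01"),  # before and after: must be excluded
        ("both.example", "2011-06-01"),
        ("new.example", "2011-09-09"),   # only after: must be excluded
    ],
)

# Anti-join: keep c1 rows for which no later row (c2) exists for the same site.
ANTI_JOIN = """
    SELECT DISTINCT c1.sourcesite
    FROM contentmeta c1
    LEFT JOIN contentmeta c2
      ON c2.sourcesite = c1.sourcesite
      AND c2.timestamp > '2011-03-15'
    WHERE c1.timestamp <= '2011-03-15'
      AND c2.sourcesite IS NULL
"""

anti_join_sites = sorted(row[0] for row in conn.execute(ANTI_JOIN))
print(anti_join_sites)
```

Putting sourcesite first in the index lets the join look up each site's rows directly and then range-scan on timestamp within that site.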

Answer 2: (score: 1)

I have found that NOT IN just does not optimize well in many databases. Use a left outer join instead:

select distinct cm.sourcesite
from contentmeta cm
left outer join
(
   select distinct sourcesite
   from contentmeta
   where timestamp > '2011-03-15'
) t
  on cm.sourcesite = t.sourcesite
where cm.timestamp <= '2011-03-15' and t.sourcesite is null

This assumes sourcesite is never NULL.
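
To illustrate why the NULL assumption matters, the sketch below compares NOT IN with the left outer join on a table where one row after the cutoff has a NULL sourcesite. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows are hypothetical. Under standard SQL three-valued logic, `x NOT IN (...)` evaluates to NULL rather than true when the list contains a NULL and no match is found, so the NOT IN form filters out every row.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# sourcesite is deliberately nullable here.
conn.execute("CREATE TABLE contentmeta (sourcesite TEXT, timestamp TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO contentmeta VALUES (?, ?)",
    [
        ("old.example", "2011-01-10"),  # only before the cutoff
        (None, "2011-06-01"),           # NULL site after the cutoff
    ],
)

NOT_IN = """
    SELECT DISTINCT sourcesite FROM contentmeta
    WHERE timestamp <= '2011-03-15'
      AND sourcesite NOT IN (
        SELECT sourcesite FROM contentmeta
        WHERE timestamp > '2011-03-15')
"""
LEFT_JOIN = """
    SELECT DISTINCT cm.sourcesite
    FROM contentmeta cm
    LEFT OUTER JOIN (
        SELECT DISTINCT sourcesite FROM contentmeta
        WHERE timestamp > '2011-03-15'
    ) t ON cm.sourcesite = t.sourcesite
    WHERE cm.timestamp <= '2011-03-15' AND t.sourcesite IS NULL
"""

# The subquery yields a single NULL, so NOT IN is never true for any row.
not_in_sites = [row[0] for row in conn.execute(NOT_IN)]
# The join never matches a NULL key, so old.example stays unmatched and is kept.
left_join_sites = [row[0] for row in conn.execute(LEFT_JOIN)]
print(not_in_sites, left_join_sites)
```

The two forms are interchangeable only when sourcesite cannot be NULL; declaring the column NOT NULL removes the discrepancy.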