ORDER BY with a LEFT JOIN does not use the index and is very slow

Asked: 2015-12-22 10:53:07

Tags: sql postgresql join

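I have the following two queries. Query 1 is fast because it uses the indexes (the plan shows a merge join over two index scans), whereas Query 2 uses a hash join and is much slower.

Query 1 orders by a column of table 1; Query 2 orders by a column of table 2.

Query 1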

learning=# explain analyze
select *
from users left join
     access_logs
     on users.userid = access_logs.userid
order by users.userid
limit 10 offset 90;


                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14.00..15.46 rows=10 width=104) (actual time=1.330..1.504 rows=10 loops=1)
   ->  Merge Left Join  (cost=0.85..291532.97 rows=1995958 width=104) (actual time=0.037..1.482 rows=100 loops=1)
         Merge Cond: (users.userid = access_logs.userid)
         ->  Index Scan using users_pkey on users  (cost=0.43..151132.75 rows=1995958 width=76) (actual time=0.018..1.135 rows=100 loops=1)
         ->  Index Scan using access_logs_userid_idx on access_logs  (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.012..0.198 rows=100 loops=1)
 Planning time: 0.469 ms
 Execution time: 1.569 ms

Query 2

learning=# explain analyze
select *
from users left join
     access_logs
     on users.userid = access_logs.userid
order by access_logs.userid
limit 10 offset 90;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=293584.20..293584.23 rows=10 width=104) (actual time=3821.432..3821.439 rows=10 loops=1)
   ->  Sort  (cost=293583.98..298573.87 rows=1995958 width=104) (actual time=3821.391..3821.415 rows=100 loops=1)
         Sort Key: access_logs.userid
         Sort Method: top-N heapsort  Memory: 51kB
         ->  Hash Left Join  (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=539.859..3168.754 rows=1995958 loops=1)
               Hash Cond: (users.userid = access_logs.userid)
               ->  Seq Scan on users  (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.009..443.260 rows=1995958 loops=1)
               ->  Hash  (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=539.112..539.112 rows=1995958 loops=1)
                     Buckets: 262144  Batches: 2  Memory Usage: 58532kB
                     ->  Seq Scan on access_logs  (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.006..170.061 rows=1995958 loops=1)
 Planning time: 0.480 ms
 Execution time: 3832.245 ms

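Questions

  • According to the plan, the second query is slow because the sort runs after the join, over all 1,995,958 joined rows.
  • Why does ordering by a column of the second table not use the index? Below is a plan for the sort alone, without the join.

Query: explain analyze select * from access_logs order by userid limit 10 offset 90;

Plan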

 Limit  (cost=5.41..5.96 rows=10 width=28) (actual time=0.199..0.218 rows=10 loops=1)
   ->  Index Scan using access_logs_userid_idx on access_logs  (cost=0.43..110471.45 rows=1995958 width=28) (actual time=0.029..0.201 rows=100 loops=1)
 Planning time: 0.120 ms
 Execution time: 0.252 ms

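Edit 1

My goal is not to compare the two queries. I actually want the result of Query 2; I included Query 1 only as a point of comparison to help me understand.

The ORDER BY is not limited to the join column. Users can also order by other columns of table 2; the plan for that case is below.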

learning=# explain analyze select * from users left join access_logs on users.userid=access_logs.userid order by access_logs.last_login limit 10;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=260431.83..260431.86 rows=10 width=104) (actual time=3846.625..3846.627 rows=10 loops=1)
   ->  Sort  (cost=260431.83..265421.73 rows=1995958 width=104) (actual time=3846.623..3846.623 rows=10 loops=1)
         Sort Key: access_logs.last_login
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Hash Left Join  (cost=73231.06..217299.90 rows=1995958 width=104) (actual time=567.104..3174.818 rows=1995958 loops=1)
               Hash Cond: (users.userid = access_logs.userid)
               ->  Seq Scan on users  (cost=0.00..44814.58 rows=1995958 width=76) (actual time=0.007..443.364 rows=1995958 loops=1)
               ->  Hash  (cost=34636.58..34636.58 rows=1995958 width=28) (actual time=566.814..566.814 rows=1995958 loops=1)
                     Buckets: 262144  Batches: 2  Memory Usage: 58532kB
                     ->  Seq Scan on access_logs  (cost=0.00..34636.58 rows=1995958 width=28) (actual time=0.004..169.137 rows=1995958 loops=1)
 Planning time: 0.490 ms
 Execution time: 3857.171 ms

1 Answer:

Answer 0: (score: 2)

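The sort in the second query will not use the index because the index is not guaranteed to contain all of the values being sorted. If some records in users have no match in access_logs, the Left Join generates null values that the query refers to as access_logs.userid. Those nulls are not actually present in access_logs, so they are not covered by the index.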
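
To see whether such null-extended rows exist in your data, a minimal sketch like the following counts the users that have no row in access_logs. It uses only the table and column names from the question; the alias users_without_logs is made up for illustration. Even if the count is zero, the planner generally cannot rely on that for a Left Join, so the plan may not change.

```sql
-- Count users with no matching row in access_logs.
-- For these users the LEFT JOIN emits access_logs.userid as NULL,
-- a value that never appears in access_logs_userid_idx.
SELECT count(*) AS users_without_logs   -- alias is illustrative only
FROM users
LEFT JOIN access_logs
       ON users.userid = access_logs.userid
WHERE access_logs.userid IS NULL;
```

PostgreSQL sorts NULLs last by default in ascending order, so any null-extended rows would have to come after every matched row. The merge join in Query 1 produces rows in users.userid order, which does not satisfy that ordering.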

A workaround is to create a default initial record in access_logs for every user and use an Inner Join.
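
The workaround could be sketched roughly as below. This is an illustration under assumptions, not a tested recipe: it assumes the remaining columns of access_logs (such as last_login) accept NULL or have defaults, and the index name access_logs_last_login_idx is hypothetical. Whether the planner then picks an index-driven plan depends on your statistics and settings, so check with explain analyze.

```sql
-- 1. Give every user at least one access_logs row (assumes the other
--    columns of access_logs are nullable or have defaults).
INSERT INTO access_logs (userid)
SELECT u.userid
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM access_logs a WHERE a.userid = u.userid
);

-- 2. Refresh planner statistics after the bulk insert.
ANALYZE access_logs;

-- 3. With an inner join there are no null-extended rows, so ordering by
--    access_logs.userid can be satisfied from the index order.
EXPLAIN ANALYZE
SELECT *
FROM users
JOIN access_logs ON users.userid = access_logs.userid
ORDER BY access_logs.userid
LIMIT 10 OFFSET 90;

-- 4. Optional, for ordering by other columns of access_logs:
--    a hypothetical index on last_login gives the planner an ordered
--    path for "ORDER BY access_logs.last_login LIMIT 10".
CREATE INDEX access_logs_last_login_idx ON access_logs (last_login);
```

Note that rows inserted in step 1 will have a NULL last_login under the stated assumption, and they will sort last in ascending order.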