How can I optimize this PostgreSQL count query?

Time: 2013-07-16 08:09:53

Tags: query-optimization postgresql-9.1

SELECT COUNT(*)
FROM "businesses"
WHERE (businesses.postal_code_id IN
         (SELECT id
          FROM postal_codes
          WHERE lower(city) IN ('los angeles')
            AND lower(region) = 'california'))
  AND (EXISTS
         (SELECT *
          FROM categorizations c
          WHERE c.business_id=businesses.id
            AND c.category_id IN (86)))

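I have a Postgres database of businesses, categories, and locations. This query took 95665.9 ms to execute, and I'm pretty sure the bottleneck is in the categorizations. Is there a better way to do this? The resulting count is 1032.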

=# EXPLAIN ANALYZE SELECT COUNT(*)
-# FROM "businesses"
-# WHERE (businesses.postal_code_id IN
(#          (SELECT id
(#           FROM postal_codes
(#           WHERE lower(city) IN ('los angeles')
(#             AND lower(region) = 'california'));
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4007.74..4007.75 rows=1 width=0) (actual time=263820.923..263820.924 rows=1 loops=1)
   ->  Nested Loop  (cost=41.93..4005.20 rows=1015 width=0) (actual time=469.716..263679.865 rows=112513 loops=1)
         ->  HashAggregate  (cost=15.59..15.60 rows=1 width=4) (actual time=332.664..332.946 rows=82 loops=1)
               ->  Bitmap Heap Scan on postal_codes  (cost=11.57..15.59 rows=1 width=4) (actual time=84.772..332.407 rows=82 loops=1)
                     Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
                     ->  BitmapAnd  (cost=11.57..11.57 rows=1 width=0) (actual time=77.530..77.530 rows=0 loops=1)
                           ->  Bitmap Index Scan on idx_postal_codes_lower_city  (cost=0.00..5.66 rows=187 width=0) (actual time=22.800..22.800 rows=82 loops=1)
                                 Index Cond: (lower((city)::text) = 'los angeles'::text)
                           ->  Bitmap Index Scan on idx_postal_codes_lower_region  (cost=0.00..5.66 rows=187 width=0) (actual time=54.714..54.714 rows=2356 loops=1)
                                 Index Cond: (lower((region)::text) = 'california'::text)
         ->  Bitmap Heap Scan on businesses  (cost=26.34..3976.91 rows=1015 width=4) (actual time=95.926..3208.426 rows=1372 loops=82)
               Recheck Cond: (postal_code_id = postal_codes.id)
               ->  Bitmap Index Scan on index_businesses_on_postal_code_id  (cost=0.00..26.08 rows=1015 width=0) (actual time=89.864..89.864 rows=1380 loops=82)
                     Index Cond: (postal_code_id = postal_codes.id)
 Total runtime: 263821.016 ms
(15 rows)

Join version:

EXPLAIN ANALYZE SELECT count(*) FROM businesses
LEFT JOIN postal_codes
ON businesses.postal_code_id = postal_codes.id
WHERE lower(postal_codes.city) = 'los angeles'
AND lower(postal_codes.region) = 'california';

                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4006.14..4006.15 rows=1 width=0) (actual time=143357.170..143357.171 rows=1 loops=1)
   ->  Nested Loop  (cost=37.91..4005.19 rows=381 width=0) (actual time=138.666..143218.064 rows=112514 loops=1)
         ->  Bitmap Heap Scan on postal_codes  (cost=11.57..15.59 rows=1 width=4) (actual time=0.559..33.957 rows=82 loops=1)
               Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
               ->  BitmapAnd  (cost=11.57..11.57 rows=1 width=0) (actual time=0.532..0.532 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_postal_codes_lower_city  (cost=0.00..5.66 rows=187 width=0) (actual time=0.058..0.058 rows=82 loops=1)
                           Index Cond: (lower((city)::text) = 'los angeles'::text)
                     ->  Bitmap Index Scan on idx_postal_codes_lower_region  (cost=0.00..5.66 rows=187 width=0) (actual time=0.461..0.461 rows=2356 loops=1)
                           Index Cond: (lower((region)::text) = 'california'::text)
         ->  Bitmap Heap Scan on businesses  (cost=26.34..3976.91 rows=1015 width=4) (actual time=55.493..1742.407 rows=1372 loops=82)
               Recheck Cond: (postal_code_id = postal_codes.id)
               ->  Bitmap Index Scan on index_businesses_on_postal_code_id  (cost=0.00..26.09 rows=1015 width=0) (actual time=53.141..53.141 rows=1381 loops=82)
                     Index Cond: (postal_code_id = postal_codes.id)
 Total runtime: 143357.260 ms

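The simplified query returns a much larger result, but given that there are indexes and I'm only doing one join, I'm surprised it takes this long.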

1 answer:

Answer 0: (score: 2)

Try using a functional index on the city column:

CREATE INDEX ON postal_codes((lower(city)))

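There is a strong dependency between the city and region columns, so sometimes you have to split these predicates to get more accurate planner estimates. If you need better estimates, you will have to add lower_city and lower_region columns to the postal_codes table, because PostgreSQL has no statistics for indexes.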
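
The suggestion to add lower_city and lower_region columns could look roughly like the sketch below. It is only an illustration under assumptions: the text column type and the index name idx_postal_codes_lower_city_region are hypothetical, and keeping the new columns in sync with city and region on later writes (for example with a trigger, since 9.1 has no generated columns) is not shown.

```sql
-- Assumed column type: text. Adjust to match city/region in your schema.
ALTER TABLE postal_codes ADD COLUMN lower_city text;
ALTER TABLE postal_codes ADD COLUMN lower_region text;

-- Backfill the new columns from the existing data.
UPDATE postal_codes
SET lower_city   = lower(city),
    lower_region = lower(region);

-- Hypothetical index name; one index covering both predicates.
CREATE INDEX idx_postal_codes_lower_city_region
    ON postal_codes (lower_city, lower_region);

-- Refresh planner statistics so the new columns are taken into account.
ANALYZE postal_codes;
```

The query would then filter on lower_city = 'los angeles' AND lower_region = 'california' instead of calling lower() on the original columns.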

Please post the execution plan here via http://explain.depesz.com/, ideally the output of EXPLAIN ANALYZE YOUR_QUERY.
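
One way to capture the plan for pasting into that site, sketched as a psql session. The file name plan.txt is a hypothetical choice; \o is the psql meta-command that redirects query output to a file, and the BUFFERS option (available since 9.0) is an optional extra that was not part of the original request.

```sql
-- Redirect subsequent query output to a file (hypothetical file name).
\o plan.txt

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM businesses
WHERE businesses.postal_code_id IN
        (SELECT id
         FROM postal_codes
         WHERE lower(city) IN ('los angeles')
           AND lower(region) = 'california')
  AND EXISTS
        (SELECT *
         FROM categorizations c
         WHERE c.business_id = businesses.id
           AND c.category_id IN (86));

-- Stop redirecting; output goes back to the terminal.
\o
```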

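9.1 should automatically transform correlated subqueries into semi-joins, but I'm not sure. Try rewriting the query from subqueries to a plain INNER JOIN form (it may not help, but it might).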
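
A possible INNER JOIN form of the original query, as a sketch rather than a guaranteed improvement. It assumes postal_codes.id is unique (for example a primary key), so joining on it cannot duplicate businesses. Because a business might appear more than once in categorizations for the same category, the sketch counts distinct business ids to stay equivalent to the EXISTS version.

```sql
SELECT count(DISTINCT b.id)
FROM businesses b
JOIN postal_codes p
  ON p.id = b.postal_code_id          -- assumes postal_codes.id is unique
JOIN categorizations c
  ON c.business_id = b.id
WHERE lower(p.city) = 'los angeles'
  AND lower(p.region) = 'california'
  AND c.category_id = 86;
```

Comparing EXPLAIN ANALYZE output for this form and the subquery form shows whether the planner actually picks a different plan.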
