Postgres EXPLAIN plan differs for the same query with different values

Time: 2017-10-30 09:14:45

Tags: postgresql heroku

I am running a Postgres 9.56 database on Heroku. I run the following SQL with different parameter values, and the performance differs enormously.

Query 1

SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
 FROM tk_seat s
 LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
 WHERE DATE_PART('year', t.departure)= '2017'
 AND t.trip_status = 'BOOKABLE'
 AND t.route_id = '278'
 AND s.seat_status_type != 'NONE'
 AND s.operator_id = '15'
 GROUP BY DATE_TRUNC('MONTH', t.departure)
 ORDER BY DATE_TRUNC('MONTH', t.departure)

Query 2

SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
 FROM tk_seat s
 LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
 WHERE DATE_PART('year', t.departure)= '2017'
 AND t.trip_status = 'BOOKABLE'
 AND t.route_id = '150'
 AND s.seat_status_type != 'NONE'
 AND s.operator_id = '15'
 GROUP BY DATE_TRUNC('MONTH', t.departure)
 ORDER BY DATE_TRUNC('MONTH', t.departure)

The only difference is the t.route_id value.

So I ran EXPLAIN on both queries, and it gave me very different plans.

Query 1

"GroupAggregate  (cost=279335.17..279335.19 rows=1 width=298)"
"  Group Key: (date_trunc('MONTH'::text, t.departure))"
"  ->  Sort  (cost=279335.17..279335.17 rows=1 width=298)"
"        Sort Key: (date_trunc('MONTH'::text, t.departure))"
"        ->  Nested Loop  (cost=0.00..279335.16 rows=1 width=298)"
"              Join Filter: (s.trip_id = t.trip_id)"
"              ->  Seq Scan on tk_trip t  (cost=0.00..5951.88 rows=1 width=12)"
"                    Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '278'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"
"              ->  Seq Scan on tk_seat s  (cost=0.00..271738.35 rows=131594 width=298)"
"                    Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"

Query 2

"Sort  (cost=278183.94..278183.95 rows=1 width=298)"
"  Sort Key: (date_trunc('MONTH'::text, t.departure))"
"  ->  HashAggregate  (cost=278183.92..278183.93 rows=1 width=298)"
"        Group Key: date_trunc('MONTH'::text, t.departure)"
"        ->  Hash Join  (cost=5951.97..278183.88 rows=7 width=298)"
"              Hash Cond: (s.trip_id = t.trip_id)"
"              ->  Seq Scan on tk_seat s  (cost=0.00..271738.35 rows=131594 width=298)"
"                    Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
"              ->  Hash  (cost=5951.88..5951.88 rows=7 width=12)"
"                    ->  Seq Scan on tk_trip t  (cost=0.00..5951.88 rows=7 width=12)"
"                          Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (route_id = '150'::bigint) AND (date_part('year'::text, departure) = '2017'::double precision))"

My question is why this happens, and how can I make the two plans the same? The first query performs very badly.

Query 1, EXPLAIN ANALYZE

"GroupAggregate  (cost=274051.28..274051.31 rows=1 width=8) (actual time=904682.606..904684.283 rows=7 loops=1)"
"  Group Key: (date_trunc('MONTH'::text, t.departure))"
"  ->  Sort  (cost=274051.28..274051.29 rows=1 width=8) (actual time=904682.432..904682.917 rows=13520 loops=1)"
"        Sort Key: (date_trunc('MONTH'::text, t.departure))"
"        Sort Method: quicksort  Memory: 1018kB"
"        ->  Nested Loop  (cost=0.42..274051.27 rows=1 width=8) (actual time=1133.925..904676.254 rows=13520 loops=1)"
"              Join Filter: (s.trip_id = t.trip_id)"
"              Rows Removed by Join Filter: 42505528"
"              ->  Index Scan using tk_trip_route_id_idx on tk_trip t  (cost=0.42..651.34 rows=1 width=12) (actual time=0.020..2.720 rows=338 loops=1)"
"                    Index Cond: (route_id = '278'::bigint)"
"                    Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
"                    Rows Removed by Filter: 28"
"              ->  Seq Scan on tk_seat s  (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.071..2662.102 rows=125796 loops=338)"
"                    Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
"                    Rows Removed by Filter: 6782294"
"Planning time: 1.172 ms"
"Execution time: 904684.570 ms"

Query 2, EXPLAIN ANALYZE

"Sort  (cost=275018.88..275018.89 rows=1 width=8) (actual time=2153.843..2153.843 rows=9 loops=1)"
"  Sort Key: (date_trunc('MONTH'::text, t.departure))"
"  Sort Method: quicksort  Memory: 25kB"
"  ->  HashAggregate  (cost=275018.86..275018.87 rows=1 width=8) (actual time=2153.833..2153.834 rows=9 loops=1)"
"        Group Key: date_trunc('MONTH'::text, t.departure)"
"        ->  Hash Join  (cost=2797.67..275018.82 rows=7 width=8) (actual time=2.472..2147.093 rows=36565 loops=1)"
"              Hash Cond: (s.trip_id = t.trip_id)"
"              ->  Seq Scan on tk_seat s  (cost=0.00..271715.83 rows=134728 width=8) (actual time=0.127..2116.153 rows=125796 loops=1)"
"                    Filter: (((seat_status_type)::text <> 'NONE'::text) AND (operator_id = '15'::bigint))"
"                    Rows Removed by Filter: 6782294"
"              ->  Hash  (cost=2797.58..2797.58 rows=7 width=12) (actual time=1.853..1.853 rows=1430 loops=1)"
"                    Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 78kB"
"                    ->  Bitmap Heap Scan on tk_trip t  (cost=32.21..2797.58 rows=7 width=12) (actual time=0.176..1.559 rows=1430 loops=1)"
"                          Recheck Cond: (route_id = '150'::bigint)"
"                          Filter: (((trip_status)::text = 'BOOKABLE'::text) AND (date_part('year'::text, departure) = '2017'::double precision))"
"                          Rows Removed by Filter: 33"
"                          Heap Blocks: exact=333"
"                          ->  Bitmap Index Scan on tk_trip_route_id_idx  (cost=0.00..32.21 rows=1572 width=0) (actual time=0.131..0.131 rows=1463 loops=1)"
"                                Index Cond: (route_id = '150'::bigint)"
"Planning time: 0.211 ms"
"Execution time: 2153.972 ms"

1 Answer:

Answer 0 (score: 2)

If you tell Postgres not to use nested loops, you can (possibly) make the two plans the same:

SET enable_nestloop = 'off';
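
If changing the setting for the whole session is too broad, it can be limited to a single transaction. The following is a minimal sketch, assuming the report query runs in its own transaction; SET LOCAL reverts automatically at COMMIT or ROLLBACK. Note that turning enable_nestloop off only discourages nested loops: the planner can still choose one when no other join method is available.

```sql
BEGIN;

-- Applies only inside this transaction; later transactions and
-- other sessions keep the default planner settings.
SET LOCAL enable_nestloop = off;

SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
  FROM tk_seat s
  LEFT JOIN tk_trip t ON t.trip_id = s.trip_id
 WHERE DATE_PART('year', t.departure) = 2017
   AND t.trip_status = 'BOOKABLE'
   AND t.route_id = 278
   AND s.seat_status_type != 'NONE'
   AND s.operator_id = 15
 GROUP BY DATE_TRUNC('MONTH', t.departure)
 ORDER BY DATE_TRUNC('MONTH', t.departure);

COMMIT;
```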

You can make it permanent by setting it per database, per role, inside a function definition, or in the server configuration:

ALTER DATABASE postgres
  SET enable_nestloop = 'off';
ALTER ROLE lkaminski
  SET enable_nestloop = 'off';

CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
    LANGUAGE SQL
    SET enable_nestloop = 'off'
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;

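As for why: you changed the search condition, and the planner now estimates that it will get 1 row from tk_trip instead of 7, so it switches plans because a nested loop looks cheaper for a single row. Sometimes that estimate is wrong (here the scan actually returned 338 rows, so tk_seat was scanned sequentially 338 times) and you get a much slower execution. But if you "force" it not to use nested loops, then for other parameter values the second plan may end up slower than the first (nested loop) plan would have been.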
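
To see where the estimate goes wrong, the tk_trip filter can be checked on its own. This is a sketch, not a confirmed diagnosis. The first statement compares estimated and actual rows for the filter from Query 1. The second is a variant that assumes departure is a timestamp column and that the function call date_part('year', ...) is the part the planner cannot estimate well; it expresses the same year as a plain range, which the planner can compare against the column's statistics.

```sql
-- Original filter: compare "rows=" (estimate) with "actual ... rows=".
EXPLAIN ANALYZE
SELECT count(*)
  FROM tk_trip t
 WHERE date_part('year', t.departure) = 2017
   AND t.trip_status = 'BOOKABLE'
   AND t.route_id = 278;

-- Same year written as a range on the bare column
-- (assumption: departure is a timestamp or timestamptz column).
EXPLAIN ANALYZE
SELECT count(*)
  FROM tk_trip t
 WHERE t.departure >= '2017-01-01'
   AND t.departure <  '2018-01-01'
   AND t.trip_status = 'BOOKABLE'
   AND t.route_id = 278;
```

If the second estimate is noticeably closer to the actual row count, using the range form in the full query may let the planner pick the hash join without any planner settings being changed.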

You can make the planner's estimates more accurate by increasing the amount of statistics collected for a column. It may help.

ALTER TABLE tk_trip ALTER COLUMN route_id SET STATISTICS 1000;
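
A changed statistics target only takes effect once the table is analyzed again. The follow-up steps might look like this; the pg_stats query is just one way to inspect what the planner now knows about route_id.

```sql
-- Re-collect statistics so the new per-column target is actually used.
ANALYZE tk_trip;

-- Inspect the statistics gathered for the column.
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'tk_trip'
   AND attname = 'route_id';
```

After that, re-running EXPLAIN on Query 1 shows whether the row estimate for tk_trip has moved closer to the actual count.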

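As a side note, your LEFT JOIN is effectively an INNER JOIN, because you put the conditions on tk_trip in WHERE instead of ON. If you move them to ON, you should get a different plan (and different results), assuming a LEFT JOIN is what you actually want.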
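
The two readings of the query can be written out as follows. This is a sketch based on Query 1; which variant is right depends on whether seats without a matching trip should be counted.

```sql
-- Variant A: what the original query effectively does (inner join).
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
  FROM tk_seat s
  JOIN tk_trip t ON t.trip_id = s.trip_id
 WHERE DATE_PART('year', t.departure) = 2017
   AND t.trip_status = 'BOOKABLE'
   AND t.route_id = 278
   AND s.seat_status_type != 'NONE'
   AND s.operator_id = 15
 GROUP BY DATE_TRUNC('MONTH', t.departure)
 ORDER BY DATE_TRUNC('MONTH', t.departure);

-- Variant B: a true LEFT JOIN, with the tk_trip conditions in ON.
-- Seats with no matching trip are kept and fall into a group
-- whose month is NULL.
SELECT COUNT(s), DATE_TRUNC('MONTH', t.departure)
  FROM tk_seat s
  LEFT JOIN tk_trip t
    ON t.trip_id = s.trip_id
   AND DATE_PART('year', t.departure) = 2017
   AND t.trip_status = 'BOOKABLE'
   AND t.route_id = 278
 WHERE s.seat_status_type != 'NONE'
   AND s.operator_id = 15
 GROUP BY DATE_TRUNC('MONTH', t.departure)
 ORDER BY DATE_TRUNC('MONTH', t.departure);
```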
