我有一个查询,使用一个或两个过滤器运行时,其执行效果与预期相同。 但是当我增加三分之一时,执行计划似乎会改变并会降低性能。
带有2个过滤器的查询
EXPLAIN ANALYZE SELECT
"ed_system"."name",
"ed_system"."geom"::bytea,
(
"ed_system"."geom" <<->> ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)
)
AS "distance"
FROM
"ed_body"
INNER JOIN
"ed_system"
ON ("ed_body"."system_id" = "ed_system"."id")
WHERE
(
ST_3DDistance("ed_system"."geom", ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)) < 20.0
AND NOT "ed_body"."ring_count" = 0
)
ORDER BY
"distance" ASC LIMIT 10
输出
Limit (cost=0.85..24526.33 rows=10 width=60) (actual time=17.999..34.674 rows=10 loops=1)
-> Nested Loop (cost=0.85..252614946.65 rows=103001 width=60) (actual time=17.997..34.670 rows=10 loops=1)
-> Index Scan using ed_system_geom_id on ed_system (cost=0.42..41629457.42 rows=6361500 width=64) (actual time=17.973..34.448 rows=12 loops=1)
Order By: (geom <<->> '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry)
Filter: (st_3ddistance(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry) < '20'::double precision)
-> Index Scan using ed_body_system_id_0b3a29a2 on ed_body (cost=0.43..33.15 rows=2 width=4) (actual time=0.010..0.015 rows=1 loops=12)
Index Cond: (system_id = ed_system.id)
Filter: (ring_count <> 0)
Rows Removed by Filter: 9
Planning time: 0.557 ms
Execution time: 34.729 ms
带有额外过滤器的查询
EXPLAIN ANALYZE SELECT
"ed_system"."name",
"ed_system"."geom"::bytea,
(
"ed_system"."geom" <<->> ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)
)
AS "distance"
FROM
"ed_body"
INNER JOIN
"ed_system"
ON ("ed_body"."system_id" = "ed_system"."id")
WHERE
(
ST_3DDistance("ed_system"."geom", ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)) < 20.0
AND NOT "ed_body"."ring_count" = 0
AND "ed_body"."scoopable" = True
)
ORDER BY
"distance" ASC LIMIT 10
输出:
Limit (cost=0.85..84471.16 rows=10 width=60) (actual time=238721.835..238721.835 rows=0 loops=1)
-> Nested Loop (cost=0.85..252549321.31 rows=29898 width=60) (actual time=238721.832..238721.832 rows=0 loops=1)
-> Index Scan using ed_system_geom_id on ed_system (cost=0.42..41629457.42 rows=6361500 width=64) (actual time=20.292..238664.727 rows=107 loops=1)
Order By: (geom <<->> '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry)
Filter: (st_3ddistance(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry) < '20'::double precision)
Rows Removed by Filter: 19118611
-> Index Scan using ed_body_system_id_0b3a29a2 on ed_body (cost=0.43..33.15 rows=1 width=4) (actual time=0.532..0.532 rows=0 loops=107)
Index Cond: (system_id = ed_system.id)
Filter: (scoopable AND (ring_count <> 0))
Rows Removed by Filter: 12
Planning time: 0.952 ms
Execution time: 238733.393 ms
几乎相同的计划,除了第二个计划在ed_system索引扫描上有大量已过滤的行并且是sssllooowww。
这使我认为由于第二个计划中的某些原因,计划者认为最好先遍历ed_body表并对其进行过滤,然后再对距离联接进行过滤。 我运行了VACUUM,添加了额外的索引,并尝试了一个子查询,但无济于事。
有什么建议吗?
编辑:
索引:
CREATE UNIQUE INDEX ed_body_pkey ON ed_body(id int4_ops);
CREATE INDEX ed_body_system_id_0b3a29a2 ON ed_body(system_id int4_ops);
CREATE INDEX ed_body_test4 ON ed_body(ring_count int4_ops,scoopable bool_ops);
CREATE UNIQUE INDEX ed_system_pkey ON ed_system(id int4_ops);
CREATE INDEX ed_system_geom_id ON ed_system USING GIST (geom gist_geometry_ops_nd);
CREATE INDEX ed_system_geom_idx ON ed_system(geom btree_geometry_ops);
编辑:
强制对ed_system表进行索引扫描。但是很慢。
Index Only Scan using ed_system_geom_idx on ed_system (cost=0.56..5581365.22 rows=358 width=4) (actual time=7.542..53948.185 rows=12 loops=1)
Filter: ((geom && '01030000A0E6100000010000000500000000000000000024C000000000000024C000000000000024C000000000000024C0000000000000244000000000000024C0000000000000244000000000000024400000000000002440000000000000244000000000000024C0000000000000244000000000000024C000000000000024C000000000000024C0'::geometry) AND ('01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry && st_expand(geom, '10'::double precision)) AND _st_3ddwithin(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry, '10'::double precision))
Rows Removed by Filter: 19118706
Heap Fetches: 0
Planning time: 0.709 ms
Execution time: 53948.245 ms
答案 0 :(得分:2)
索引扫描中的“排序依据:”在这里看起来很吸引人,但可能没有吸引力。您可以通过将order by子句更改为 df['phoneNumber'] = df['phoneNumber'].str.replace({'(':'',')':''}), regex=True)
来强制它不使用它。如果这样做不能解决问题,请发布说明(分析,缓冲区),因为它至少会为我们提供更多有用的信息。
您说您猜测大约有20%的行满足bulmaCarousel.attach('#carousel-demo', {
slidesToScroll: 1,
slidesToShow: 4
});
(为什么要猜测?只需运行ORDER BY "distance" + 0.0 ASC LIMIT 10
查询并确定知道)。但是实际的行数已从某个未知值减少(因为一旦达到LIMIT就停止了),一直下降到零。因此,要么它们比这稀少得多,要么存在相关性,这使得它们在符合其他条件的行中特别稀有。