Postgres query not using index

Date: 2017-06-08 13:02:26

Tags: postgresql

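I am seeing very strange behavior from a Postgres query. I have a table "p_MyTable", which is a partition of another table. "p_MyTable" has about 600 million records and has this index:

    CREATE INDEX idx_p_MyTable_id ON MySchema.p_MyTable USING btree(IndColumn);

When I run the following query, it returns results almost instantly. Here are the query and its explain plan.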

    explain select max(IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

    "Result  (cost=4.85..4.86 rows=1 width=0)"
    "  InitPlan 1 (returns $0)"
    "    ->  Limit  (cost=0.57..4.85 rows=1 width=8)"
    "          ->  Index Scan Backward using idx_p_MyTable_id on p_MyTable ms  (cost=0.57..727648341.49 rows=169996075 width=8)"
    "                Index Cond: (IndColumn IS NOT NULL)"
    "                Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_96 = 'EFG'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"

The aggregate function "min" works the same way. But when I try other functions, the explain plan changes and the query no longer finishes quickly.

    explain select count(IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

    "Aggregate  (cost=53319339.50..53319339.51 rows=1 width=8)"
    "  ->  Bitmap Heap Scan on p_MyTable ms  (cost=11209851.27..52894349.31 rows=169996075 width=8)"
    "        Recheck Cond: (ent_attr_96 = 'EFG'::text)"
    "        Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
    "        ->  Bitmap Index Scan on p_MyTable_comp  (cost=0.00..11167352.25 rows=509988224 width=0)"
    "              Index Cond: (ent_attr_96 = 'EFG'::text)"

    explain select distinct (IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

    "HashAggregate  (cost=53319339.50..53319339.71 rows=21 width=8)"
    "  Group Key: IndColumn"
    "  ->  Bitmap Heap Scan on p_MyTable ms  (cost=11209851.27..52894349.31 rows=169996075 width=8)"
    "        Recheck Cond: (ent_attr_96 = 'EFG'::text)"
    "        Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
    "        ->  Bitmap Index Scan on p_MyTable_comp  (cost=0.00..11167352.25 rows=509988224 width=0)"
    "              Index Cond: (ent_attr_96 = 'EFG'::text)"

    explain select avg (IndColumn) from MySchema.p_MyTable ms where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date

    "Aggregate  (cost=53319339.50..53319339.51 rows=1 width=8)"
    "  ->  Bitmap Heap Scan on p_MyTable ms  (cost=11209851.27..52894349.31 rows=169996075 width=8)"
    "        Recheck Cond: (ent_attr_96 = 'EFG'::text)"
    "        Filter: ((ent_attr_97 = 'ABC'::text) AND (ent_attr_98 = 'HIJ'::text) AND ((ent_date_2)::date <= '2017-06-01'::date))"
    "        ->  Bitmap Index Scan on p_MyTable_comp  (cost=0.00..11167352.25 rows=509988224 width=0)"
    "              Index Cond: (ent_attr_96 = 'EFG'::text)"

Please explain why the plan changes completely. I thought that if max/min use the index correctly, the other functions would too.

Thanks in advance.

1 answer:

Answer 0 (score: 0)

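Indexes are (mostly) used for "where" conditions, or to help with sorting ("order by"). (OK, indexes can be used for many other things, but for this case it helps to limit it to these.)

"max(IndColumn)" is misleading, because the query is rewritten as: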

    select IndColumn
    from MySchema.p_MyTable ms
    where ent_attr_97='ABC' and ent_attr_96='EFG' and ent_attr_98='HIJ'
    and ent_date_2::date <= '2017-06-01 00:00:00'::date
    AND IndColumn is not null
    ORDER by IndColumn desc
    LIMIT 1

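So the index is used for "AND IndColumn IS NOT NULL ORDER BY IndColumn DESC". The scan walks the index backward from the largest value and stops at the first row that passes the other filters (the "Limit" node in your plan). count, avg and distinct have no such shortcut: they need every matching row, and an index on IndColumn does nothing to narrow down which rows match.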
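To see why max can stop early while count, avg and distinct cannot, here is a toy Python model (not PostgreSQL's implementation). It assumes an "index" is simply a sorted list of (IndColumn, row_id) pairs, and the rows and filter are hypothetical stand-ins for the ent_attr_* conditions. The backward scan returns at the first row that passes the filter, as the Limit over Index Scan Backward does; the count has to inspect every candidate row.

```python
# Toy model only: an "index" is a list of (key, row_id) pairs sorted by key.
# The rows and the filter below are hypothetical stand-ins for p_MyTable.


def max_via_backward_scan(index, rows, predicate):
    """Walk the index from the largest key down and stop at the first row
    that passes the filter (like Limit -> Index Scan Backward).
    Returns (max_value, index_entries_visited)."""
    visited = 0
    for key, row_id in reversed(index):
        visited += 1
        if predicate(rows[row_id]):
            return key, visited
    return None, visited


def count_via_full_scan(rows, predicate):
    """count(...) has no early exit: every row must be checked.
    Returns (matching_rows, rows_visited)."""
    visited = 0
    matches = 0
    for row in rows:
        visited += 1
        if predicate(row):
            matches += 1
    return matches, visited


# Hypothetical data: 1000 rows, every 7th row passes the filter.
rows = [{"ind": i, "ent_attr_96": "EFG" if i % 7 == 0 else "XYZ"} for i in range(1000)]
index = sorted((row["ind"], row_id) for row_id, row in enumerate(rows))


def matches_filter(row):
    return row["ent_attr_96"] == "EFG"


print(max_via_backward_scan(index, rows, matches_filter))  # (994, 6)
print(count_via_full_scan(rows, matches_filter))           # (143, 1000)
```

The same early exit is what makes the real max plan cheap: its Limit cost (4.85) is a tiny fraction of the full backward scan cost (727648341.49) because the planner expects about 170 million matching rows spread through the index. If matching rows were rare among the largest IndColumn values, the backward scan would have to walk much further before it could stop.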

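For your other queries you need an index on the columns in the "where" clause. Something like

    CREATE INDEX idx_p_foo ON MySchema.p_MyTable USING btree(ent_attr_96, ent_attr_97, ent_attr_98);

might help, assuming you have many queries that use these 3 columns.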
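A composite B-tree keeps its entries sorted by the column tuple, so equality on all three columns selects one contiguous run of entries. The toy Python model below (not PostgreSQL code, hypothetical data) represents such an index as a sorted list of tuples and finds that run with two binary searches. For comparison, in the posted plans the existing index p_MyTable_comp is used only with the condition on ent_attr_96 (about 510 million estimated rows), and the other conditions are applied afterwards as a Filter.

```python
from bisect import bisect_left, bisect_right

# Toy model only: a composite index on (ent_attr_96, ent_attr_97, ent_attr_98)
# is a sorted list of (a96, a97, a98, row_id) tuples.


def build_index(rows):
    return sorted(
        (r["ent_attr_96"], r["ent_attr_97"], r["ent_attr_98"], row_id)
        for row_id, r in enumerate(rows)
    )


def lookup_equal(index, a96, a97, a98):
    """Return the row ids whose three columns equal the given values.
    The matching keys are contiguous, so two binary searches bound them."""
    prefix = (a96, a97, a98)
    lo = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    hi = bisect_right(index, prefix + (float("inf"),))  # past every row_id
    return [entry[3] for entry in index[lo:hi]]


# Hypothetical data: 60 rows cycling through a few attribute values.
rows = [
    {
        "ent_attr_96": "EFG" if i % 2 == 0 else "XYZ",
        "ent_attr_97": "ABC" if i % 3 == 0 else "DEF",
        "ent_attr_98": "HIJ" if i % 5 == 0 else "KLM",
    }
    for i in range(60)
]
index = build_index(rows)

print(lookup_equal(index, "EFG", "ABC", "HIJ"))  # [0, 30]
```

Only 2 of the 60 rows are fetched, whereas the condition on ent_attr_96 alone matches 30 of them. On a small scale, that is the difference between an index condition on one column followed by a Filter and a lookup on all three columns.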

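You could also add the date, though I'm not sure how much that would help, since the query has only an end date and no start date. So I'm not sure whether the planner would use the date in the index. And if it does, the question is how many rows the date filters out, and whether that is worth the larger index size.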
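With the date as a fourth, trailing column, a B-tree ordered this way can still bound the scan with only an end date: it starts at the beginning of the matching (ent_attr_96, ent_attr_97, ent_attr_98) group and stops at the cut-off date, so the saving depends on how many rows in the group fall after that date. The toy Python model below (not PostgreSQL code, hypothetical data) illustrates this. It assumes the date comparison can be checked against the indexed value directly; in the posted query the column is cast with ::date, so whether a real index on ent_date_2 matches depends on that column's type.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

# Toy model only: a composite index on
# (ent_attr_96, ent_attr_97, ent_attr_98, ent_date_2) as a sorted list of tuples.


def build_index(rows):
    return sorted(
        (r["ent_attr_96"], r["ent_attr_97"], r["ent_attr_98"], r["ent_date_2"], row_id)
        for row_id, r in enumerate(rows)
    )


def lookup_until(index, a96, a97, a98, end_date):
    """Row ids with the three equal columns and ent_date_2 <= end_date.
    With no start date the scan begins at the start of the equality group
    and ends at the last entry dated on or before end_date."""
    prefix = (a96, a97, a98)
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (end_date, float("inf")))
    return [entry[4] for entry in index[lo:hi]]


# Hypothetical data: 300 rows, one per day starting 2017-05-01.
base = date(2017, 5, 1)
rows = [
    {
        "ent_attr_96": "EFG" if i % 2 == 0 else "XYZ",
        "ent_attr_97": "ABC" if i % 3 == 0 else "DEF",
        "ent_attr_98": "HIJ" if i % 5 == 0 else "KLM",
        "ent_date_2": base + timedelta(days=i),
    }
    for i in range(300)
]
index = build_index(rows)

print(lookup_until(index, "EFG", "ABC", "HIJ", date(2017, 6, 1)))  # [0, 30]
```

Here the date keeps 2 of the 10 rows in the group. If most rows in a group were older than the cut-off, the extra column would skip little and only make the index larger, which is the trade-off the answer raises.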
