Postgres分区修剪

时间:2012-04-03 16:54:16

标签: postgresql partitioning database-partitioning partition-problem

我在Postgres有一张大桌子。

表名为bigtable,列为:

integer    |timestamp   |xxx |xxx |...|xxx
category_id|capture_time|col1|col2|...|colN

我已经在tables_id的模数10和capture_time列的日期部分上对表进行了分区。

分区表如下所示:

CREATE TABLE myschema.bigtable_d000h0(
    CHECK ( category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_d000h1(
    CHECK ( category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

当我在where子句中使用category_id和capture_time运行查询时,不会按预期修剪分区。

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100;

"Result  (cost=0.00..9476.87 rows=1933 width=216)"
"  ->  Append  (cost=0.00..9476.87 rows=1933 width=216)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..1921.63 rows=1923 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h1 bigtable  (cost=0.00..776.93 rows=1 width=218)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h2 bigtable  (cost=0.00..974.47 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h3 bigtable  (cost=0.00..1351.92 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h4 bigtable  (cost=0.00..577.04 rows=1 width=217)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h5 bigtable  (cost=0.00..360.67 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h6 bigtable  (cost=0.00..1778.18 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h7 bigtable  (cost=0.00..315.82 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h8 bigtable  (cost=0.00..372.06 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h9 bigtable  (cost=0.00..1048.16 rows=1 width=215)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

但是,如果我在where子句中添加精确的模数条件(category_id%10=0),它就可以完美地运行

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100 and category_id%10=0;

"Result  (cost=0.00..2154.09 rows=11 width=215)"
"  ->  Append  (cost=0.00..2154.09 rows=11 width=215)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..2154.09 rows=10 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

有没有办法让分区修剪工作正常而不必在每个查询中添加模数条件?

2 个答案:

答案 0 :(得分:4)

事情是:对于排除约束PostgreSQL will create an implicit index。在你的情况下,这个索引将是一个部分索引,“因为你在列上使用了expresion,而不仅仅是它的值。它在documentation中说明(寻找11-2示例):

  

PostgreSQL没有复杂的定理证明器,可以识别以不同形式编写的数学上等效的表达式。 (这样的一般定理证明器不仅难以创建,它可能太慢而无法实际使用。)系统可以识别简单的不等式含义,例如“x <1”意味着“x <2” “; 否则谓词条件必须与查询的WHERE条件的一部分完全匹配,否则索引将不会被识别为可用。匹配发生在查询计划时,而不是在运行时。

因此,您的结果 - 您应该具有与创建CHECK约束时使用的完全相同的表达式。

对于基于HASH的分区,我更喜欢两种方法:

  • 添加一个可以采用有限值集合的字段(在您的情况下为10),如果设计中存在这样的值,则最好;
  • 以指定时间戳范围的方式指定哈希值范围:MINVALUE&lt; = category_id&lt; MAXVALUE

此外,还可以创建2级分区:

  • 在第一级创建基于category_id HASH的10个分区;
  • 在第二级,您可以根据日期范围创建必要数量的分区。

虽然我总是尝试只使用1列进行分区,但更容易管理。

答案 1 :(得分:1)

对于遇到同样问题的人: 我得出结论,最简单的方法是更改​​查询以包含模数条件category_id%10=0