Question

In my project I have a very simple table that looks like this:

create table entity
(
    id integer default 1,
    session_id varchar not null,
    type integer not null,
    category integer not null,
    created timestamp default now() not null
)
with (autovacuum_enabled=false);


create index created_index
    on entity (created);

I also have a view that selects grouped results for the entries of the last 30 seconds, like this:

create view list(type, category, counter) as
    SELECT 
        type,
        category, 
        count(entity.id) AS counter
    FROM entity
    WHERE entity.created >= (now() - '00:00:30'::interval)
    GROUP BY entity.type, entity.category;

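Because the table sees no updates or deletes, I have set it to unlogged and disabled autovacuum.

The table currently holds about 20 million rows, and SELECT type, category, counter FROM list takes about 2 seconds on average.

Is there anything I can optimize to speed up the select, or is the current speed already the best that can be achieved for a table this large?

Edit:

Here is the EXPLAIN output: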

Subquery Scan on list  (cost=9.37..9.73 rows=18 width=16) (actual time=425.268..425.278 rows=24 loops=1)
  Output: list.type, list.category, list.counter
  Buffers: shared hit=169485
  ->  HashAggregate  (cost=9.37..9.55 rows=18 width=16) (actual time=425.267..425.272 rows=24 loops=1)
        Output: entity.type, entity.category, count(entity.id)
        Group Key: entity.type, entity.category
        Buffers: shared hit=169485
        ->  Index Scan using created_index on entity  (cost=0.57..9.13 rows=32 width=12) (actual time=0.050..228.416 rows=165470 loops=1)
              Output: entity.id, entity.session_id, entity.type, entity.category, entity.created
              Index Cond: (entity.created >= (now() - '00:00:30'::interval))
              Buffers: shared hit=169485
Planning Time: 0.204 ms
Execution Time: 425.327 ms
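
For anyone reproducing this, a plan with these fields can typically be obtained with a command along the following lines. The exact options used for the output above are not stated, so the option list here is an assumption inferred from the fields shown.

```sql
-- ANALYZE runs the query and reports actual times and row counts;
-- BUFFERS adds the "Buffers:" lines; VERBOSE adds the "Output:" lines.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT type, category, counter FROM list;
```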

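The execution time looks fine, but this was run while the system was idle. Normally there are about 1,000 inserts per second into the table.

As for autovacuum, disabling it was a desperate attempt to see whether it would improve anything. Should I enable it again?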

Answer 1

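This is a job for a covering index. If you create a compound index that can satisfy the entire query, you give the planner a chance to perform that expensive HashAggregate in a different way.

Covering indexes are usually best tailored to a limited set of queries. For your query, it is this one.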

CREATE INDEX entity_cr_ty_ca_id ON entity(created, type, category) INCLUDE (id);

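This works because the query can ...

Random-access the index to the first eligible created value.
Scan the index sequentially from there. It is a B-tree index, so the type and category values come out in a useful order.
Pull the id value from the index to check whether it is NULL, because COUNT(entity.id) counts only non-null values.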
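
A rough way to check whether the planner actually takes this path, assuming the entity_cr_ty_ca_id index above has been created, is to run the view's query under EXPLAIN and look for an Index Only Scan node. One caveat that is an addition here and not part of the original answer: PostgreSQL index-only scans rely on the visibility map, which VACUUM maintains, so with autovacuum disabled a manual VACUUM is needed for pages to be marked all-visible.

```sql
-- Refresh the visibility map and planner statistics
-- (autovacuum is disabled on this table, so nothing else does this).
VACUUM (ANALYZE) entity;

-- Look for "Index Only Scan using entity_cr_ty_ca_id" in the plan.
-- The "Heap Fetches" line shows how many rows still required a table visit.
EXPLAIN (ANALYZE, BUFFERS)
SELECT type, category, count(id) AS counter
FROM entity
WHERE created >= now() - interval '30 seconds'
GROUP BY type, category;
```

Rows inserted after the last VACUUM sit on pages that are not yet marked all-visible, so on a table with constant inserts the most recent rows may still cause heap fetches.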

If you know that id is never NULL, you can simplify this. Use COUNT(*) instead of COUNT(entity.id), leave id out of the index, and create the index like this instead.

CREATE INDEX entity_cr_ty_ca ON entity(created, type, category);
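
With that simpler index, the view from the question could be redefined roughly as follows. This is a sketch that assumes id really is never NULL; the table definition in the question has no NOT NULL constraint on id, so that is worth verifying first.

```sql
-- Assumes id is never NULL, so count(*) returns the same numbers as count(entity.id).
-- The column names and types stay the same, so CREATE OR REPLACE VIEW is enough.
CREATE OR REPLACE VIEW list(type, category, counter) AS
    SELECT
        type,
        category,
        count(*) AS counter
    FROM entity
    WHERE entity.created >= (now() - '00:00:30'::interval)
    GROUP BY entity.type, entity.category;
```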

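And, it has to be said: even if you get your DBMS to generate a large result set quickly, that result set still has to be transferred to the program that requested it and interpreted there. No index magic can speed that up.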
