Question

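I created a table in PostgreSQL with a composite primary key (3 columns). When a query uses a subset of those columns that does not include the leading column, the default index is not used. This is not the case if we create the index explicitly (the index would then be used for any subset).

By default, Postgres creates an index on the primary key. But as the Postgres documentation says: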

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

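So if the query does not include the leading column, the index should still be used (if we create the index explicitly), but when we try a subset of columns against the default primary key index, the index is not used.

Below are the schema and the queries that do not work for a subset.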

# \d client_data
              Table "public.client_data"
       Column       |         Type          | Modifiers 
--------------------+-----------------------+-----------
 macaddr            | character varying(64) | not null
 ts                 | bigint                | not null
 interval           | smallint              | not null
 snr                | smallint              | not null
 rx_rate            | bigint                | 
 tx_rate            | bigint                | 
 rx_data            | bigint                | 
 tx_data            | bigint                | 

Indexes:
    "client_data_pkey" PRIMARY KEY, btree (macaddr, ts, interval)

If we specify all the primary key columns, the query planner uses the index:

# explain analyze select count(*) from client_data where macaddr='a:b:c' and ts=346783556 and interval=5;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.60..8.61 rows=1 width=0) (actual time=0.040..0.041 rows=1 loops=1)
   ->  Index Scan using client_data_pkey on client_data  (cost=0.00..8.59 rows=1 width=0) (actual time=0.037..0.037 rows=0 loops=1)
         Index Cond: (((macaddr)::text = 'a:b:c'::text) AND (ts = 346783556) AND ("interval" = 5))
 Total runtime: 0.096 ms
(4 rows)

But if we specify a subset, the query planner does not use the index:

# explain analyze select count(*) from client_data where ts=346783556;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16176.01..16176.02 rows=1 width=0) (actual time=78.937..78.938 rows=1 loops=1)
   ->  Seq Scan on client_data  (cost=0.00..16175.92 rows=36 width=0) (actual time=78.932..78.932 rows=0 loops=1)
         Filter: (ts = 346783556)
 Total runtime: 78.975 ms
(4 rows)


# explain analyze select count(*) from client_data where ts=346783556 and interval=5;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17639.11..17639.12 rows=1 width=0) (actual time=78.815..78.815 rows=1 loops=1)
   ->  Seq Scan on client_data  (cost=0.00..17639.11 rows=1 width=0) (actual time=78.810..78.810 rows=0 loops=1)
         Filter: ((ts = 346783556) AND ("interval" = 5))
 Total runtime: 78.853 ms
(4 rows)

But if we use the leading column (macaddr) together with ts or interval, the index is used:

# explain analyze select count(*) from client_data where macaddr='a' and ts=346783556;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.59..8.60 rows=1 width=0) (actual time=0.055..0.056 rows=1 loops=1)
   ->  Index Scan using client_data_pkey on client_data  (cost=0.00..8.59 rows=1 width=0) (actual time=0.051..0.051 rows=0 loops=1)
         Index Cond: (((macaddr)::text = 'a'::text) AND (ts = 346783556))
 Total runtime: 0.103 ms
(4 rows)


# explain analyze select count(*) from client_data where macaddr='a' and interval=56;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=56.15..56.16 rows=1 width=0) (actual time=0.021..0.022 rows=1 loops=1)
   ->  Index Scan using client_data_pkey on client_data  (cost=0.00..56.15 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
         Index Cond: (((macaddr)::text = 'a'::text) AND ("interval" = 56))
 Total runtime: 0.055 ms
(4 rows)

Answer 1

You should read the rest of the text after that quote.

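PostgreSQL can only use a b-tree index efficiently for searches that include the leftmost column. You can use an index on (a,b) for queries that search on a, or on both a and b, but not for queries that search only on b. This is because of how a multicolumn b-tree index is structured: most of the index would have to be scanned anyway, so it is usually more efficient for PostgreSQL to do a full table scan.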

If you need to treat them as discrete columns and need to do lots of searches, or fast searches, on b, create a separate index on b.
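For the table in the question, a separate index could look roughly like the sketch below. The index name client_data_ts_idx is made up for illustration, and whether the planner actually picks the index depends on your data and statistics, so check the plan yourself.

```sql
-- Hypothetical index name; covers searches on ts alone.
CREATE INDEX client_data_ts_idx ON client_data (ts);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE client_data;

-- Re-run the query from the question and inspect the plan.
EXPLAIN ANALYZE
SELECT count(*) FROM client_data WHERE ts = 346783556;
```

If most of your searches are on ts and interval together, an index on (ts, "interval") may serve better than one on ts alone; that is a judgment call that depends on your workload.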

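You'll probably find that if you SET enable_seqscan = off (for testing purposes only), PostgreSQL will use your index for the non-leftmost columns, but it will probably be slower than the seqscan. If it isn't, you need to look at whether your random_page_cost and seq_page_cost settings reflect reality.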
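A rough way to try this in a single psql session, using the query from the question. The setting change only affects the current session and is undone by RESET; do not leave it off in production.

```sql
-- Testing only: discourage sequential scans for this session.
SET enable_seqscan = off;

-- See whether the primary key index is now chosen, and compare the runtime
-- with the Seq Scan plan from the question.
EXPLAIN ANALYZE
SELECT count(*) FROM client_data WHERE ts = 346783556;

-- Restore the default behaviour.
RESET enable_seqscan;

-- Inspect the cost settings the planner is currently using.
SHOW random_page_cost;
SHOW seq_page_cost;
```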
