Question

My table grows by 10M rows every year.

The table has 10 columns, named c1, c2, c3, ..., c10.

I will filter with WHERE clauses on probably 8 of these columns.

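To be more specific: every time I query the table, there will always be a WHERE condition on column c10 (it is a date, and I may search for equality or for a range).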

The other 7 searchable columns don't follow any pattern. I may search on:

c10, c1, c2, c5
c10, c5
c10, c3
c10, c2, c6
c10, c2, c3, c5, c6

...and every other possible combination.

So c10 will always be in the WHERE clause, while the other columns can appear in any combination (or not at all).

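Which indexing strategy would improve performance in this situation? I think the right approach is to create an index on each column. Would multi-column indexes improve performance?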

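As far as I know, a multi-column index on (c1, c2, c3) only helps queries that use c1, c2, c3, or c1, c2, or c1, in that order. But as I said, the only thing I can assume in my scenario is that c10 will always be present in the WHERE clause (and, if that helps, it can also be the first condition).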

Answer 1

I strongly recommend the following strategy:

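Create single-column indexes on the other columns;
Partition the table on c10. Since it is a date, you can partition by range, with yearly or monthly partitions.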

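In my experience, partitioning brings a huge performance improvement, especially on large tables where one or more columns always appear in the WHERE clause.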
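The strategy above can be sketched with PostgreSQL declarative partitioning (version 11 or later) roughly as follows. This is a minimal sketch, not the answerer's own code. The table name readings, the column types, and the yearly bounds are all assumptions. Only c1, c2, c3, c5 and c6 get indexes, because those are the columns that appear in the question's examples.

```sql
-- Hypothetical table: the question names only c1..c10, so the table name
-- "readings" and every column type here are assumptions.
CREATE TABLE readings (
    c1 int, c2 int, c3 int, c4 int, c5 int,
    c6 int, c7 int, c8 int, c9 int,
    c10 date NOT NULL
) PARTITION BY RANGE (c10);

-- One partition per year: FROM is inclusive, TO is exclusive.
CREATE TABLE readings_2023 PARTITION OF readings
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE readings_2024 PARTITION OF readings
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Single-column indexes on the other searchable columns. An index created on
-- the partitioned parent is also created on every partition (PostgreSQL 11+).
CREATE INDEX ON readings (c1);
CREATE INDEX ON readings (c2);
CREATE INDEX ON readings (c3);
CREATE INDEX ON readings (c5);
CREATE INDEX ON readings (c6);

-- A constant range on c10 lets the planner prune partitions:
-- this plan should mention readings_2024 only.
EXPLAIN SELECT * FROM readings
WHERE c10 >= DATE '2024-03-01' AND c10 < DATE '2024-04-01' AND c5 = 42;
```

PostgreSQL rejects any row whose c10 falls outside every partition's range. Future partitions therefore have to be created ahead of time, or a DEFAULT partition added.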

Answer 2

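Multi-column indexes are very versatile, more so than single-column indexes. A multi-column index on (c1, c2) also works for any query where an index on (c1) would work.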

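Assuming your conditions are all equality conditions, the order of the columns in an index doesn't matter for a query that constrains all of them. For the conditions you describe, the following indexes would fully optimize all the queries: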

(c10, c5, c1, c2)
(c10, c3)
(c10, c2, c6)
(c10, c2, c3, c5, c6)
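A minimal PostgreSQL sketch of the index set listed above follows. It runs on a hypothetical table readings_multi; the table name, the column types, and the synthetic data are assumptions, not from the question. Each index leads with c10 because every query filters on it. Putting c5 second in the first index lets that one index serve both the (c10, c1, c2, c5) and the (c10, c5) queries.

```sql
-- Hypothetical table and synthetic data; names and types are assumptions.
CREATE TABLE readings_multi (
    c1 int, c2 int, c3 int, c4 int, c5 int,
    c6 int, c7 int, c8 int, c9 int,
    c10 date NOT NULL
);

-- Every index leads with c10, the column present in every WHERE clause.
CREATE INDEX readings_multi_c10_c5_c1_c2_idx    ON readings_multi (c10, c5, c1, c2);     -- (c10,c1,c2,c5) and (c10,c5)
CREATE INDEX readings_multi_c10_c3_idx          ON readings_multi (c10, c3);             -- (c10,c3)
CREATE INDEX readings_multi_c10_c2_c6_idx       ON readings_multi (c10, c2, c6);         -- (c10,c2,c6)
CREATE INDEX readings_multi_c10_c2_c3_c5_c6_idx ON readings_multi (c10, c2, c3, c5, c6); -- (c10,c2,c3,c5,c6)

-- 200,000 synthetic rows spread over 365 days, then refresh planner statistics.
INSERT INTO readings_multi (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)
SELECT g % 100, g % 97, g % 89, 0, g % 83, g % 79, 0, 0, 0,
       DATE '2024-01-01' + (g % 365)
FROM generate_series(1, 200000) AS g;
ANALYZE readings_multi;

-- The (c10, c5) query can use the first index through its two leading columns.
EXPLAIN SELECT * FROM readings_multi
WHERE c10 = DATE '2024-06-01' AND c5 = 7;
```

If c10 is searched by range instead of equality, the column order starts to matter. A B-tree scan is bounded by the equality conditions on the leading columns plus the range on the first column without an equality condition. Columns placed after a range-searched c10 are then only checked inside the scanned range and do not narrow it.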

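Whether you need all of these indexes is another question. It depends on the selectivity of the columns (that is, the fraction of the table's rows that a condition selects). Filtering a few dozen rows by checking their values is not particularly expensive. So if the c10 condition alone returns only a small number of rows, adding the other columns to the index may not improve performance noticeably.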
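One way to check that selectivity is to compare the rows matched by a typical c10 condition with the table total. The sketch below uses a hypothetical table readings_sel with synthetic data: exactly 100 rows per day for 365 days. With real data, substitute your own table and a typical date or range.

```sql
-- Hypothetical table with synthetic data: exactly 100 rows for each of 365 days.
CREATE TABLE readings_sel (c5 int, c10 date NOT NULL);

INSERT INTO readings_sel (c5, c10)
SELECT g % 50, DATE '2024-01-01' + (g % 365)
FROM generate_series(0, 36499) AS g;

-- Fraction of the table selected by an equality condition on c10.
SELECT count(*) FILTER (WHERE c10 = DATE '2024-06-01') AS matching_rows,
       count(*)                                        AS total_rows,
       round(count(*) FILTER (WHERE c10 = DATE '2024-06-01')::numeric
             / count(*), 4)                            AS selectivity
FROM readings_sel;

-- The planner's own view: estimated number of distinct values per column.
ANALYZE readings_sel;
SELECT attname, n_distinct FROM pg_stats WHERE tablename = 'readings_sel';
```

Here one day matches 100 of 36,500 rows (selectivity 0.0027). Once c10 has narrowed the search that far, filtering the remaining rows on another column such as c5 is cheap.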

In addition, more indexes mean that inserts, updates, and deletes take more time. This should also factor into your indexing strategy.
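The write cost can be measured directly, because EXPLAIN ANALYZE on an INSERT actually executes the insert, index maintenance included. The rough sketch below uses two hypothetical tables, w_plain and w_indexed; the names and the synthetic data are assumptions. The first table has no indexes and the second has four composite indexes like those in this answer. Timings depend entirely on the machine, so none are given here.

```sql
-- Two hypothetical tables with identical columns.
CREATE TABLE w_plain   (c1 int, c2 int, c3 int, c5 int, c6 int, c10 date);
CREATE TABLE w_indexed (c1 int, c2 int, c3 int, c5 int, c6 int, c10 date);

-- Four composite indexes on the second table only.
CREATE INDEX ON w_indexed (c10, c5, c1, c2);
CREATE INDEX ON w_indexed (c10, c3);
CREATE INDEX ON w_indexed (c10, c2, c6);
CREATE INDEX ON w_indexed (c10, c2, c3, c5, c6);

-- EXPLAIN ANALYZE really performs each INSERT; compare the two "Execution Time" lines.
EXPLAIN ANALYZE
INSERT INTO w_plain
SELECT g % 100, g % 97, g % 89, g % 83, g % 79, DATE '2024-01-01' + (g % 365)
FROM generate_series(1, 100000) AS g;

EXPLAIN ANALYZE
INSERT INTO w_indexed
SELECT g % 100, g % 97, g % 89, g % 83, g % 79, DATE '2024-01-01' + (g % 365)
FROM generate_series(1, 100000) AS g;
```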

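Partitioning (as mentioned in the other answer) can also be useful. Whether it fits your situation depends on what your data and queries look like.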

Answer 3

To answer the question of which kind of index to use, we can run a simple test. First, create a database, the tables, and the indexes.

CREATE DATABASE index_test;
-- in psql, connect to the new database before creating the tables
\c index_test

CREATE TABLE single_column(a int, b int, c int);
CREATE TABLE multi_column(a int, b int, c int);

CREATE INDEX single_column_a_idx ON single_column (a);
CREATE INDEX single_column_b_idx ON single_column (b);
CREATE INDEX single_column_c_idx ON single_column (c);

CREATE INDEX multi_column_idx ON multi_column (a, b, c);

Fill the tables with random data.

-- this function will be used for random number generation
CREATE OR REPLACE FUNCTION random_in_range(INTEGER, INTEGER) RETURNS INTEGER AS $$
SELECT floor(($1 + ($2 - $1 + 1) * random()))::INTEGER;
$$ LANGUAGE SQL;

INSERT INTO single_column(a, b, c)
SELECT random_in_range(1, 100),
    random_in_range(1, 100),
    random_in_range(1, 100)
FROM generate_series(1, 1000000);

INSERT INTO multi_column(a, b, c)
SELECT random_in_range(1, 100),
    random_in_range(1, 100),
    random_in_range(1, 100)
FROM generate_series(1, 1000000);

Run the tests.

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;

EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3 AND b > 10 AND c <= 11;

Results

index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=5.802..44.904 rows=20070 loops=1)
   Recheck Cond: (a < 3)
   Heap Blocks: exact=5269
   ->  Bitmap Index Scan on single_column_a_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.018..4.019 rows=20070 loops=1)
         Index Cond: (a < 3)
 Planning Time: 0.325 ms
 Execution Time: 46.589 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=6.630..26.814 rows=19902 loops=1)
   Recheck Cond: (b < 3)
   Heap Blocks: exact=5296
   ->  Bitmap Index Scan on single_column_b_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.852..4.853 rows=19902 loops=1)
         Index Cond: (b < 3)
 Planning Time: 0.270 ms
 Execution Time: 28.762 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=5.896..25.304 rows=19946 loops=1)
   Recheck Cond: (c < 3)
   Heap Blocks: exact=5274
   ->  Bitmap Index Scan on single_column_c_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.125..4.126 rows=19946 loops=1)
         Index Cond: (c < 3)
 Planning Time: 0.270 ms
 Execution Time: 27.136 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
                                                             QUERY PLAN                                                 
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on multi_column  (cost=8569.39..18570.49 rows=367608 width=12) (actual time=7.760..67.173 rows=19938 loops=1)
   Recheck Cond: (a < 3)
   Heap Blocks: exact=5267
   ->  Bitmap Index Scan on multi_column_idx  (cost=0.00..8477.49 rows=367608 width=0) (actual time=6.008..6.008 rows=19938 loops=1)
         Index Cond: (a < 3)
 Planning Time: 0.564 ms
 Execution Time: 68.630 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
                                                           QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13481.03 rows=18667 width=12) (actual time=1.451..135.028 rows=19897 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on multi_column  (cost=0.00..10614.33 rows=7778 width=12) (actual time=0.038..61.993 rows=6632 loops=3)
         Filter: (b < 3)
         Rows Removed by Filter: 326701
 Planning Time: 1.123 ms
 Execution Time: 136.128 ms
(8 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;
                                                           QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13627.63 rows=20133 width=12) (actual time=0.957..135.119 rows=19860 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on multi_column  (cost=0.00..10614.33 rows=8389 width=12) (actual time=0.035..66.760 rows=6620 loops=3)
         Filter: (c < 3)
         Rows Removed by Filter: 326713
 Planning Time: 0.225 ms
 Execution Time: 136.239 ms
(8 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11;
                                                                   QUERY PLAN                                           
-------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=1424.66..5716.83 rows=2110 width=12) (actual time=21.694..26.123 rows=2000 loops=1)
   Recheck Cond: ((a < 3) AND (c <= 11))
   Filter: (b > 10)
   Rows Removed by Filter: 230
   Heap Blocks: exact=1833
   ->  BitmapAnd  (cost=1424.66..1424.66 rows=2338 width=0) (actual time=20.981..20.983 rows=0 loops=1)
         ->  Bitmap Index Scan on single_column_a_idx  (cost=0.00..230.43 rows=21067 width=0) (actual time=3.932..3.932 rows=20070 loops=1)
               Index Cond: (a < 3)
         ->  Bitmap Index Scan on single_column_c_idx  (cost=0.00..1192.92 rows=111000 width=0) (actual time=16.080..16.080 rows=110276 loops=1)
               Index Cond: (c <= 11)
 Planning Time: 1.812 ms
 Execution Time: 26.742 ms
(12 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3 AND b > 10 AND c <= 11;
                                                                 QUERY PLAN                                             
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using multi_column_idx on multi_column  (cost=0.42..642.38 rows=2071 width=12) (actual time=0.329..2.086 rows=1953 loops=1)
   Index Cond: ((a < 3) AND (b > 10) AND (c <= 11))
   Heap Fetches: 0
 Planning Time: 0.176 ms
 Execution Time: 2.165 ms
(5 rows)

Conclusion

On the single_column table, each query with a single column in its WHERE clause used that column's index.

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;

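On the multi_column table, the index was used only when the query constrained the first column of the index definition. In this test, only the query on column a used the index, because a is the first column of multi_column_idx (CREATE INDEX multi_column_idx ON multi_column (a, b, c)). For b and c the planner chose a parallel sequential scan instead. A B-tree multi-column index can still be used with conditions on non-leading columns, but then the whole index has to be scanned, so the planner usually judges a sequential scan to be cheaper.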

EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;
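The planner's choice for b and c is a cost decision rather than a hard rule. The sketch below makes this visible on a hypothetical, smaller copy of the test table (multi_demo, with deterministic data instead of random values). Disabling sequential scans for the session shows whether the planner can still use the (a, b, c) index for a condition on b alone. enable_seqscan is a real planner setting, but it is intended for experiments like this one, not for production.

```sql
-- Hypothetical, smaller copy of the multi_column test table.
CREATE TABLE multi_demo (a int, b int, c int);
CREATE INDEX multi_demo_abc_idx ON multi_demo (a, b, c);

-- 100,000 rows; every column holds values from 1 to 100 (or 1 to 97 for c).
INSERT INTO multi_demo (a, b, c)
SELECT 1 + g % 100, 1 + (g / 100) % 100, 1 + g % 97
FROM generate_series(1, 100000) AS g;
ANALYZE multi_demo;

-- Discourage sequential scans for this session only.
SET enable_seqscan = off;
-- The plan should now use multi_demo_abc_idx with an Index Cond on b,
-- reading the whole index because the leading column a is not constrained.
EXPLAIN SELECT * FROM multi_demo WHERE b < 3;
RESET enable_seqscan;
```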

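In this test, queries with a single-column WHERE clause ran faster with the single-column indexes (46.6 ms vs. 68.6 ms for a < 3, and index scans instead of sequential scans for b and c).
The query with a three-column WHERE clause ran much faster with the multi-column index (2.2 ms vs. 26.7 ms).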

PostgreSQL: multi-column index vs. single-column index
