Postgresql:多列索引与单列索引

时间:2018-04-11 13:48:04

标签: sql postgresql indexing

我的桌子每年会增加10M行。

该表有10列,称为c1,c2,c3,...,c10。

我将使用WHERE子句,可能是其中的8个。

更具体一点:每次我查询表时,总会在c10列上有一个WHERE子句(它是一个日期,我可以搜索相等或范围)。

其他7个可搜索的列,不会遵循任何架构。 我可以搜索:

  • c10,c1,c2,c5
  • c10,c5
  • c10,c3
  • c10,c2,c6
  • c10,c2,c3,c5,c6

......以及所有其他可能的组合。

因此,在WHERE子句中,c10将始终存在,而其他组合可以以任何组合存在(甚至根本不存在)。

在这种情况下,哪种索引策略可以提高性能? 我认为正确的做法是为每一列创建一个索引。使用多列索引可以改善性能吗?

据我所知,您将获得(c1,c2,c3)多列索引的性能,仅适用于按顺序使用c1,c2,c3或c1,c2或c1的查询。但就像我说的,我在我的场景中唯一可以假设的是,c10将始终存在于WHERE子句中(如果有帮助,它也可以是第一个子句)

3 个答案:

答案 0 :(得分:0)

我强烈建议采用以下策略:

  • 在其他列上创建单列索引;
  • c10进行分区。由于它是一个日期,您可以按范围进行分区,进行年度或月度分区。

我发现分区会带来巨大的性能提升,特别是在WHERE和大表中总是使用一列或多列的情况下。

答案 1 :(得分:0)

多列索引非常通用,比单列索引更通用。 (c1, c2)上的多列索引也适用于(c1)上的索引可以正常工作的查询。

假设您的条件都是相等条件,那么索引中列的顺序无关紧要。对于您描述的条件,以下索引将完全优化所有查询:

  • (c10, c5, c1, c2)
  • (c10, c3)
  • (c10, c2, c6)
  • (c10, c2, d3, c5, c6)

您是否需要所有这些索引是另一回事。这取决于列的选择性(即,他们选择的表中行的比例)。通过检索值来过滤几十行并不是特别昂贵。因此,如果c10条件只返回少量行,则包含索引中的其他列可能不会显着提高性能。

此外,更多索引意味着插入,更新和删除需要更多时间。这也会影响您的索引策略。

分区(如另一个答案中所述)也很有用。它是否适合您的情况,取决于数据和查询的外观。

答案 2 :(得分:0)

为了回答我们应该使用什么样的索引的问题,我们可以创建一个简单的测试。首先,我们创建一个数据库、表和索引。

CREATE DATABASE index_test;

CREATE TABLE single_column(a int, b int, c int);
CREATE TABLE multi_column(a int, b int, c int);

CREATE INDEX single_column_a_idx ON single_column (a);
CREATE INDEX single_column_b_idx ON single_column (b);
CREATE INDEX single_column_c_idx ON single_column (c);

CREATE INDEX multi_column_idx ON multi_column (a, b, c);

用随机数据填充表格。

-- this function will be used for random number generation
CREATE OR REPLACE FUNCTION random_in_range(INTEGER, INTEGER) RETURNS INTEGER AS $$
SELECT floor(($1 + ($2 - $1 + 1) * random()))::INTEGER;
$$ LANGUAGE SQL;

INSERT INTO single_column(a, b, c)
SELECT random_in_range(1, 100),
    random_in_range(1, 100),
    random_in_range(1, 100)
FROM generate_series(1, 1000000);

INSERT INTO multi_column(a, b, c)
SELECT random_in_range(1, 100),
    random_in_range(1, 100),
    random_in_range(1, 100)
FROM generate_series(1, 1000000);

运行测试。

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;

EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;

EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3 AND b > 10 AND c <= 11;

结果

index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=5.802..44.904 rows=20070 loops=1)
   Recheck Cond: (a < 3)
   Heap Blocks: exact=5269
   ->  Bitmap Index Scan on single_column_a_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.018..4.019 rows=20070 loops=1)
         Index Cond: (a < 3)
 Planning Time: 0.325 ms
 Execution Time: 46.589 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=6.630..26.814 rows=19902 loops=1)
   Recheck Cond: (b < 3)
   Heap Blocks: exact=5296
   ->  Bitmap Index Scan on single_column_b_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.852..4.853 rows=19902 loops=1)
         Index Cond: (b < 3)
 Planning Time: 0.270 ms
 Execution Time: 28.762 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;
                                                               QUERY PLAN                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=3925.39..13926.49 rows=367608 width=12) (actual time=5.896..25.304 rows=19946 loops=1)
   Recheck Cond: (c < 3)
   Heap Blocks: exact=5274
   ->  Bitmap Index Scan on single_column_c_idx  (cost=0.00..3833.49 rows=367608 width=0) (actual time=4.125..4.126 rows=19946 loops=1)
         Index Cond: (c < 3)
 Planning Time: 0.270 ms
 Execution Time: 27.136 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
                                                             QUERY PLAN                                                 
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on multi_column  (cost=8569.39..18570.49 rows=367608 width=12) (actual time=7.760..67.173 rows=19938 loops=1)
   Recheck Cond: (a < 3)
   Heap Blocks: exact=5267
   ->  Bitmap Index Scan on multi_column_idx  (cost=0.00..8477.49 rows=367608 width=0) (actual time=6.008..6.008 rows=19938 loops=1)
         Index Cond: (a < 3)
 Planning Time: 0.564 ms
 Execution Time: 68.630 ms
(7 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
                                                           QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13481.03 rows=18667 width=12) (actual time=1.451..135.028 rows=19897 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on multi_column  (cost=0.00..10614.33 rows=7778 width=12) (actual time=0.038..61.993 rows=6632 loops=3)
         Filter: (b < 3)
         Rows Removed by Filter: 326701
 Planning Time: 1.123 ms
 Execution Time: 136.128 ms
(8 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;
                                                           QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13627.63 rows=20133 width=12) (actual time=0.957..135.119 rows=19860 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on multi_column  (cost=0.00..10614.33 rows=8389 width=12) (actual time=0.035..66.760 rows=6620 loops=3)
         Filter: (c < 3)
         Rows Removed by Filter: 326713
 Planning Time: 0.225 ms
 Execution Time: 136.239 ms
(8 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3 AND b > 10 AND c <= 11;
                                                                   QUERY PLAN                                           
-------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on single_column  (cost=1424.66..5716.83 rows=2110 width=12) (actual time=21.694..26.123 rows=2000 loops=1)
   Recheck Cond: ((a < 3) AND (c <= 11))
   Filter: (b > 10)
   Rows Removed by Filter: 230
   Heap Blocks: exact=1833
   ->  BitmapAnd  (cost=1424.66..1424.66 rows=2338 width=0) (actual time=20.981..20.983 rows=0 loops=1)
         ->  Bitmap Index Scan on single_column_a_idx  (cost=0.00..230.43 rows=21067 width=0) (actual time=3.932..3.932 rows=20070 loops=1)
               Index Cond: (a < 3)
         ->  Bitmap Index Scan on single_column_c_idx  (cost=0.00..1192.92 rows=111000 width=0) (actual time=16.080..16.080 rows=110276 loops=1)
               Index Cond: (c <= 11)
 Planning Time: 1.812 ms
 Execution Time: 26.742 ms
(12 rows)


index_test=# EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3 AND b > 10 AND c <= 11;
                                                                 QUERY PLAN                                             
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using multi_column_idx on multi_column  (cost=0.42..642.38 rows=2071 width=12) (actual time=0.329..2.086 rows=1953 loops=1)
   Index Cond: ((a < 3) AND (b > 10) AND (c <= 11))
   Heap Fetches: 0
 Planning Time: 0.176 ms
 Execution Time: 2.165 ms
(5 rows)

结论

  • 在“single_column”表上,每个在 WHERE 子句中使用单个列的查询都将使用索引。
EXPLAIN ANALYZE SELECT * FROM single_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM single_column WHERE c < 3;
  • 在“multi_column”表中,仅当使用的列是索引定义中的第一列时才使用索引。在此测试中,只有“a”列使用索引,因为“a”是 CREATE INDEX multi_column_idx ON multi_column (a, b, c); 中的第一列。
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE a < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE b < 3;
EXPLAIN ANALYZE SELECT * FROM multi_column WHERE c < 3;
  • 使用单列 WHERE 子句的查询使用单列索引会更快。
  • 使用多列 WHERE 子句的查询使用多列索引会更快。