Optimizing an SQL query

Time: 2012-10-13 17:14:21

Tags: sql postgresql optimization query-optimization

Using PostgreSQL 8.4, with a table like the following:

create table log (
    id bigint primary key,
    first_sn bigint not null,
    last_sn bigint not null
);

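where first_sn and last_sn denote a range of serial numbers. The table holds more than 1 million rows. If I want to find all rows whose serial-number range contains an element of a list of serial numbers, what kind of index and query should I use?

For example, for the list [5348491, 1230505, 5882233], I am currently doing: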

select 5348491, *
from log
where 5348491 between first_sn and last_sn
union
select 1230505, *
from log
where 1230505 between first_sn and last_sn
union
select 5882233, *
from log
where 5882233 between first_sn and last_sn;

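But this is somewhat slow.

Edit: a query like this takes about 600 ms, and I would like to be able to search with lists of more than 10k serial numbers.

Since someone asked for it, here are the real table, the query, and the explain analyze output. (I hesitated because all the column names are in Spanish. Compared with the previous example, 'id' is 'movimiento_id' here, 'first_sn' is 'serial_inicial', and 'last_sn' is 'serial_final'. 'tipo_movimiento' is the type of event; in practice it is just a way to filter the result set.)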

CREATE TABLE movimiento
(
  movimiento_id bigserial NOT NULL,
  serial_inicial bigint NOT NULL,
  serial_final bigint NOT NULL,
  serial_chip bigint,
  numero_telefono text,
  fecha_movimiento timestamp without time zone DEFAULT now(),
  producto_id integer NOT NULL,
  usuario_id integer NOT NULL,
  factura_proveedor text,
  fecha_ingreso date,
  fecha_venta date,
  vendedor_id integer,
  cliente_id integer,
  tipo_movimiento text NOT NULL,
  costo numeric(12,4),
  precio numeric(10,2),
  descuento double precision,
  bodega_id integer NOT NULL DEFAULT 1,
  fecha_activo timestamp without time zone,
  factura text,
  envio text,
  documento text,
  bodega_id_origen integer,
  fecha date,
  traslado_id integer,
  detalle_factura_id bigint,
  es_venta boolean DEFAULT false,
  CONSTRAINT movimiento_pkey PRIMARY KEY (movimiento_id ),
  CONSTRAINT movimiento_bodega_id_fkey FOREIGN KEY (bodega_id)
      REFERENCES bodega (bodega_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_bodega_id_origen_fkey FOREIGN KEY (bodega_id_origen)
      REFERENCES bodega (bodega_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_cliente_id_fkey FOREIGN KEY (cliente_id)
      REFERENCES cliente (cliente_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_producto_id_fkey FOREIGN KEY (producto_id)
      REFERENCES producto (producto_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_usuario_id_fkey FOREIGN KEY (usuario_id)
      REFERENCES usuario (usuario_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_vendedor_id_fkey FOREIGN KEY (vendedor_id)
      REFERENCES vendedor (vendedor_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT movimiento_check CHECK (serial_final >= serial_inicial),
  CONSTRAINT movimiento_costo_check CHECK (costo >= 0::numeric),
  CONSTRAINT movimiento_descuento_check CHECK (descuento >= 0::double precision),
  CONSTRAINT movimiento_precio_check CHECK (precio >= 0::numeric),
  CONSTRAINT movimiento_tipo_movimiento_check CHECK (tipo_movimiento = ANY (ARRAY['Ingresado'::text, 'Vendido'::text, 'Entregado'::text, 'Regresado'::text, 'Eliminado'::text, 'Devuelto'::text, 'Inconforme'::text, 'Trasladado'::text, 'Consignado'::text, 'Devolucion Consignado'::text, 'Activado'::text, 'Devolucion Claro'::text, 'Asignado'::text, 'Fusion-Sale'::text, 'Fusion'::text, 'Separacion-Sale'::text, 'Separacion'::text]))
)
WITH (
  OIDS=TRUE
);

Here is the query:

explain analyze select 869461009867643, *
from movimiento
where (869461009867643 between serial_inicial and serial_final)
and tipo_movimiento = 'Ingresado'
union all
select 12121001477546, *
from movimiento
where 12121001477546 between serial_inicial and serial_final
and tipo_movimiento = 'Ingresado'
union all
select 354689040208615, *
from movimiento
where 354689040208615 between serial_inicial and serial_final
and tipo_movimiento = 'Ingresado';

The explain analyze output:

Append  (cost=7542.94..185580.33 rows=232322 width=165) (actual time=93.222..571.928 rows=4 loops=1)
  ->  Bitmap Heap Scan on movimiento  (cost=7542.94..61089.00 rows=90645 width=165) (actual time=93.220..206.248 rows=1 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        Filter: ((869461009867643::bigint >= serial_inicial) AND (869461009867643::bigint <= serial_final))
        ->  Bitmap Index Scan on tipo_movimiento_index  (cost=0.00..7520.28 rows=375432 width=0) (actual time=66.445..66.445 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
  ->  Bitmap Heap Scan on movimiento  (cost=7534.24..61080.30 rows=55815 width=165) (actual time=84.364..179.571 rows=2 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        Filter: ((12121001477546::bigint >= serial_inicial) AND (12121001477546::bigint <= serial_final))
        ->  Bitmap Index Scan on tipo_movimiento_index  (cost=0.00..7520.28 rows=375432 width=0) (actual time=60.282..60.282 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
  ->  Bitmap Heap Scan on movimiento  (cost=7541.75..61087.81 rows=85862 width=165) (actual time=173.876..186.082 rows=1 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        Filter: ((354689040208615::bigint >= serial_inicial) AND (354689040208615::bigint <= serial_final))
        ->  Bitmap Index Scan on tipo_movimiento_index  (cost=0.00..7520.28 rows=375432 width=0) (actual time=60.294..60.294 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
Total runtime: 572.138 ms

Here is the explain analyze output for a_horse_with_no_name's example:

Nested Loop  (cost=7614.18..98703.44 rows=125144 width=173) (actual time=629.373..2919.334 rows=4 loops=1)
  Join Filter: ((lista.serie >= movimiento.serial_inicial) AND (lista.serie <= movimiento.serial_final))
  CTE lista
    ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=8) (actual time=0.012..0.033 rows=3 loops=1)
  ->  Bitmap Heap Scan on movimiento  (cost=7614.14..59283.04 rows=375432 width=165) (actual time=110.909..460.563 rows=372409 loops=1)
        Recheck Cond: (tipo_movimiento = 'Ingresado'::text)
        ->  Bitmap Index Scan on tipo_movimiento_index  (cost=0.00..7520.28 rows=375432 width=0) (actual time=107.182..107.182 rows=372409 loops=1)
              Index Cond: (tipo_movimiento = 'Ingresado'::text)
  ->  CTE Scan on lista  (cost=0.00..0.06 rows=3 width=8) (actual time=0.001..0.003 rows=3 loops=372409)
Total runtime: 2919.514 ms

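So, combining the suggestions from a_horse_with_no_name and Craig Ringer, searching for three serial numbers runs in under 350 ms. Trying it with 10k serial numbers, it finished in a little over 3 s: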

create temporary table lista (
    serie bigint
) on commit drop;
create index lista_index on lista using btree (serie);
insert into lista (select distinct serial_inicial from movimiento limit 10000);
analyze lista;
select serie, movimiento.*
from movimiento join lista on serie between serial_inicial and serial_final
where tipo_movimiento = 'Ingresado';
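
The script above fills lista with a sample taken from movimiento itself. As a rough sketch of the same approach with an explicit list, the following loads the three serial numbers from the earlier query. It assumes the statements run inside one transaction block, because a temporary table declared with on commit drop disappears at the end of the transaction that created it (under autocommit, that is the create statement itself).

```sql
begin;

create temporary table lista (
    serie bigint
) on commit drop;

-- The three serial numbers used in the queries above.
insert into lista (serie) values
    (869461009867643),
    (12121001477546),
    (354689040208615);

-- Same index and statistics refresh as in the script above.
create index lista_index on lista using btree (serie);
analyze lista;

select serie, movimiento.*
from movimiento join lista on serie between serial_inicial and serial_final
where tipo_movimiento = 'Ingresado';

-- lista is dropped here because of "on commit drop".
commit;
```

For a list of around 10k values, a bulk load such as copy lista (serie) from stdin may be quicker than one long values list, though that has not been measured here.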

2 Answers:

Answer 0 (score: 4)

If you don't really need to know which of the supplied values was found, you can use a simple OR:

select *
from log
where (5348491 between first_sn and last_sn)
   or (1230505 between first_sn and last_sn)
   or (5882233 between first_sn and last_sn);

Another option is:

with sn_list (sn) as (
   values (5348491), (1230505), (5882233)
)
select sn_list.sn as searched_value,
       log.*
from log
  join sn_list on sn_list.sn between log.first_sn and log.last_sn;

That said, I doubt that either of these solutions will actually scale to 10k values to compare against.

(I'm assuming you have an index on both sn columns.)
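
For reference, the indexes assumed here could be created along these lines. This is only a sketch: the index names are hypothetical, and the question does not say which indexes already exist on log.

```sql
-- Hypothetical index names; one plain B-tree index per range boundary.
create index log_first_sn_idx on log (first_sn);
create index log_last_sn_idx on log (last_sn);

-- Refresh planner statistics after creating the indexes.
analyze log;
```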

Answer 1 (score: 2)

If you want to look up ranges, your best option is a B-tree index.

As the PostgreSQL documentation says:

"B-trees can handle equality and range queries on data that can be sorted into some ordering."
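
A minimal sketch of such an index on the real table from the question follows. The index name and the column order are assumptions, not something stated in this answer. Whether the planner actually picks the index for a between condition spanning two columns depends on the data distribution, so it is worth checking the plan with explain analyze afterwards.

```sql
-- Hypothetical index name; the column order is an assumption.
create index movimiento_serial_idx
    on movimiento using btree (serial_inicial, serial_final);

-- Refresh planner statistics before comparing plans.
analyze movimiento;

-- Re-check the plan for one of the lookups from the question.
explain analyze
select 869461009867643, *
from movimiento
where 869461009867643 between serial_inicial and serial_final
  and tipo_movimiento = 'Ingresado';
```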