Querying a large amount of data

Time: 2016-04-08 02:03:12

Tags: postgresql

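I have a partitioned table that currently has a little over 200 partitions, each holding about 12 million rows. SELECT queries are very slow, even though indexes have been created on the relevant columns. Looking at the execution plan, I see that most of the data is read from disk. How should I tune and optimize this?
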
    gjdd4=# \d t_bus_position_20160306_20160308
                   Table "public.t_bus_position_20160306_20160308"
             Column         |              Type              |     Modifiers
    ------------------------+--------------------------------+--------------------
     pos_uuid               | character varying(20)          | collate zh_CN.utf8
     pos_line_uuid          | character varying(20)          |
     pos_line_type          | character varying(20)          |
     pos_bus_uuid           | character varying(20)          | collate zh_CN.utf8
     pos_dev_uuid           | character varying(20)          |
     pos_sta_uuid           | character varying(20)          |
     pos_drv_ic_card        | character varying(30)          |
     pos_lng                | character varying(30)          |
     pos_lat                | character varying(30)          |
     pos_bus_speed          | character varying(20)          |
     pos_real_time_status   | character varying(20)          |
     pos_gather_time        | timestamp(6) without time zone |
     pos_storage_time       | timestamp(6) without time zone |
     pos_is_offset          | boolean                        |
     pos_is_overspeed       | character varying(1)           |
     pos_cursor_over_ground | character varying(20)          |
     pos_all_alarms         | character varying(30)          |
     pos_is_in_station      | character varying(1)           |
     pos_closed_alarms      | character varying(30)          |
     pos_dis_to_pre_i       | integer                        |
     pos_odometer_i         | bigint                         |
     pos_relative_location  | real                           |
     pos_dis_to_pre         | real                           |
     pos_odometer           | double precision               |
     pos_gather_time1       | bigint                         |
    Indexes:
        "idx_multi" btree (pos_bus_uuid, pos_gather_time DESC)
        "idx_trgm" btree (replace(to_char(pos_gather_time, 'YYYYMMDDHH24'::text), ' '::text, ''::text))
        "idx_trgm1" btree (to_char(pos_gather_time, 'YYYYMMDD'::text))
        "tp_20160306_20160308_pos_dev_uuid_idx" btree (pos_dev_uuid)
    Check constraints:
        "t_bus_position_20160306_20160308_pos_gather_time_check" CHECK (pos_gather_time >= '2016-03-06 00:00:00'::timestamp without time zone AND pos_gather_time < '2016-03-09 00:00:00'::timestamp without time zone)

The plan looks like this.

    gjdd4=# explain(costs,buffers,timing,analyze) select pos_bus_uuid from    test2 group by pos_bus_uuid;
     HashAggregate  (cost=802989.75..802993.00 rows=325 width=21) (actual time=42721.528..42721.679 rows=354 loops=1)
       Group Key: pos_bus_uuid
       Buffers: shared hit=3560 read=567491
       I/O Timings: read=20231.511
       ->  Seq Scan on test2  (cost=0.00..756602.00 rows=18555100 width=21) (actual time=0.067..27749.533 rows=18555100 loops=1)
             Buffers: shared hit=3560 read=567491
             I/O Timings: read=20231.511
     Planning time: 0.116 ms
     Execution time: 42721.839 ms
    (9 rows)

    Time: 42722.629 ms

2 answers:

Answer 0 (score: 1)

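Your query does not perform any real aggregation; it only does a DISTINCT. If that is what you really want (all distinct pos_bus_uuid values), then you can use a technique called loose index scan:

Here is a query tailored to your table, assuming pos_bus_uuid has a NOT NULL constraint: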

    WITH RECURSIVE t AS (
        (SELECT pos_bus_uuid FROM test2 ORDER BY pos_bus_uuid LIMIT 1)  -- parentheses required
      UNION ALL
         SELECT (SELECT pos_bus_uuid FROM test2
                  WHERE pos_bus_uuid > t.pos_bus_uuid ORDER BY pos_bus_uuid LIMIT 1)
           FROM t
          WHERE t.pos_bus_uuid IS NOT NULL
       )
    SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;

Your index on pos_bus_uuid should be sufficient for this query.
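
The question shows the indexes of one partition but not those of test2, so whether test2 already has a usable index is an assumption. A minimal sketch of how one might check: create a B-tree index whose leading column is pos_bus_uuid (the index name below is hypothetical), then rerun the recursive query under EXPLAIN with the same kind of options used in the question.

```sql
-- Hypothetical index name. Skip this step if test2 already has an index
-- whose leading column is pos_bus_uuid (for example one shaped like idx_multi).
CREATE INDEX idx_test2_pos_bus_uuid ON test2 (pos_bus_uuid);

-- Refresh planner statistics after creating the index.
ANALYZE test2;

-- Run the loose index scan and report timing and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE t AS (
    (SELECT pos_bus_uuid FROM test2 ORDER BY pos_bus_uuid LIMIT 1)
  UNION ALL
    SELECT (SELECT pos_bus_uuid FROM test2
             WHERE pos_bus_uuid > t.pos_bus_uuid
             ORDER BY pos_bus_uuid LIMIT 1)
      FROM t
     WHERE t.pos_bus_uuid IS NOT NULL
)
SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;
```

If the technique applies, the plan should show index scans on test2 in place of the Seq Scan, and the "Buffers: ... read=" figure should be far below the 567491 pages reported in the question, because each distinct value costs roughly one index probe instead of a pass over all 18.5 million rows.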

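Answer 1 (score: 1)

Markus Winand's answer is correct: you need to copy the complete SQL. The syntax error comes from leaving out the last line, 'SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;'

I would have added this as a comment, but my reputation is too low to comment.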
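
To illustrate why the last line matters: in PostgreSQL a WITH clause is not a statement on its own and must be followed by the statement that uses it. A minimal, self-contained sketch (the names n and i are illustrative only and do not come from the question):

```sql
-- Complete statement: the WITH clause is followed by the SELECT that reads it.
-- Returns the rows 1, 2, 3.
WITH RECURSIVE n AS (
    SELECT 1 AS i
  UNION ALL
    SELECT i + 1 FROM n WHERE i < 3
)
SELECT i FROM n;

-- Ending the input right after the closing parenthesis of the WITH clause,
-- without the final SELECT, is reported by PostgreSQL as a syntax error.
```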