How to optimize a query that requires timestamp normalization

Date: 2017-06-15 11:03:08

Tags: postgresql plpgsql

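I have the following data source, which holds several physical values (one per column) coming from several devices at different times: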

+-----------+------------+---------+-------+
| id_device | timestamp  |  Vln1   | kWl1  |
+-----------+------------+---------+-------+
|       123 | 1495696500 |         |       |
|       122 | 1495696800 |         |       |
|       122 | 1495697100 | 230     | 5.748 |
|       122 | 1495697100 | 230     | 5.185 |
|       124 | 1495700100 | 226.119 | 0.294 |
|       122 | 1495713900 | 230     |       |
|       122 | 1495716000 |         |       |
|       122 | 1495716300 | 230     |       |
|       122 | 1495716300 |         |       |
|       122 | 1495716300 |         |       |
|       122 | 1495716600 | 230     | 4.606 |
|       122 | 1495716600 |         |       |
|       124 | 1495739100 |         |       |
|       123 | 1495739400 |         |       |
+-----------+------------+---------+-------+

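The timestamp column is (unfortunately) a bigint, and each device sends data at different times and with a different frequency: some devices push every 5 minutes, others every 10 minutes, others every 15 minutes. Physical values can be NULL.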
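For reference, the bigint values are Unix epoch seconds. A minimal Python sketch of reading them as UTC, mirroring to_timestamp(...) at time zone 'UTC' in the functions below (the helper names are hypothetical, and Python is used only to illustrate the conversion):

```python
from datetime import datetime, timezone


def epoch_to_utc(ts: int) -> datetime:
    """Interpret a bigint Unix timestamp (seconds) as a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def truncate_to_minute(ts: int) -> int:
    """Drop the seconds, like date_trunc('minute', ...) followed by extract(epoch ...)."""
    return ts - ts % 60


# Sample timestamps taken from the table above.
for ts in (1495696500, 1495697100, 1495713900):
    print(ts, epoch_to_utc(ts).isoformat())
# 1495696500 2017-05-25T07:15:00+00:00
# 1495697100 2017-05-25T07:25:00+00:00
# 1495713900 2017-05-25T12:05:00+00:00
```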

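The front-end application needs to draw a chart (let's say a line chart) for a specific time span, with a time scale expressed in minutes. The time scale is chosen by the user. A chart can be made of several physical values from several devices, and each line is an independent request to the back end.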

Let's consider the following case:

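  1. The selected time scale is 10 minutes.
  2. Two lines are selected for plotting, showing two different physical values (columns) from two different devices:
    1. one device pushes every 5 minutes
    2. the other every 10 minutes
  3. What the front-end application expects is a normalized result: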

    <timestamp>, <value>
    

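    where

    1. timestamp is the rounded time (00:00, 00:10, 00:20, and so on)
    2. if there is more than one value in a "time box" (e.g. a device that pushes every 5 minutes will have 2 values between 00:00 and 00:10), a single value is returned: an aggregate (AVG) of those values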
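The normalization described in this list can be illustrated with a small Python sketch (Python is used only to model the logic outside the database; bucket_avg is a hypothetical helper, not part of the original code). It assumes the boxes are aligned to multiples of the time scale counted from the Unix epoch:

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional


def bucket_avg(samples: Iterable[tuple[int, Optional[float]]],
               scale_min: int) -> dict[int, float]:
    """Average (epoch_seconds, value) samples per scale_min-minute time box.

    Boxes start at multiples of scale_min minutes since the Unix epoch, which
    in UTC coincides with 00:00, 00:10, ... when scale_min divides 60.
    None values are skipped, as AVG skips NULLs; a box whose values are all
    None is omitted.
    """
    step = scale_min * 60
    boxes: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        if value is not None:
            boxes[ts - ts % step].append(value)
    return {start: mean(vals) for start, vals in sorted(boxes.items())}


# A device pushing every 5 minutes, plotted on a 10-minute scale
# (07:20, 07:25 and 07:30 UTC on 2017-05-25):
print(bucket_avg([(1495696800, 1.0), (1495697100, 3.0), (1495697400, 5.0)], 10))
# {1495696800: 2.0, 1495697400: 5.0}
```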
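To achieve this, I created some plpgsql functions, but I'm not sure that what I did is the best-performing approach.

Basically, what I do is:

  1. fetch the data for a specific device and physical measurement within the selected time range
  2. normalize the returned data: each timestamp is rounded down to the selected time scale (e.g. 10:12:23 -> 10:10:00), so that each tuple represents the value in a "time bucket"
  3. generate the range of time buckets according to the time scale selected by the user
  4. JOIN the normalized data with the ranges; if there is more than one value in the same range, aggregate them

Here are my functions: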

        create  or replace function app_iso50k1.blkGetTimeSelParams(
              t_end bigint,
              t_granularity integer,
              t_span bigint,
          OUT delta_time_bucket interval,
          OUT b_timebox timestamp,
          OUT e_timebox timestamp)
        as
        $$
        DECLARE
          delta_time interval;
        BEGIN
          /* normalization: truncate to the minute (drop the seconds) */
          t_end = extract('epoch' from date_trunc('minute', (to_timestamp(t_end) at time zone 'UTC')::timestamp));
        
          delta_time =  app_iso50k1.blkGetDeltaTimeBucket(t_end, t_granularity);
          e_timebox = date_trunc('minute', (to_timestamp(t_end - extract('epoch' from delta_time)) at time zone 'UTC'))::timestamp;
          b_timebox = (to_timestamp(extract('epoch' from e_timebox) - t_span) at time zone 'UTC')::timestamp;
        
          delta_time_bucket = delta_time;
        END
        $$ immutable language 'plpgsql' security invoker;
        
        
        create or replace function app_iso50k1.getPhyMetData(
          tablename character varying,
          t_span bigint,
          t_end bigint,
          t_granularity integer,
          idinstrum integer,
          id_device integer,
          varname character varying,
          op character varying,
          page_size int,
          page int)
          RETURNS TABLE(times bigint , val double precision) as
        $$
        DECLARE
          series REFCURSOR;
          serie RECORD;
          first_notnull bool = false;
          prev_val double precision;
          time_params record;
          q_offset int;
        BEGIN
          time_params = app_iso50k1.blkGetTimeSelParams(t_end, t_granularity, t_span);
          if(page = 1) then
            q_offset = 0;
          else
            q_offset = page_size * (page -1);
          end if;
        
          if not public.blkIftableexists('resgetphymetdata')
          THEN
            create temporary table resgetphymetdata (times bigint, val double precision);
          ELSE
            truncate table resgetphymetdata;
          END IF;
        
          execute format($ff$
          insert into resgetphymetdata (
            /* generate every possible range between these dates */
            with ranges as (
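                /* step = $5 minutes; $5 cannot sit inside a quoted literal, where it would not be substituted */
                select generate_series($1, $2, $5 * interval '1 minute') as range_start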
            ),
              /* normalize each row to the <t_granularity>-minute interval it belongs to */
            rounded_hst as (
              select
                date_trunc ('minutes', (to_timestamp("timestamp") at time zone 'UTC')::timestamp)::timestamp -
                mod (extract ('minutes' from ((to_timestamp("timestamp") at time zone 'UTC')::timestamp))::int, $5) * interval '1 minute' as round_time,
                *
              from public.%I
              where
                idinstrum = $3 and
                id_device = $4 and
                "timestamp" <= $8
            )
            select
              extract('epoch' from r.range_start)::bigint AS times,
              %s (hd.%I) AS val
            from
              ranges r
              left join rounded_hst hd on r.range_start = hd.round_time
            group by
              r.range_start
            order by
              r.range_start
            LIMIT $6 OFFSET $7
          );
          $ff$, tablename, op, varname) using time_params.b_timebox, time_params.e_timebox, idinstrum, id_device, t_granularity, page_size, q_offset, t_end;
        
          /* data cleansing: NULL vals after the first not-null value are filled with the previous value */
          open series no scroll for select * from resgetphymetdata;
          loop
            fetch series into serie;
            exit when not found;
        
            if NOT first_notnull then
              if serie.val is not null then
                first_notnull = true;
                prev_val = serie.val;
              end if;
            else
              if serie.val is NULL then
                update resgetphymetdata
                set val = prev_val
                where current of series;
              else
                prev_val = serie.val;
              end if;
            end if;
          end loop;
          close series;
        
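          /* explicit ORDER BY: a plain SELECT on a table has no guaranteed row order;
             qualified names avoid a clash with the OUT parameters times/val */
          return query select r.times, r.val from resgetphymetdata r order by r.times;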
        END;
        $$ volatile language 'plpgsql' security invoker;
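To make the four steps listed above concrete, here is a Python model of what the query inside getPhyMetData computes with op = AVG (rounding, range generation, LEFT JOIN with aggregation, LIMIT/OFFSET paging). It is an illustrative sketch only: it assumes the rows are already filtered by instrument, device and the upper time bound, and round_time / phy_met_data are hypothetical names, not part of the original code.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Iterable, Optional


def round_time(ts: int, granularity: int) -> datetime:
    """Like the round_time expression: truncate to the minute (UTC), then
    subtract (minute mod granularity) minutes. As in the SQL, the result is
    only aligned across hours when granularity divides 60."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc).replace(second=0, microsecond=0)
    return t - timedelta(minutes=t.minute % granularity)


def phy_met_data(rows: Iterable[tuple[int, Optional[float]]],
                 begin: datetime, end: datetime, granularity: int,
                 page_size: int, page: int) -> list[tuple[int, Optional[float]]]:
    """Model of the INSERT ... SELECT in getPhyMetData with op = AVG,
    before the hole filling."""
    # rounded_hst: collect values per rounded time; NULLs are skipped like AVG does
    buckets: dict[datetime, list[float]] = {}
    for ts, value in rows:
        if value is not None:
            buckets.setdefault(round_time(ts, granularity), []).append(value)

    # ranges + LEFT JOIN + GROUP BY: every step from begin to end (inclusive,
    # like generate_series) yields one row; an empty range yields None (NULL)
    out: list[tuple[int, Optional[float]]] = []
    t, step = begin, timedelta(minutes=granularity)
    while t <= end:
        vals = buckets.get(t)
        out.append((int(t.timestamp()), mean(vals) if vals else None))
        t += step

    # LIMIT page_size OFFSET page_size * (page - 1)
    offset = page_size * (page - 1)
    return out[offset:offset + page_size]


utc = timezone.utc
rows = [(1495696800, 1.0), (1495697100, 3.0), (1495697400, None), (1495697700, 4.0)]
print(phy_met_data(rows, datetime(2017, 5, 25, 7, 20, tzinfo=utc),
                   datetime(2017, 5, 25, 7, 50, tzinfo=utc), 10, 10, 1))
# [(1495696800, 2.0), (1495697400, 4.0), (1495698000, None), (1495698600, None)]
```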
        
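The data-cleansing cursor loop at the end of getPhyMetData is a last-observation-carried-forward fill. A minimal Python model of the same rule (fill_holes is a hypothetical name; it works on the val column of one page, in times order):

```python
from typing import Optional


def fill_holes(vals: list[Optional[float]]) -> list[Optional[float]]:
    """Carry the last non-None value forward, like the cursor loop:
    None values before the first non-None value are kept as None,
    every later None (including trailing ones) takes the previous value."""
    out: list[Optional[float]] = []
    prev: Optional[float] = None
    for v in vals:
        if v is None:
            out.append(prev)  # still None until a first non-None value is seen
        else:
            prev = v
            out.append(v)
    return out


print(fill_holes([None, 1.0, None, None, 2.0, None]))
# [None, 1.0, 1.0, 1.0, 2.0, 2.0]
```

Because the function fills only the rows that survived LIMIT/OFFSET, the carry-forward restarts on every page: a NULL at the top of page 2 is not filled from the last value of page 1.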

Do you see alternatives to what I coded? Is there room for improvement? Thanks!
