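Oracle partition with a monthly interval

Time: 2020-04-17 19:31:43

Tags: sql oracle

I am running a query with a partition window of one calendar month. The data I am working with is collected at regular intervals, for example every fifteen minutes.

The code is as follows: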

SELECT AVG(data_value) OVER (
   PARTITION BY id
   ORDER BY time_stamp
   RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS monthly_avg
FROM your_data;  -- placeholder table name

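This query works well and collects the monthly averages. The only problem is that the start and end of the interval are exactly one month apart, and both window boundaries are inclusive; e.g. the window starts at 1 November 2019 00:00 and ends at 1 December 2019 00:00.

I need the starting boundary to be excluded, because it is not considered part of the data set; e.g. the window should start at 1 November 2019 00:15 (the next row), with the end still at 1 December 2019 00:00.

I would like to know whether Oracle can do this.

I imagine the code would look something like this: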

SELECT AVG(data_value) OVER (
   PARTITION BY id
   ORDER BY time_stamp
   RANGE BETWEEN INTERVAL '1' MONTH (+ 1 ROW) PRECEDING AND CURRENT ROW)

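I have tried several variations, but Oracle does not like any of them. Any help would be greatly appreciated.

4 answers:

Answer 0 (score: 0)

Calculate the number of days in the previous month with: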

EXTRACT( DAY FROM TRUNC( time_stamp, 'MM' ) - 1 )
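As a quick sanity check, the expression can be run against a couple of literal dates. This is only a minimal sketch: the inline view and the column name days_in_prev_month are made up for illustration and are not part of the original answer.

```sql
-- TRUNC(d, 'MM') is the first day of d's month; subtracting 1 day gives the
-- last day of the previous month, whose day number is that month's length.
SELECT d,
       EXTRACT(DAY FROM TRUNC(d, 'MM') - 1) AS days_in_prev_month
FROM   (
         SELECT DATE '2020-02-15' AS d FROM DUAL UNION ALL
         SELECT DATE '2020-03-15'      FROM DUAL
       );
```

For 15 February 2020 this should return 31 (January), and for 15 March 2020 it should return 29 (February in a leap year).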

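Then use the NUMTODSINTERVAL function to shorten the interval by a day, so that the extra day being counted is excluded. The query below subtracts 2 rather than 1 inside EXTRACT, which yields the length of the previous month minus one day: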

SELECT id,
       data_value,
       time_stamp,
       AVG(data_value)
         OVER (
           PARTITION BY id
           ORDER BY time_stamp
           RANGE BETWEEN NUMTODSINTERVAL(
                           EXTRACT( DAY FROM TRUNC( time_stamp, 'MM' ) - 2 ),
                           'DAY'
                         ) PRECEDING
                 AND     CURRENT ROW
       ) AS avg_value_month_minus_1_day
FROM   table_name;

So, if your data is:

CREATE TABLE table_name ( id, data_value, time_stamp ) AS
SELECT 1,
       LEVEL,
       DATE '2020-01-01' + LEVEL - 1
FROM   DUAL
CONNECT BY LEVEL <= 50;

Then comparing the above query with your original one:

SELECT id,
       data_value,
       time_stamp,
       AVG(data_value)
         OVER (
           PARTITION BY id
           ORDER BY time_stamp
           RANGE BETWEEN NUMTODSINTERVAL(
                           EXTRACT( DAY FROM TRUNC( time_stamp, 'MM' ) - 2 ),
                           'DAY'
                         ) PRECEDING
                 AND     CURRENT ROW
       ) AS avg_value_month_minus_1_day,
       AVG(data_value)
         OVER (
           PARTITION BY id
           ORDER BY time_stamp
           RANGE BETWEEN INTERVAL '1' MONTH PRECEDING
                 AND     CURRENT ROW
       ) AS avg_value_month
FROM   table_name;

Outputs (for the February rows, when there is a full previous month of data):

ID | DATA_VALUE | TIME_STAMP          | AVG_VALUE_MONTH_MINUS_1_DAY | AVG_VALUE_MONTH
-: | ---------: | :------------------ | --------------------------: | --------------:
 1 |         32 | 2020-02-01 00:00:00 |                          17 |            16.5
 1 |         33 | 2020-02-02 00:00:00 |                          18 |            17.5
 1 |         34 | 2020-02-03 00:00:00 |                          19 |            18.5
 1 |         35 | 2020-02-04 00:00:00 |                          20 |            19.5
 1 |         36 | 2020-02-05 00:00:00 |                          21 |            20.5
 1 |         37 | 2020-02-06 00:00:00 |                          22 |            21.5
 1 |         38 | 2020-02-07 00:00:00 |                          23 |            22.5
 1 |         39 | 2020-02-08 00:00:00 |                          24 |            23.5
 1 |         40 | 2020-02-09 00:00:00 |                          25 |            24.5
 1 |         41 | 2020-02-10 00:00:00 |                          26 |            25.5
 1 |         42 | 2020-02-11 00:00:00 |                          27 |            26.5
 1 |         43 | 2020-02-12 00:00:00 |                          28 |            27.5
 1 |         44 | 2020-02-13 00:00:00 |                          29 |            28.5
 1 |         45 | 2020-02-14 00:00:00 |                          30 |            29.5
 1 |         46 | 2020-02-15 00:00:00 |                          31 |            30.5
 1 |         47 | 2020-02-16 00:00:00 |                          32 |            31.5
 1 |         48 | 2020-02-17 00:00:00 |                          33 |            32.5
 1 |         49 | 2020-02-18 00:00:00 |                          34 |            33.5
 1 |         50 | 2020-02-19 00:00:00 |                          35 |            34.5

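The first output row can be cross-checked by spelling both windows out as plain date predicates. This is a sketch that assumes the sample table_name created above; the column aliases are illustrative.

```sql
-- For the row at 2020-02-01, the one-month window covers 2020-01-01..2020-02-01
-- (values 1..32), while the shortened window starts a day later (values 2..32).
SELECT AVG(CASE WHEN time_stamp >= DATE '2020-01-02' THEN data_value END)
         AS avg_value_month_minus_1_day,
       AVG(data_value)
         AS avg_value_month
FROM   table_name
WHERE  time_stamp BETWEEN DATE '2020-01-01' AND DATE '2020-02-01';
```

The two averages should come out as 17 and 16.5, matching the 2020-02-01 row in the table.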

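Answer 1 (score: 0)

Alas, Oracle does not support intervals that combine months with smaller units.

One approach is to subtract the boundary row back out: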

select (sum(data_value) over (partition by id
                              order by time_stamp
                              range between interval '1' month preceding and current row
                             ) -
        -- sum over an empty frame is NULL, so default it to 0 when no row sits exactly one month back
        nvl(sum(data_value) over (partition by id
                                  order by time_stamp
                                  range between interval '1' month preceding and interval '1' month preceding
                                 ), 0)
       ) /
       nullif(count(data_value) over (partition by id
                                      order by time_stamp
                                      range between interval '1' month preceding and current row
                                     ) -
              count(data_value) over (partition by id
                                      order by time_stamp
                                      range between interval '1' month preceding and interval '1' month preceding
                                     ), 0
             ) as avg_excl_start
from   your_data;  -- placeholder table name

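Admittedly, this is cumbersome for an average, but it may be perfectly fine for sum() or count().

Answer 2 (score: 0)

To shift the time range you are looking at, you can shift the value you order by by the appropriate interval: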

SELECT AVG(data_value)
       OVER (PARTITION BY id
                 ORDER BY time_stamp
                 RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW
       ) Current_Calc
     , AVG(data_value)
       OVER (PARTITION BY id
                 ORDER BY time_stamp - interval '15' minute
                 RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW
       ) Shift_Back
     , AVG(data_value)
       OVER (PARTITION BY id
                 ORDER BY time_stamp + interval '15' minute
                 RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW
       ) shift_forward
  FROM Your_Data

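Based on the description of the problem, I believe you want to shift it back by 15 minutes, but I may be misreading the problem statement, and there is no sample data with expected results to test against.

These sliding windows always include one month of data relative to the current time_stamp, which means each time_stamp gets roughly 29 to 32 days' worth of data, and the same data is counted in the averages of both the current and the previous month.

On the other hand, if what you are interested in is the average for discrete months, then you should partition by month rather than create a sliding window. If you want a running monthly average you can add an ordering, but you do not need a windowing clause: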

SELECT TRUNC(time_stamp, 'MM') MON
     , AVG(data_value)
       OVER (PARTITION BY id, TRUNC(time_stamp, 'MM')) MON_AVG
     , AVG(data_value)
       OVER (PARTITION BY id, TRUNC(time_stamp, 'MM')
             ORDER BY time_stamp) RUN_MON_AVG
     , TRUNC(time_stamp - INTERVAL '15' MINUTE, 'MM') MON_2
     , AVG(data_value)
       OVER (PARTITION BY id, TRUNC(time_stamp - INTERVAL '15' MINUTE, 'MM')
       ) MON_AVG_2
     , AVG(data_value)
       OVER (PARTITION BY id, TRUNC(time_stamp - INTERVAL '15' MINUTE, 'MM')
             ORDER BY time_stamp) RUN_MON_AVG_2
  FROM Your_Data
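To see why the 15-minute shift matters for the month buckets, the two TRUNC expressions can be compared on the boundary reading and the one after it. This is a sketch using hypothetical literal timestamps rather than the real table.

```sql
-- Hypothetical literals: the reading exactly on the month boundary and the next one.
SELECT ts,
       TRUNC(ts, 'MM')                        AS mon,    -- calendar month of the row
       TRUNC(ts - INTERVAL '15' MINUTE, 'MM') AS mon_2   -- month after shifting back 15 minutes
FROM   (
         SELECT TO_DATE('2019-12-01 00:00', 'YYYY-MM-DD HH24:MI') AS ts FROM DUAL UNION ALL
         SELECT TO_DATE('2019-12-01 00:15', 'YYYY-MM-DD HH24:MI')       FROM DUAL
       );
```

The 00:00 reading should fall in December under mon but in November under mon_2, while the 00:15 reading falls in December under both. In other words, the shifted partition treats midnight on the first of the month as the closing reading of the previous month.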

Answer 3 (score: 0)

Thanks for the feedback! I was able to put together the answer I needed from the answers above. Here is the code I used:

SELECT AVG(data_value) OVER (
   PARTITION BY id
   ORDER BY time_stamp
   RANGE BETWEEN ( NUMTODSINTERVAL(EXTRACT(DAY FROM TRUNC(time_stamp, 'MM') - 1), 'DAY')
                   - NUMTODSINTERVAL(1, 'SECOND') ) PRECEDING
         AND CURRENT ROW) AS monthly_avg
FROM your_data;  -- placeholder table name

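Because my interval is exactly one month and I want to drop the first entry, I first convert the length of the previous month into a day-to-second interval, as suggested above. I then subtract one second from the lower bound of the interval. The effect is to make the lower bound of the window an "open" bound while the upper bound stays "closed".

As a side note, I used one second because the periodicity of my data set is not consistent, but its minimum is three minutes, so any value smaller than that will work.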
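One way to convince yourself that the lower bound really is open is to count the rows in each window over generated data. The following is a sketch only: sample_data and its contents are invented for the test (one reading every 15 minutes from 1 November 2019 00:00 through 1 December 2019 00:00), and FETCH FIRST assumes Oracle 12c or later.

```sql
-- Hypothetical sample: 30 days * 96 readings per day + 1 closing reading = 2881 rows.
WITH sample_data (id, data_value, time_stamp) AS (
  SELECT 1,
         LEVEL,
         DATE '2019-11-01' + NUMTODSINTERVAL((LEVEL - 1) * 15, 'MINUTE')
  FROM   DUAL
  CONNECT BY LEVEL <= 2881
)
SELECT time_stamp,
       -- original window: both boundaries inclusive
       COUNT(*) OVER (
         PARTITION BY id
         ORDER BY time_stamp
         RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW
       ) AS rows_closed,
       -- window from this answer: previous month's length minus one second
       COUNT(*) OVER (
         PARTITION BY id
         ORDER BY time_stamp
         RANGE BETWEEN ( NUMTODSINTERVAL(EXTRACT(DAY FROM TRUNC(time_stamp, 'MM') - 1), 'DAY')
                         - NUMTODSINTERVAL(1, 'SECOND') ) PRECEDING
               AND CURRENT ROW
       ) AS rows_half_open
FROM   sample_data
ORDER  BY time_stamp DESC
FETCH FIRST 1 ROW ONLY;  -- keep only the 2019-12-01 00:00 row
```

For the 1 December 2019 00:00 row, rows_closed should be 2881 (the 1 November 00:00 reading is included) and rows_half_open should be 2880 (it is excluded).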