How to calculate an hourly 4-week moving average by day of week?

Asked: 2017-10-13 17:07:02

Tags: mysql

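Given a table with a datetime column, I want to get the hourly 4-week moving average of the number of entries, together with the day of week, for each result.

For example, between October 1 and October 13, I want to get back a result showing the 4-week rolling average of row counts grouped by hour and day of week.

So far I have the hourly totals for 4 weeks, but not the rolling totals: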

SELECT 
   DAYOFWEEK(start_time) as DOW, 
   date_format( start_time, '%H' ) as 'HOUR',
   count( * ) as 'count' 
FROM mytable 
WHERE start_time >='2017-08-01' and start_time <= '2017-08-29' 
GROUP BY DAYOFWEEK(start_time),date_format( start_time, '%H' )

1 Answer:

Answer 0: (score: 1)

This is a partially tested approach.

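It uses date parameters so that the WHERE clauses stay consistent with each other. Further parameters control the hourly bucket (I used 3 in my limited testing) and the number of weeks (I used 0 in testing because I had a very small set of rows).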
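As a small illustration of how the weeks parameter moves the end of the date window, the expression used for @end_time in the query can be evaluated on its own. This is only a sketch: the column aliases are made up for the example, and the start date is the one used in the answer.

```sql
-- Sketch: evaluate the @end_time expression for 0 weeks and for 4 weeks.
SET @start_time := '2017-08-02';

SELECT
    DATE_ADD(@start_time, INTERVAL ((7 * 0) + 1) DAY) AS end_with_0_weeks  -- 2017-08-03
  , DATE_ADD(@start_time, INTERVAL ((7 * 4) + 1) DAY) AS end_with_4_weeks; -- 2017-08-31
```

With 0 weeks the window is a single day, which is why the test results contain only one day of week.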

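The first subquery generates "ranges" which, when joined to the source rows, place those rows into each "rolling n-hour range". The ranges are defined with date_format producing YYYYMMDDHH values, which are strings, and the data is then forced into the same string format for the join. On a large table this may cause performance problems (yes, it is not sargable, and I don't like it either).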
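To make the range keys concrete, here is a minimal sketch that builds them for a single sample timestamp. The timestamp and the value 3 for @num_hrs are illustrative choices, not taken from the test data.

```sql
-- Sketch: range keys for one sample row, with a 3-hour look-back.
SET @num_hrs := 3;

SELECT
    DATE_FORMAT(DATE_ADD('2017-08-02 05:15:00', INTERVAL (@num_hrs * -1) HOUR), '%Y%m%d%H') AS range_start -- '2017080202'
  , DATE_FORMAT('2017-08-02 05:15:00', '%Y%m%d%H') AS range_end;                                           -- '2017080205'

-- The keys are fixed-width and zero-padded, so comparing them as strings
-- orders them chronologically. A row in hour 03 falls inside the range:
SELECT '2017080203' >= '2017080202' AND '2017080203' <= '2017080205' AS in_range; -- 1
```

Because both ends of the comparison are inclusive, each range spans @num_hrs + 1 hourly buckets (02, 03, 04 and 05 in this example).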

See this solution working here at SQL Fiddle

Schema Setup

CREATE TABLE `myTable` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `start_time` datetime,
  PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;

INSERT INTO myTable
    (`start_time`)
VALUES
    ('2017-08-01 00:01:00'),
    ('2017-08-01 00:15:00'),
    ('2017-08-01 00:29:00'),

    ## more here, 3 rows per hour over a narrow date range

    ('2017-08-03 08:01:00'),
    ('2017-08-03 08:15:00'),
    ('2017-08-03 08:29:00')
;

Query

set @start_time := '2017-08-02';
set @num_hrs    := 4; -- length of each rolling period in hours, e.g. 4 hours each
set @num_weeks  := 4; -- number of weeks covered by the date range

set @end_time   := date_add(@start_time, INTERVAL ((7 * @num_weeks)+1) DAY);

SELECT
       DOW
     , hour_of_day 
     , COUNT(*) period_count
     , (COUNT(*) * 1.0) / @num_hrs rolling_av
FROM (
    ## build a set of ranges in YYYYMMDDHH format differing by the wanted number of hours
    SELECT 
          id
       ,  DATE_FORMAT(date_add(start_time, INTERVAL (@num_hrs*-1) HOUR), '%Y%m%d%H') as range_start
       ,  DATE_FORMAT(start_time, '%Y%m%d%H') as range_end
    FROM myTable
    WHERE start_time >= @start_time and start_time < @end_time
    ) R
INNER JOIN (
    SELECT
           start_time
         , DAYOFWEEK(start_time) as DOW 
         , date_format(start_time, '%H' ) as hour_of_day
    FROM myTable
    WHERE start_time >= @start_time and start_time < @end_time
    ) T ON DATE_FORMAT(T.start_time, '%Y%m%d%H') >= R.range_start
                    AND DATE_FORMAT(T.start_time, '%Y%m%d%H') <= R.range_end
GROUP BY 
       DOW, hour_of_day
ORDER BY 
       DOW, hour_of_day
;

Results

| DOW | hour_of_day | period_count | rolling_av |
|-----|-------------|--------------|------------|
|   4 |          00 |           36 |         12 |
|   4 |          01 |           36 |         12 |
|   4 |          02 |           36 |         12 |
|   4 |          03 |           36 |         12 |
|   4 |          04 |           36 |         12 |
|   4 |          05 |           36 |         12 |
|   4 |          06 |           36 |         12 |
|   4 |          07 |           36 |         12 |
|   4 |          08 |           36 |         12 |
|   4 |          09 |           36 |         12 |
|   4 |          10 |           36 |         12 |
|   4 |          11 |           36 |         12 |
|   4 |          12 |           36 |         12 |
|   4 |          13 |           36 |         12 |
|   4 |          14 |           36 |         12 |
|   4 |          15 |           36 |         12 |
|   4 |          16 |           36 |         12 |
|   4 |          17 |           36 |         12 |
|   4 |          18 |           36 |         12 |
|   4 |          19 |           36 |         12 |
|   4 |          20 |           36 |         12 |
|   4 |          21 |           27 |          9 |
|   4 |          22 |           18 |          6 |
|   4 |          23 |            9 |          3 |
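
The answer notes that the string-based join is not sargable. One possible alternative, sketched here and not tested against the fiddle data, keeps the range boundaries as DATETIME values truncated to the hour, so that T.start_time is compared without being wrapped in a function. The alias range_end_excl is introduced for this sketch only, and whether the optimizer actually uses an index on start_time is an assumption that depends on the table and the MySQL version.

```sql
-- Sketch: same hourly ranges as the query above, expressed as DATETIME bounds.
SET @start_time := '2017-08-02';
SET @num_hrs    := 4;
SET @num_weeks  := 4;
SET @end_time   := DATE_ADD(@start_time, INTERVAL ((7 * @num_weeks) + 1) DAY);

SELECT
       DAYOFWEEK(T.start_time)         AS DOW
     , DATE_FORMAT(T.start_time, '%H') AS hour_of_day
     , COUNT(*)                        AS period_count
     , (COUNT(*) * 1.0) / @num_hrs     AS rolling_av
FROM (
    SELECT
          id
          -- start of the hour that lies @num_hrs hours before the row
        , CAST(DATE_FORMAT(DATE_SUB(start_time, INTERVAL @num_hrs HOUR), '%Y-%m-%d %H:00:00') AS DATETIME) AS range_start
          -- start of the hour after the row's own hour (exclusive upper bound)
        , CAST(DATE_FORMAT(DATE_ADD(start_time, INTERVAL 1 HOUR), '%Y-%m-%d %H:00:00') AS DATETIME) AS range_end_excl
    FROM myTable
    WHERE start_time >= @start_time AND start_time < @end_time
    ) R
INNER JOIN myTable T
        ON T.start_time >= R.range_start
       AND T.start_time <  R.range_end_excl
WHERE T.start_time >= @start_time AND T.start_time < @end_time
GROUP BY DOW, hour_of_day
ORDER BY DOW, hour_of_day
;
```

The half-open comparison against range_end_excl selects the same rows as the inclusive YYYYMMDDHH string comparison, because every timestamp inside the final hourly bucket is earlier than the start of the next hour.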