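Total number of records per week

Time: 2014-11-09 22:28:28

Tags: sql postgresql date generate-series

I have a Postgres 9.1 database. I am trying to generate the number of records per week (for a given date range) and compare it to the previous year.

I have the following code to generate the series: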

select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series

However, I don't know how to join the counted records to the generated dates.

So, using the following records as an example:

Pt_ID      exam_date
======     =========
1          2012-01-02
2          2012-01-02
3          2012-01-08
4          2012-01-08
1          2013-01-02
2          2013-01-02
3          2013-01-03
4          2013-01-04
1          2013-01-08
2          2013-01-10
3          2013-01-15
4          2013-01-24

I would like the records returned as:

  series        thisyr      lastyr
===========     =====       =====
2013-01-01        4           2
2013-01-08        3           2
2013-01-15        1           0
2013-01-22        1           0
2013-01-29        0           0

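I am not sure how to reference the date range in the subquery. Thanks for your help.

2 Answers:

Answer 0: (score: 3)

The simple approach is to solve this with a CROSS JOIN, as demonstrated by @jpw. However, there are some hidden problems:

  1. The performance of an unconditional CROSS JOIN deteriorates quickly as the number of rows grows. The total number of rows is multiplied by the number of weeks you are testing for before this huge derived table can be processed in the aggregation. Indexes can't help.

  2. Starting weeks on January 1st leads to inconsistencies. ISO weeks might be an alternative. See below.

All of the following queries make heavy use of an index on exam_date. Be sure to have one.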
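A minimal sketch of such an index, assuming the table is named tbl as in the queries of this answer. The index name tbl_exam_date_idx is made up for illustration; any name works.

```sql
-- Assumption: table "tbl" with a date column "exam_date", as used in this answer.
-- A plain B-tree index supports the range predicates
-- (exam_date >= ... AND exam_date < ...) used in the join conditions.
CREATE INDEX tbl_exam_date_idx ON tbl (exam_date);
```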

Only join to relevant rows

Should be much faster:

    SELECT d.day, d.thisyr
         , count(t.exam_date) AS lastyr
    FROM  (
       SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0  -- for 2nd join
            , count(t.exam_date) AS thisyr
       FROM   generate_series('2013-01-01'::date
                            , '2013-01-31'::date  -- last week overlaps with Feb.
                            , '7 days'::interval) d(day)  -- returns timestamp
       LEFT   JOIN tbl t ON t.exam_date >= d.day::date
                        AND t.exam_date <  d.day::date + 7
       GROUP  BY d.day
       ) d
    LEFT   JOIN tbl t ON t.exam_date >= d.day0      -- repeat with last year
                     AND t.exam_date <  d.day0 + 7
    GROUP  BY d.day, d.thisyr
    ORDER  BY d.day;
    

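This uses weeks starting from January 1st, as in your original. As commented, this produces a couple of inconsistencies: weeks start on a different day each year, and since we cut off at the end of the year, the last week of the year consists of just 1 or 2 days (the latter in leap years).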
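To make these inconsistencies concrete, the following sketch lists the weekday of January 1st and the number of days left over for the last "week" of a few years. The year range is an arbitrary choice for illustration.

```sql
-- With 7-day weeks counted from January 1st, the leftover days at the end
-- of the year are (days in year) modulo 7: 1 normally, 2 in leap years.
SELECT y                                              AS year
     , to_char((y || '-01-01')::date, 'Dy')           AS jan1_weekday
     , EXTRACT(doy FROM (y || '-12-31')::date)::int % 7 AS days_in_last_week
FROM   generate_series(2011, 2014) y
ORDER  BY y;

-- year | jan1_weekday | days_in_last_week
-- 2011 | Sat          | 1
-- 2012 | Sun          | 2
-- 2013 | Tue          | 1
-- 2014 | Wed          | 1
```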

The same with ISO weeks

Depending on requirements, consider ISO weeks instead, which start on Mondays and always span 7 days. But they cross the border between years. Per the documentation on EXTRACT(), field week:

"The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.

In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results."

The above query rewritten with ISO weeks:

    SELECT w AS isoweek
         , day::text  AS thisyr_monday, thisyr_ct
         , day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
    FROM  (
       SELECT w, day
            , date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
            , count(t.exam_date) AS thisyr_ct
       FROM  (
          SELECT w
               , date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
          FROM   generate_series(0, 4) w
          ) d
       LEFT   JOIN tbl t ON t.exam_date >= d.day
                        AND t.exam_date <  d.day + 7
       GROUP  BY d.w, d.day
       ) d
    LEFT   JOIN tbl t ON t.exam_date >= d.day0     -- repeat with last year
                     AND t.exam_date <  d.day0 + 7
    GROUP  BY d.w, d.day, d.day0, d.thisyr_ct
    ORDER  BY d.w, d.day;

January 4th is always in the first ISO week of the year. So this expression gets the date of the Monday of the first ISO week of the given year:

    date_trunc('week', '2012-01-04'::date)::date

Simplify with EXTRACT()

Since ISO weeks coincide with the week numbers returned by EXTRACT(), we can simplify the query. First, a short and simple form:

    SELECT w AS isoweek
         , COALESCE(thisyr_ct, 0) AS thisyr_ct
         , COALESCE(lastyr_ct, 0) AS lastyr_ct
    FROM   generate_series(1, 5) w
    LEFT   JOIN (
       SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
       FROM   tbl
       WHERE  EXTRACT(isoyear FROM exam_date)::int = 2013
       GROUP  BY 1
       ) t13 USING (w)
    LEFT   JOIN (
       SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
       FROM   tbl
       WHERE  EXTRACT(isoyear FROM exam_date)::int = 2012
       GROUP  BY 1
       ) t12 USING (w);

Optimized query

The same with more details and optimized for performance:

    WITH params AS (          -- enter parameters here, once
       SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
            , date_trunc('week', '2013-01-04'::date)::date AS this_start
            , date_trunc('week', '2014-01-04'::date)::date AS next_start
            , 1 AS week_1
            , 5 AS week_n     -- show weeks 1 - 5
       )
    SELECT w.w AS isoweek
         , p.this_start + 7 * (w - 1) AS thisyr_monday
         , COALESCE(t13.ct, 0) AS thisyr_ct
         , p.last_start + 7 * (w - 1) AS lastyr_monday
         , COALESCE(t12.ct, 0) AS lastyr_ct
    FROM params p
       , generate_series(p.week_1, p.week_n) w(w)
    LEFT   JOIN (
       SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
       FROM   tbl t, params p
       WHERE  t.exam_date >= p.this_start      -- only relevant dates
       AND    t.exam_date <  p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
    -- AND    t.exam_date <  p.next_start      -- don't cross over into next year
       GROUP  BY 1
       ) t13  USING (w)
    LEFT   JOIN (                              -- same for last year
       SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
       FROM   tbl t, params p
       WHERE  t.exam_date >= p.last_start
       AND    t.exam_date <  p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
    -- AND    t.exam_date <  p.this_start
       GROUP  BY 1
       ) t12  USING (w);

This should be very fast with index support and can easily be adapted to intervals of choice. The implicit JOIN LATERAL for generate_series() in the last query requires Postgres 9.3.

    SQL Fiddle.
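As a quick sanity check of the ISO week rules cited in this answer, the sketch below evaluates EXTRACT() and date_trunc() for the dates mentioned in the documentation excerpt, plus January 4th, 2013. It needs no table; the literal dates are the only input.

```sql
-- isoyear / week as returned by EXTRACT(), and the Monday that starts
-- the ISO week containing each date.
SELECT d                              AS day
     , EXTRACT(isoyear FROM d)::int   AS isoyear
     , EXTRACT(week    FROM d)::int   AS isoweek
     , date_trunc('week', d)::date    AS monday_of_week
FROM  (VALUES
         ('2005-01-01'::date)
       , ('2006-01-01')
       , ('2012-12-31')
       , ('2013-01-04')
      ) v(d)
ORDER  BY d;

-- day        | isoyear | isoweek | monday_of_week
-- 2005-01-01 |    2004 |      53 | 2004-12-27
-- 2006-01-01 |    2005 |      52 | 2005-12-26
-- 2012-12-31 |    2013 |       1 | 2012-12-31
-- 2013-01-04 |    2013 |       1 | 2012-12-31
```

The last two rows show why the queries anchor on January 4th: truncating it to the week yields the Monday of ISO week 1, even when that Monday falls in the previous calendar year.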

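Answer 1: (score: 1)

Using a cross join should work; I'm just going to paste the markdown output from SQL Fiddle below. Your sample output seems to be incorrect for the 2013-01-08 series: thisyr should be 2, not 3. This might not be the best way to do it, though, as my PostgreSQL knowledge leaves a lot to be desired.

SQL Fiddle

PostgreSQL 9.2.4 Schema Setup: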

CREATE TABLE Table1
    ("Pt_ID" varchar(6), "exam_date" date);

INSERT INTO Table1
    ("Pt_ID", "exam_date")
VALUES
    ('1', '2012-01-02'),('2', '2012-01-02'),
    ('3', '2012-01-08'),('4', '2012-01-08'),
    ('1', '2013-01-02'),('2', '2013-01-02'),
    ('3', '2013-01-03'),('4', '2013-01-04'),
    ('1', '2013-01-08'),('2', '2013-01-10'),
    ('3', '2013-01-15'),('4', '2013-01-24');

Query 1:

select 
  series, 
  sum (
    case 
      when exam_date 
        between series and series + '6 day'::interval
      then 1 
      else 0 
    end
  ) as thisyr,
  sum (
    case 
      when exam_date + '1 year'::interval 
        between series and series + '6 day'::interval
      then 1 else 0 
    end
  ) as lastyr

from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series

Results:

|                         SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 |      4 |      2 |
| January, 08 2013 00:00:00+0000 |      2 |      2 |
| January, 15 2013 00:00:00+0000 |      1 |      0 |
| January, 22 2013 00:00:00+0000 |      1 |      0 |
| January, 29 2013 00:00:00+0000 |      0 |      0 |
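The count for the week starting 2013-01-08 can also be checked in isolation. This is only a sketch: it inlines the 2013 sample dates from the question as a VALUES list instead of reading from the table, and uses a half-open range for the 7-day window.

```sql
-- Count sample rows with 2013-01-08 <= exam_date < 2013-01-15.
SELECT count(*) AS thisyr
FROM  (VALUES
         ('2013-01-02'::date), ('2013-01-02'), ('2013-01-03'), ('2013-01-04')
       , ('2013-01-08'), ('2013-01-10'), ('2013-01-15'), ('2013-01-24')
      ) t(exam_date)
WHERE  exam_date >= '2013-01-08'
AND    exam_date <  '2013-01-15';

-- thisyr
--      2
```

Only 2013-01-08 and 2013-01-10 fall into that window, which matches the value of 2 in the results above.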