Using SQL to identify results by group and rank

Time: 2012-05-05 23:34:54

Tags: sql postgresql grouping window-functions

I have a table with the following structure:

id          timestamp       area
717416915   18:30:53.063    25.691601
717416915   18:31:34.863    31.200506
717416915   18:32:23.665    25.690088
1994018321  18:32:45.467    37.409171
1994018321  18:33:19.612    37.409171
424164505   18:36:16.634    18.22091
424164505   18:36:36.899    18.210754
424164505   18:37:08.614    19.829266
2394018356  18:37:27.231    79.31705

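What I would like to do is summarize these values so that I can pick out the area for each id by its position when the rows are ordered by timestamp. For example, if I wanted the first area value for each id, the result would be: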

id          timestamp       area_1
717416915   18:30:53.063    25.691601
1994018321  18:32:45.467    37.409171
424164505   18:36:16.634    18.22091
2394018356  18:37:27.231    79.31705

If I wanted the second area value for each id, the result would be:

id          timestamp       area_2
717416915   18:31:34.863    31.200506
1994018321  18:33:19.612    37.409171
424164505   18:36:36.899    18.210754

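I know I need to sort by time and then pick the first value for each id, but I don't quite understand how to do it. What I have tried is the following (it does not run, because I am still a bit unclear on how to use the OVER clause):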

WITH T AS (
    SELECT * OVER(PARTITION BY a.id ORDER BY a.timestamp) AS rnk
    FROM mytable AS a
) 
SELECT area as area_1
FROM T
WHERE rnk = 1
GROUP BY a.id
ORDER BY a.timestamp;

I plan to use rnk = 2 and so on to get the subsequent area values for each id.

2 Answers:

Answer 0: (score: 10)

The syntax should be as follows:

SELECT RANK() OVER(PARTITION BY a.id ORDER BY a.timestamp) AS rnk
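To show that expression in context, here is a minimal self-contained sketch that loads the sample rows from the question and fetches the second area per id. The table name tbl and the column name ts are placeholders chosen for this example (ts is used instead of timestamp to avoid clashing with the type name), and the column types are assumptions, since the question does not state them.

```sql
-- Placeholder table; id needs bigint because 2394018356 does not fit in a 4-byte integer.
CREATE TEMP TABLE tbl (id bigint, ts time, area numeric);

INSERT INTO tbl (id, ts, area) VALUES
    (717416915,  '18:30:53.063', 25.691601),
    (717416915,  '18:31:34.863', 31.200506),
    (717416915,  '18:32:23.665', 25.690088),
    (1994018321, '18:32:45.467', 37.409171),
    (1994018321, '18:33:19.612', 37.409171),
    (424164505,  '18:36:16.634', 18.22091),
    (424164505,  '18:36:36.899', 18.210754),
    (424164505,  '18:37:08.614', 19.829266),
    (2394018356, '18:37:27.231', 79.31705);

-- Rank the rows of each id by time, then keep only the second one.
WITH t AS (
    SELECT id, ts, area,
           rank() OVER (PARTITION BY id ORDER BY ts) AS rnk
    FROM   tbl
)
SELECT id, ts, area AS area_2
FROM   t
WHERE  rnk = 2
ORDER  BY ts;
```

With this data the query should return the three rows of the area_2 table in the question; id 2394018356 drops out because it has only one row. Note that rank() assigns the same rank to rows of one id that share a timestamp, so ties could yield more than one row per id. If exactly one row per id is required, row_number() can be used in its place.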

Answer 1: (score: 1)

@dbaseman's hint works fine. Here is your query, fixed:

WITH t AS (
    SELECT *
         , rank() OVER(PARTITION BY id ORDER BY ts) AS rnk
    FROM tbl
) 
SELECT id, ts, area AS area1
FROM   t
WHERE  rnk = 1
ORDER  BY id, ts;

There is a shorter way:

SELECT DISTINCT
       id
     , nth_value(ts,   1) OVER w  AS ts
     , nth_value(area, 1) OVER w  AS area_n
FROM   tbl
WINDOW w AS (PARTITION BY id ORDER BY ts);

You will have to test whether it is also faster; it should perform similarly. More about PostgreSQL's arsenal of window functions in the manual.
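One caveat when moving from n = 1 to n = 2 with nth_value(): with an ORDER BY in the window definition, the default frame ends at the current row, so the first row of each id cannot yet see a second row and gets NULL. The sketch below is one way to handle that and is not part of the answer's query: it widens the frame to the whole partition and filters out ids that have no second row. The alias names ts_2, area_2 and sub are chosen for this example.

```sql
SELECT DISTINCT id, ts_2, area_2
FROM  (
    SELECT id
         , nth_value(ts,   2) OVER w AS ts_2
         , nth_value(area, 2) OVER w AS area_2
    FROM   tbl
    -- Explicit frame: every row sees its whole partition, not just the rows up to itself.
    WINDOW w AS (PARTITION BY id ORDER BY ts
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ) sub
WHERE  ts_2 IS NOT NULL   -- drop ids that have fewer than two rows
ORDER  BY ts_2;
```

The subquery is needed because window function results cannot be referenced in the WHERE clause of the same query level.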