PostgreSQL: find the top N rows grouped by date

Time: 2014-09-03 03:10:27

Tags: sql postgresql

I'm working on a typical blog application and have a view that returns the following data:

| post_id | title | publish_on | tag_id | tag_name |
| 1 | Why is Postgres awesome                | 2014-09-02 | 1    | tech |
| 1 | Why is Postgres awesome                | 2014-09-02 | 2    | postgres |
| 2 | How to ask a question on stackoverflow | 2014-09-10 | 1    | tech |
| 2 | How to ask a question on stackoverflow | 2014-09-10 | 2    | postgres |
| 2 | How to ask a question on stackoverflow | 2014-09-10 | 3    | guide |
| 3 | This is a draft                        | null       | null | null |
| 4 | This is something else without a tag   | 2014-10-10 | null | null |
| 5 | This question is also published on 9/2 | 2014-09-02 | null | null |
| 6 | And so is this                         | 2014-09-02 | 1    | tech |
| 7 | But this one is on 9/10                | 2014-09-10 | 3    | guide |
| 8 | This is on 10/10                       | 2014-10-10 | null | null |
| 9 | And so is this                         | 2014-10-10 | 2    | postgres |
| 10| This is another draft                  | null       | null | null |

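I want to group posts by publish date and then select the top 3 posts for each bucket (this will be shown on a dashboard so users can see which posts are going out today, sometime next week, and later). I have tried these solutions, using something like: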

ROW_NUMBER() OVER (PARTITION BY publish_on ORDER BY publish_on DESC)

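but these queries fail because a post with multiple tags appears in multiple rows, and each of those rows uses up one of the three slots. I also tried various combinations of PARTITION BY conditions, but I don't think I understand them well enough to get this working.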
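The failure can be reproduced with a minimal sketch. It loads the sample rows above into an in-memory SQLite database (an assumption for portability: the question targets PostgreSQL, but `ROW_NUMBER()` behaves the same way in SQLite 3.25 and later) and numbers the raw view rows within each `publish_on`. The helper `naive_top_rows` and the `post_id, tag_id` ordering are hypothetical, chosen only to make the numbering deterministic.

```python
import sqlite3

# Sample rows from the question's view: (post_id, title, publish_on, tag_id, tag_name)
ROWS = [
    (1, "Why is Postgres awesome", "2014-09-02", 1, "tech"),
    (1, "Why is Postgres awesome", "2014-09-02", 2, "postgres"),
    (2, "How to ask a question on stackoverflow", "2014-09-10", 1, "tech"),
    (2, "How to ask a question on stackoverflow", "2014-09-10", 2, "postgres"),
    (2, "How to ask a question on stackoverflow", "2014-09-10", 3, "guide"),
    (3, "This is a draft", None, None, None),
    (4, "This is something else without a tag", "2014-10-10", None, None),
    (5, "This question is also published on 9/2", "2014-09-02", None, None),
    (6, "And so is this", "2014-09-02", 1, "tech"),
    (7, "But this one is on 9/10", "2014-09-10", 3, "guide"),
    (8, "This is on 10/10", "2014-10-10", None, None),
    (9, "And so is this", "2014-10-10", 2, "postgres"),
    (10, "This is another draft", None, None, None),
]


def naive_top_rows(n=3):
    """Hypothetical helper: number raw view rows (one per tag) within each date."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
    conn.execute(
        "CREATE TABLE blog (post_id INTEGER, title TEXT, publish_on TEXT,"
        " tag_id INTEGER, tag_name TEXT)"
    )
    conn.executemany("INSERT INTO blog VALUES (?, ?, ?, ?, ?)", ROWS)
    # ROW_NUMBER gives every row its own number, so a post with k tags uses k slots
    sql = """
        SELECT post_id, publish_on, tag_id
        FROM (SELECT b.*,
                     ROW_NUMBER() OVER (PARTITION BY publish_on
                                        ORDER BY post_id, tag_id) AS r
              FROM blog b) x
        WHERE x.r <= ?
    """
    rows = conn.execute(sql, (n,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    kept = {post_id for post_id, _, _ in naive_top_rows()}
    print(sorted(kept))  # [1, 2, 3, 4, 5, 8, 9, 10]: posts 6 and 7 are missing
```

Post 1's two tag rows take slots 1 and 2 on 2014-09-02, so post 6 gets row number 4 and is dropped; post 2's three tags likewise push post 7 out of 2014-09-10.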

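Any help or pointers appreciated!

Update: expected output

For each publish_on date, I want up to N (here 3) posts scheduled for that date, with all of their tag rows.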

| 1 | Why is Postgres awesome                | 2014-09-02 | 1    | tech |
| 1 | Why is Postgres awesome                | 2014-09-02 | 2    | postgres |
| 5 | This question is also published on 9/2 | 2014-09-02 | null | null |
| 6 | And so is this                         | 2014-09-02 | 1    | tech |

| 2 | How to ask a question on stackoverflow | 2014-09-10 | 1    | tech |
| 2 | How to ask a question on stackoverflow | 2014-09-10 | 2    | postgres |
| 2 | How to ask a question on stackoverflow | 2014-09-10 | 3    | guide |
| 7 | But this one is on 9/10                | 2014-09-10 | 3    | guide |

| 4 | This is something else without a tag   | 2014-10-10 | null | null |
| 8 | This is on 10/10                       | 2014-10-10 | null | null |
| 9 | And so is this                         | 2014-10-10 | 2    | postgres |

| 3 | This is a draft                        | null       | null | null |
| 10| This is another draft                  | null       | null | null |

I hope this makes the question clearer.

1 answer:

Answer 0 (score: 1)

Is this what you are looking for? SQL Fiddle

SELECT * 
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY tag_name order by publish_on DESC) AS r,
    t.*
    from blog t ) x
where x.r <= 3

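Explanation and questions

I'm assuming that by "each bucket" you mean tag_name (or tag_id). Then you just want the 3 most recent posts in each bucket. If a post has multiple tags, how do you want it treated: once per tag, or only once in the whole result set?

Edit

This now shows the results as expected. SQL Fiddle for this here.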

SELECT DISTINCT x.post_id, y.title, x.publish_on, y.tag_id, y.tag_name
FROM blog y
INNER JOIN (SELECT ROW_NUMBER() OVER (PARTITION BY t.publish_on
                                      ORDER BY t.post_id) AS r,  -- ordering by publish_on inside a publish_on partition is arbitrary; post_id is an assumed tiebreaker
                   t.post_id, t.publish_on
            FROM (SELECT DISTINCT s.post_id, s.publish_on  -- one row per post, not one per tag
                  FROM blog s) t
           ) x ON x.post_id = y.post_id
WHERE x.r <= 3
ORDER BY x.publish_on, x.post_id, y.tag_id;
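The strategy of this query (deduplicate to one row per post, number the posts within each date, then join back to recover every tag row) can be checked with a sketch like the following. As before, SQLite stands in for PostgreSQL, the helper `top_n_from_view` is hypothetical, and `post_id` is an assumed tiebreaker for what "top" means; substitute whichever column should define it. The limit is a parameter because no date in the sample has more than 3 posts, so a smaller N is needed to see any truncation.

```python
import sqlite3

# Sample rows from the question's view: (post_id, title, publish_on, tag_id, tag_name)
ROWS = [
    (1, "Why is Postgres awesome", "2014-09-02", 1, "tech"),
    (1, "Why is Postgres awesome", "2014-09-02", 2, "postgres"),
    (2, "How to ask a question on stackoverflow", "2014-09-10", 1, "tech"),
    (2, "How to ask a question on stackoverflow", "2014-09-10", 2, "postgres"),
    (2, "How to ask a question on stackoverflow", "2014-09-10", 3, "guide"),
    (3, "This is a draft", None, None, None),
    (4, "This is something else without a tag", "2014-10-10", None, None),
    (5, "This question is also published on 9/2", "2014-09-02", None, None),
    (6, "And so is this", "2014-09-02", 1, "tech"),
    (7, "But this one is on 9/10", "2014-09-10", 3, "guide"),
    (8, "This is on 10/10", "2014-10-10", None, None),
    (9, "And so is this", "2014-10-10", 2, "postgres"),
    (10, "This is another draft", None, None, None),
]

# Dedupe to (post_id, publish_on), number posts per date, then join back to the
# view so every tag row of each surviving post is returned.
TOP_N_SQL = """
    SELECT DISTINCT y.post_id, y.title, x.publish_on, y.tag_id, y.tag_name
    FROM blog y
    JOIN (SELECT t.post_id, t.publish_on,
                 ROW_NUMBER() OVER (PARTITION BY t.publish_on
                                    ORDER BY t.post_id) AS r
          FROM (SELECT DISTINCT post_id, publish_on FROM blog) t) x
      ON x.post_id = y.post_id
    WHERE x.r <= ?
    ORDER BY x.publish_on, y.post_id, y.tag_id
"""


def top_n_from_view(n=3):
    """Hypothetical helper: return all tag rows of the first n posts per date."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
    conn.execute(
        "CREATE TABLE blog (post_id INTEGER, title TEXT, publish_on TEXT,"
        " tag_id INTEGER, tag_name TEXT)"
    )
    conn.executemany("INSERT INTO blog VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(TOP_N_SQL, (n,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_n_from_view(2):
        print(row)
```

With `n=3` all 13 view rows come back, matching the expected output; with `n=2` posts 6 and 9 drop out while post 2 still returns all three of its tags. One visible difference: SQLite sorts the `NULL` (draft) group first, whereas PostgreSQL sorts `NULL`s last in ascending order by default.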

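The main source of the extra complexity is that the table structure is not normalized. This should really be 3 tables, so that the title and date are not repeated across rows, i.e.: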

CREATE TABLE blog
(post_id int not null,
 title varchar(50) not null,
 publish_on date);

CREATE TABLE blog_tag
(post_id int not null,
 tag_id int not null);

CREATE TABLE tag
(tag_id int not null,
 tag_name varchar(10) not null);

The SQL can then be replaced with the following (see full SQL Fiddle for this here):
SELECT x.post_id, x.title, x.publish_on, t.tag_id, t.tag_name
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b.publish_on
                                ORDER BY b.post_id) AS r,  -- blog has one row per post, so this numbers posts; post_id is an assumed tiebreaker
             b.*
      FROM blog b) x
LEFT JOIN blog_tag bt ON bt.post_id = x.post_id  -- LEFT JOINs keep posts that have no tags
LEFT JOIN tag t ON t.tag_id = bt.tag_id
WHERE x.r <= 3
ORDER BY x.publish_on, x.post_id, t.tag_id;
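The normalized version can be sketched the same way. The tables follow the schema above, `publish_on` is stored as an ISO date string because SQLite has no dedicated date storage class, and the helper `top_n_normalized` plus the `post_id` tiebreaker are again assumptions for illustration. Because `blog` holds exactly one row per post, `ROW_NUMBER()` counts posts directly and no `DISTINCT` step is needed; the `LEFT JOIN`s then fan each surviving post out into its tag rows, keeping untagged posts with `NULL` tags.

```python
import sqlite3

# Normalized schema; publish_on holds ISO date strings (SQLite stand-in for PostgreSQL)
SCHEMA = """
CREATE TABLE blog (post_id INTEGER NOT NULL, title VARCHAR(50) NOT NULL, publish_on TEXT);
CREATE TABLE blog_tag (post_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
CREATE TABLE tag (tag_id INTEGER NOT NULL, tag_name VARCHAR(10) NOT NULL);
"""

POSTS = [
    (1, "Why is Postgres awesome", "2014-09-02"),
    (2, "How to ask a question on stackoverflow", "2014-09-10"),
    (3, "This is a draft", None),
    (4, "This is something else without a tag", "2014-10-10"),
    (5, "This question is also published on 9/2", "2014-09-02"),
    (6, "And so is this", "2014-09-02"),
    (7, "But this one is on 9/10", "2014-09-10"),
    (8, "This is on 10/10", "2014-10-10"),
    (9, "And so is this", "2014-10-10"),
    (10, "This is another draft", None),
]
TAGS = [(1, "tech"), (2, "postgres"), (3, "guide")]
BLOG_TAGS = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (6, 1), (7, 3), (9, 2)]

QUERY = """
    SELECT x.post_id, x.title, x.publish_on, t.tag_id, t.tag_name
    FROM (SELECT b.*,
                 ROW_NUMBER() OVER (PARTITION BY b.publish_on
                                    ORDER BY b.post_id) AS r
          FROM blog b) x
    LEFT JOIN blog_tag bt ON bt.post_id = x.post_id
    LEFT JOIN tag t ON t.tag_id = bt.tag_id
    WHERE x.r <= ?
    ORDER BY x.publish_on, x.post_id, t.tag_id
"""


def top_n_normalized(n=3):
    """Hypothetical helper: first n posts per date, one output row per tag."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO blog VALUES (?, ?, ?)", POSTS)
    conn.executemany("INSERT INTO tag VALUES (?, ?)", TAGS)
    conn.executemany("INSERT INTO blog_tag VALUES (?, ?)", BLOG_TAGS)
    rows = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in top_n_normalized(2):
        print(row)
```

This returns the same rows as the view-based query for the same data, which is a quick way to check that the normalized tables reproduce the original view.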