Limit results per group

Asked: 2016-08-31 19:09:20

Tags: sql postgresql greatest-n-per-group

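I want to limit the records in each group so that, when I aggregate them into a JSON object in the select statement, it only takes the N conversations with the highest count.

Any ideas?

My query: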

select
          dt.id as app_id,
          json_build_object(
              'rows', array_agg(
                 json_build_object(
                    'url', dt.started_at_url,
                    'count', dt.count
                 )
              )
          ) as data
      from (
          select a.id, c.started_at_url, count(c.id)
          from apps a
          left join conversations c on c.app_id = a.id
          where started_at_url is not null and c.started_at::date > (current_date - (7  || ' days')::interval)::date
          group by a.id, c.started_at_url
          order by count desc
      ) as dt
      where dt.id = 'ASnYW1-RgCl0I'
      group by dt.id

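1 answer:

Answer 0 (score: 1)

Your problem is similar to the groupwise-max problem, and there are many solutions to it.

Filtering on the row_number window function

A simple approach is to use the row_number() window function and keep only the rows whose row number is <= N (using 5 as an example):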

select
          dt.id as app_id,
          json_build_object(
              'rows', array_agg(
                 json_build_object(
                    'url', dt.started_at_url,
                    'count', dt.count
                 )
              )
          ) as data
      from (
          select
              a.id, c.started_at_url,
              count(c.id) as count,
              row_number() over(partition by a.id order by count(c.id) desc) as rn
          from apps a
          left join conversations c on c.app_id = a.id
          where started_at_url is not null and c.started_at > (current_date - (7  || ' days')::interval)::date
          group by a.id, c.started_at_url
          order by count desc
      ) as dt
      where
          dt.id = 'ASnYW1-RgCl0I'
          and dt.rn <= 5 /* get top 5 only */
      group by dt.id
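
To see the filter in isolation, here is a minimal sketch on made-up data. The counts CTE, its column names and its values are hypothetical stand-ins for the already grouped counts, not your real tables:

```sql
-- Hypothetical sample data: (app_id, url, cnt) as if already grouped.
with counts(app_id, url, cnt) as (
    values
        ('app1', '/a', 10),
        ('app1', '/b', 7),
        ('app1', '/c', 3),
        ('app2', '/x', 5),
        ('app2', '/y', 1)
),
ranked as (
    select
        app_id, url, cnt,
        -- number the rows of each app from the highest count down
        row_number() over (partition by app_id order by cnt desc) as rn
    from counts
)
select app_id, url, cnt
from ranked
where rn <= 2 /* keep the top 2 per app */
order by app_id, cnt desc;
```

Note that row_number() breaks ties arbitrarily, so two URLs with the same count can land on either side of the cutoff. rank() would keep all tied rows instead, at the cost of sometimes returning more than N.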

Using LATERAL

Another option is to use LATERAL together with LIMIT to return only the results you are interested in:

select
    a.id as app_id,
    json_build_object(
        'rows', array_agg(
           json_build_object(
              'url', dt.started_at_url,
              'count', dt.count
           )
        )
    ) as data
from
    apps a, lateral(
        select
            c.started_at_url,
            count(*) as count
        from
            conversations c
        where
            c.app_id = a.id /* here is why lateral is necessary */
            and c.started_at_url is not null
            and c.started_at > (current_date - (7  || ' days')::interval)::date
        group by
            c.started_at_url
        order by
            count(*) desc
        limit 5 /* get top 5 only */
    ) as dt
where
    a.id = 'ASnYW1-RgCl0I'
group by
    a.id
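
The same pattern on made-up data, as a sketch only. Both CTEs and their rows are hypothetical, and I wrote the join as cross join lateral, which behaves like the comma form used above:

```sql
-- Hypothetical sample data standing in for the real tables.
with apps(id) as (
    values ('app1'), ('app2')
),
conversations(app_id, started_at_url) as (
    values
        ('app1', '/a'),
        ('app1', '/a'),
        ('app1', '/b'),
        ('app2', '/x')
)
select
    a.id as app_id,
    dt.started_at_url,
    dt.cnt
from apps a
cross join lateral (
    -- evaluated once per row of apps, so the LIMIT applies per app
    select c.started_at_url, count(*) as cnt
    from conversations c
    where c.app_id = a.id
    group by c.started_at_url
    order by count(*) desc
    limit 1 /* top 1 per app in this toy example */
) as dt
order by a.id;
```

An app with no matching conversations produces no row with this form. If you need to keep such apps, left join lateral (...) as dt on true returns them with nulls.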

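OBS: I have not tested these, so there may be typos. If you would like me to run some tests, please provide a sample data set.

OBS 2: If you really are filtering by app_id in the final query, then you don't even need the GROUP BY clause.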
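
For the single-app case, a sketch of what that could look like, untested like the rest. It assumes the id is text and simply echoes the literal back as app_id. The order by inside array_agg is my addition, so that the array order does not depend on how the subquery happens to be scanned:

```sql
select
    'ASnYW1-RgCl0I'::text as app_id, /* assumed text; echoed literal, not read from apps */
    json_build_object(
        'rows', array_agg(
            json_build_object(
                'url', dt.started_at_url,
                'count', dt.count
            )
            order by dt.count desc /* make the array order explicit */
        )
    ) as data
from (
    select
        c.started_at_url,
        count(*) as count
    from conversations c
    where
        c.app_id = 'ASnYW1-RgCl0I'
        and c.started_at_url is not null
        and c.started_at > (current_date - (7 || ' days')::interval)::date
    group by c.started_at_url
    order by count(*) desc
    limit 5 /* get top 5 only */
) as dt;
```

Without GROUP BY the outer aggregate always returns exactly one row, and array_agg yields null when nothing matches, so 'rows' would be null rather than an empty array in that case.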