Question

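I'm looking for a way to group the IDs of one table, where those IDs are referenced from another table. It's a bit hard to summarize my problem, so here is an example: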

I have two tables, box and item.

CREATE TABLE box(
id bigint NOT NULL,
label varchar,
CONSTRAINT box_pk PRIMARY KEY (id));

CREATE TABLE item(
id bigint NOT NULL,
box bigint NOT NULL,
label varchar,
CONSTRAINT item_pk PRIMARY KEY (id),
CONSTRAINT box_fk FOREIGN KEY (box) REFERENCES box(id));

There is a many-to-one relationship between them: a box can contain many items, and an item cannot exist without a box.

There are currently a lot of boxes (> 100,000) and items (> 600,000). Most boxes hold about 10 items, but a fair number of boxes hold more than 1,000.

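I need to run a specific process on these items in which each item has to be compared with all the other items in the same box (using Java code). To avoid selecting too many items at once, I'd like to group the box IDs into a single cell (comma-separated) so that each group respects a given chunk size, where the chunk size is the maximum total number of items across the boxes in that group.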

The only thing I've managed so far is a query that counts the number of items per box:

SELECT b.id, count(i.*) as items 
FROM box b LEFT JOIN item i ON i.box = b.id 
WHERE i.box IS NOT NULL 
GROUP BY b.id 
ORDER BY items DESC

id   | items
3834 | 7206
78350| 6151
73525| 5996
3838 | 5192
71331| 5184
76842| 3982
76854| 3982
...

For example, if I set the chunk size to 15000, the result I want would look like this. id_group would be a text column.

id_group          | total_amount
3834,78350        | 13357
73525,3838        | 11188
71331,76842,76854 | 13148

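At the beginning there won't be many IDs per cell, but further down, as the boxes contain fewer items, each cell will hold more and more IDs to reach the chunk limit, which is exactly what I want! If for some reason a single box contains more items than the chunk limit, the cell should just contain that single ID. I don't actually need total_amount; I only need the comma-separated box IDs to carry out my process.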

Is there a way to do this in PostgreSQL?

Answer 1

You can implement a greedy algorithm with a recursive CTE to combine the boxes:

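with recursive b as (
      -- one row per box with its item count, numbered in ascending order of
      -- count; use "order by count(*) desc, b.id" to start with the largest
      -- boxes, as in the example in the question
      select b.id, count(*) as items,
             row_number() over (order by count(*), b.id) as seqnum
      from box b join
           item i
           on i.box = b.id
      group by b.id
     ),
     cte as (
      -- anchor: the first box opens group 1; selecting b.seqnum (a bigint from
      -- row_number()) keeps the column type the same as in the recursive term
      select b.id::text as ids, b.items as items, 1 as grp, b.seqnum
      from b
      where seqnum = 1
      union all
      -- step: add the next box to the current group while the running total
      -- stays under 15000, otherwise start a new group with that box
      select (case when b.items + cte.items < 15000
                   then cte.ids || ',' || b.id
                   else b.id::text
              end) as ids,
             (case when b.items + cte.items < 15000
                   then cte.items + b.items
                   else b.items
              end) as items,
             (case when b.items + cte.items < 15000
                   then cte.grp
                   else cte.grp + 1
              end) as grp,
             b.seqnum
      from cte join
           b
           on b.seqnum = cte.seqnum + 1
     )
-- keep only the last (complete) row of each group
select distinct on (grp) cte.*
from cte
order by grp, seqnum desc;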

Here is a db<>fiddle.
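
If it is easier to do the grouping on the client, the same greedy pass can be sketched outside the database. This is only a minimal illustration in Python, assuming the per-box counts have already been fetched and are passed in the order they should be packed; `group_boxes` and `sample` are made-up names, and the sample rows are the counts shown in the question.

```python
from typing import Iterable, List, Tuple


def group_boxes(counts: Iterable[Tuple[int, int]], limit: int) -> List[Tuple[str, int]]:
    """Greedily pack (box_id, item_count) pairs, in the given order, into groups.

    Returns one (comma-separated ids, total items) pair per group.
    """
    groups: List[Tuple[str, int]] = []
    ids: List[int] = []
    total = 0
    for box_id, items in counts:
        # Same strict comparison as the query: a box joins the current group
        # only while the running total stays below the limit.
        if ids and total + items >= limit:
            groups.append((",".join(map(str, ids)), total))
            ids, total = [], 0
        ids.append(box_id)
        total += items
    if ids:
        # Flush the last, possibly partial, group.
        groups.append((",".join(map(str, ids)), total))
    return groups


# Counts copied from the question, largest boxes first.
sample = [
    (3834, 7206), (78350, 6151), (73525, 5996), (3838, 5192),
    (71331, 5184), (76842, 3982), (76854, 3982),
]

print(group_boxes(sample, 15000))
# [('3834,78350', 13357), ('73525,3838', 11188), ('71331,76842,76854', 13148)]
```

A box that is over the limit on its own ends up alone in its group, because the next box always pushes the total past the limit and opens a new group.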
