I have two tables with the following schemas:
TABLE1:
GROUP_ID | PURCHASE_ID | ITEMS
1        | 21          | X
1        | 21          | Y
1        | 21          | Z
2        | 22          | X
TABLE2:
GROUP_ID | CUSTOMER_ID | ITEMS
1        | ABC         | X
1        | ABC         | Y
1        | ABC         | Z
1        | ABC         | A
1        | ABC         | B
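A single GROUP_ID and PURCHASE_ID pair can have multiple items, and likewise a single GROUP_ID and CUSTOMER_ID pair can have multiple ITEMS. Each GROUP_ID and PURCHASE_ID pair contains only two or three items, but a given CUSTOMER_ID and GROUP_ID pair can have any number of items.
For each GROUP_ID and PURCHASE_ID, together with its set of ITEMS, I want to find how many customers bought at least all of those ITEMS.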
select distinct GROUP_ID, PURCHASE_ID,
       count(object_id) over (partition by GROUP_ID, PURCHASE_ID)
from (
    select a.GROUP_ID GROUP_ID, a.PURCHASE_ID PURCHASE_ID, b.CUSTOMER_ID object_id
    from (select GROUP_ID, PURCHASE_ID, items,
                 count(items) over (partition by GROUP_ID, PURCHASE_ID) val
          from TABLE1) a,
         (select GROUP_ID, CUSTOMER_ID, ITEMS from TABLE2) b
    where a.GROUP_ID = b.GROUP_ID and a.items = b.ITEMS and val = 3
    group by a.GROUP_ID, a.PURCHASE_ID, b.CUSTOMER_ID
    having count(*) = 3
)
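For the data above, the count for GROUP_ID = 1 and PURCHASE_ID = 21 should be 1, because customer ABC's items include all of [X, Y, Z]. The query I have written only returns the customer count for purchases with exactly three items. Is there a way to optimize it, or a better way to achieve this?
Any help is much appreciated.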
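To pin down the expected result independently of any SQL dialect, the requirement can be restated as a superset check: a customer counts for a purchase when the customer's item set in the same group contains every item of the purchase. A minimal sketch in Python using the sample rows from the question; the function and variable names are illustrative only, not part of the original post:

```python
from collections import defaultdict

# Sample rows copied from the question.
table1 = [(1, 21, "X"), (1, 21, "Y"), (1, 21, "Z"), (2, 22, "X")]
table2 = [(1, "ABC", "X"), (1, "ABC", "Y"), (1, "ABC", "Z"),
          (1, "ABC", "A"), (1, "ABC", "B")]


def count_customers(table1, table2):
    """Count, per (GROUP_ID, PURCHASE_ID), the customers whose items cover the purchase."""
    # Items required by each purchase.
    purchase_items = defaultdict(set)
    for group_id, purchase_id, item in table1:
        purchase_items[(group_id, purchase_id)].add(item)

    # Items owned by each customer within a group.
    customer_items = defaultdict(set)
    for group_id, customer_id, item in table2:
        customer_items[(group_id, customer_id)].add(item)

    result = {}
    for (group_id, purchase_id), needed in purchase_items.items():
        # `needed <= owned` is the subset test: every needed item is owned.
        result[(group_id, purchase_id)] = sum(
            1
            for (cust_group, _), owned in customer_items.items()
            if cust_group == group_id and needed <= owned
        )
    return result


print(count_customers(table1, table2))  # {(1, 21): 1, (2, 22): 0}
```

Purchase 22 gets 0 here because the sample TABLE2 has no customers in group 2. Whether such purchases should appear in the result with a zero or be left out is a choice the question does not settle.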
Answer 0 (score: 1)
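This is a tricky one. I usually join the two tables on all the required columns and then compare distinct counts. The comparison has to be made per customer, so group by customer first, keep only the customers that matched every item of the purchase, and then count them:
select m.group_id, m.purchase_id, count(*) as customer_count
from (
    select t1.group_id, t1.purchase_id, t2.customer_id
    from Table1 t1
    inner join Table2 t2
        on t2.group_id = t1.group_id
       and t2.items = t1.items
    group by t1.group_id, t1.purchase_id, t2.customer_id
    -- keep a customer only if every item of the purchase was matched
    having count(distinct t2.items) = (select count(distinct p.items)
                                       from Table1 p
                                       where p.group_id = t1.group_id
                                         and p.purchase_id = t1.purchase_id)
) m
group by m.group_id, m.purchase_id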
This is untested, so please try it and let me know whether it works.
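One way to try a query of this kind against the sample data is an in-memory SQLite database driven from Python. The sketch below rests on some assumptions: it uses SQLite's dialect rather than whatever database the asker runs, it rebuilds the two tables from the rows in the question, and the CTE names `needed` and `matched` are illustrative. It compares, per customer, the number of distinct matched items with the number of items in the purchase. It uses a LEFT JOIN so that purchases with no qualifying customer show a count of 0 instead of disappearing.

```python
import sqlite3

# Build the sample tables from the question in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (GROUP_ID INTEGER, PURCHASE_ID INTEGER, ITEMS TEXT);
    CREATE TABLE TABLE2 (GROUP_ID INTEGER, CUSTOMER_ID TEXT, ITEMS TEXT);
    INSERT INTO TABLE1 VALUES (1, 21, 'X'), (1, 21, 'Y'), (1, 21, 'Z'), (2, 22, 'X');
    INSERT INTO TABLE2 VALUES (1, 'ABC', 'X'), (1, 'ABC', 'Y'), (1, 'ABC', 'Z'),
                              (1, 'ABC', 'A'), (1, 'ABC', 'B');
""")

QUERY = """
WITH needed AS (
    -- how many distinct items each purchase contains
    SELECT GROUP_ID, PURCHASE_ID, COUNT(DISTINCT ITEMS) AS item_count
    FROM TABLE1
    GROUP BY GROUP_ID, PURCHASE_ID
),
matched AS (
    -- how many of those items each customer in the same group has
    SELECT t1.GROUP_ID, t1.PURCHASE_ID, t2.CUSTOMER_ID,
           COUNT(DISTINCT t2.ITEMS) AS item_count
    FROM TABLE1 AS t1
    JOIN TABLE2 AS t2
      ON t2.GROUP_ID = t1.GROUP_ID
     AND t2.ITEMS = t1.ITEMS
    GROUP BY t1.GROUP_ID, t1.PURCHASE_ID, t2.CUSTOMER_ID
)
SELECT n.GROUP_ID, n.PURCHASE_ID, COUNT(m.CUSTOMER_ID) AS customer_count
FROM needed AS n
LEFT JOIN matched AS m
  ON m.GROUP_ID = n.GROUP_ID
 AND m.PURCHASE_ID = n.PURCHASE_ID
 AND m.item_count = n.item_count  -- customer matched every item
GROUP BY n.GROUP_ID, n.PURCHASE_ID
ORDER BY n.GROUP_ID, n.PURCHASE_ID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 21, 1), (2, 22, 0)]
```

Because the item counts are computed from the data, this works for purchases of any size, not just those with exactly three items.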