Find the parent ID whose children all match exactly

Date: 2018-02-09 05:18:40

Tags: sql sql-server tsql set

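Scenario

Suppose we have a set of database tables representing four key concepts:

  1. Entity types (e.g. account, client, etc.)
  2. Entities (e.g. instances of the entity types above)
  3. Cohorts (named groups)
  4. Cohort members (the entities that make up the membership of a cohort)

The rules for cohorts are:

  1. A cohort always has at least one cohort member.
  2. A cohort member must be unique within its cohort (i.e. entity 5 cannot be a member of cohort 3 twice, but it may be a member of both cohort 3 and cohort 4).
  3. No two cohorts are ever exactly the same in terms of membership, although one cohort may legitimately be a subset of another.

The rules for entities are:

  1. No two entities may have the same value pair (business_key, entity_type_id).
  2. Two entities with different entity_type_id values may share a business_key.

Because a picture is worth a thousand lines of code, here is the ERD:

[Image: ERD of Entities and Cohorts]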
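The uniqueness rules above map naturally onto key constraints. The following is a minimal sketch only: the table and column names entity, cohort, cohort_member, entity_id, business_key, entity_type_id and cohort_id are taken from the queries in this thread, while the entity_type table name, the name columns, the data types and the constraint names are assumptions and are not read from the ERD or the fiddle.

```sql
-- Hypothetical schema sketch; see the assumptions listed above.
CREATE TABLE entity_type (
    entity_type_id INT           NOT NULL PRIMARY KEY,
    name           NVARCHAR(255) NOT NULL               -- assumed column
);

CREATE TABLE entity (
    entity_id      INT           NOT NULL PRIMARY KEY,
    business_key   NVARCHAR(255) NOT NULL,
    entity_type_id INT           NOT NULL
        REFERENCES entity_type (entity_type_id),
    -- Entity rule 1: no two entities share (business_key, entity_type_id).
    -- Entity rule 2 follows: the same business_key may recur under another type.
    CONSTRAINT uq_entity_business_key UNIQUE (business_key, entity_type_id)
);

CREATE TABLE cohort (
    cohort_id INT           NOT NULL PRIMARY KEY,
    name      NVARCHAR(255) NOT NULL                    -- assumed column
);

CREATE TABLE cohort_member (
    cohort_id INT NOT NULL REFERENCES cohort (cohort_id),
    entity_id INT NOT NULL REFERENCES entity (entity_id),
    -- Cohort rule 2: an entity appears at most once per cohort,
    -- but may belong to several cohorts.
    CONSTRAINT pk_cohort_member PRIMARY KEY (cohort_id, entity_id)
);
```

Cohort rules 1 and 3 (at least one member, no two cohorts with identical membership) are not captured by these key constraints and would have to be enforced elsewhere.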

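Problem

I want a SQL query that, when given a set of (business_key, entity_type_id) pairs, searches for a cohort whose members match that set exactly, returning one row containing just the cohort_id if such a cohort exists, and zero rows otherwise.

I.e. if the set of entities matches entity_ids 1 and 2, then only the cohort_id whose cohort_members are exactly 1 and 2 is returned: not a cohort with just 1, not a cohort with just 2, and not a cohort with entity_ids 1, 2 and 3. If no cohort satisfies this requirement, zero rows are returned.

Test cases

To help people tackle this problem, I have created a fiddle of the tables along with some data defining various entity types, entities and cohorts. There is also a table named test_cohort that holds the test data to match against. It contains 6 test cohorts that exercise various scenarios. The first 5 tests should each match exactly one cohort. The 6th test is a bogus test that checks the zero-row clause. When using the test table, only one line of the associated INSERT statement should be uncommented (see the fiddle, which is initially set up this way):

http://sqlfiddle.com/#!18/2d022

My attempt in SQL is below, although it fails tests #2 and #4 (which can be found in the fiddle):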

SELECT actual_cohort_member.cohort_id
FROM test_cohort
INNER JOIN entity
    ON entity.business_key = test_cohort.business_key
    AND entity.entity_type_id = test_cohort.entity_type_id
INNER JOIN cohort_member AS existing_potential_member
    ON existing_potential_member.entity_id = entity.entity_id
INNER JOIN cohort
    ON cohort.cohort_id = existing_potential_member.cohort_id
RIGHT OUTER JOIN cohort_member AS actual_cohort_member
    ON actual_cohort_member.cohort_id = cohort.cohort_id
    AND actual_cohort_member.cohort_id = existing_potential_member.cohort_id
    AND actual_cohort_member.entity_id = existing_potential_member.entity_id
GROUP BY actual_cohort_member.cohort_id
HAVING
    SUM(CASE WHEN
        actual_cohort_member.cohort_id = existing_potential_member.cohort_id AND
        actual_cohort_member.entity_id = existing_potential_member.entity_id THEN 1 ELSE 0
    END) = COUNT(*)
;

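2 Answers:

Answer 0 (score: 2)

This can be achieved by adding a compound condition to the WHERE clause, since you are comparing against pairs of values. You then have to compare two counts per cohort_id: the number of rows that satisfy the conditions in the WHERE clause, and the total number of member rows for that cohort_id.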

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
WHERE   (e.entity_type_id = 1 AND e.business_key = 'acc1')      -- one condition per wanted pair
         OR (e.entity_type_id = 1 AND e.business_key = 'acc2')
GROUP   BY c.cohort_id
HAVING  COUNT(*) = 2                                            -- matched rows must equal the number of conditions
        AND (SELECT COUNT(*)
             FROM cohort_member cm2
             WHERE cm2.cohort_id = c.cohort_id) = 2             -- the cohort's total member count must equal it too

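As you can see in the test case above, the value used in the filter depends on the number of conditions in the WHERE clause. Building the query dynamically is advisable here.

UPDATE

If the table test_cohort contains only one scenario, then the following will meet your requirement. However, if test_cohort contains a list of scenarios, then you may want to look at the other answer, since this solution does not alter any table schema.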

SELECT  c.cohort_id
FROM    cohort c
        INNER JOIN cohort_member cm
            ON c.cohort_id = cm.cohort_id
        INNER JOIN entity e
            ON cm.entity_id = e.entity_id
        INNER JOIN test_cohort tc
            ON tc.business_key = e.business_key
                AND tc.entity_type_id = e.entity_type_id
GROUP   BY c.cohort_id
HAVING  COUNT(*) = (SELECT COUNT(*) FROM test_cohort)
        AND (SELECT COUNT(*) 
             FROM cohort_member cm2 
             WHERE cm2.cohort_id = c.cohort_id) = (SELECT COUNT(*) FROM test_cohort)
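The two COUNT(*) comparisons above rely on test_cohort holding no duplicate pairs: a repeated pair inflates both the joined row count and the test_cohort count, so the member count no longer equals it and no cohort is returned. As a cross-check, exact set equality can also be stated directly with EXCEPT in both directions. This is an alternative formulation and not part of either answer. The sketch is self-contained: it uses table variables in place of the real tables, and the entity IDs and cohort memberships are assumptions inferred from the test comments in this thread, not copied from the fiddle.

```sql
-- Hypothetical stand-ins for the real tables (data is assumed).
DECLARE @entity TABLE (
    entity_id INT NOT NULL PRIMARY KEY,
    business_key NVARCHAR(255) NOT NULL,
    entity_type_id INT NOT NULL
);
DECLARE @cohort_member TABLE (
    cohort_id INT NOT NULL,
    entity_id INT NOT NULL,
    PRIMARY KEY (cohort_id, entity_id)
);
DECLARE @test_cohort TABLE (
    business_key NVARCHAR(255) NOT NULL,
    entity_type_id INT NOT NULL
);

INSERT INTO @entity VALUES
    (1, N'acc1', 1), (2, N'acc2', 1), (3, N'cli1', 2), (4, N'cli2', 2);
INSERT INTO @cohort_member VALUES
    (1, 1), (1, 2),                     -- cohort 1: acc1, acc2
    (2, 3), (2, 4),                     -- cohort 2: cli1, cli2
    (3, 3),                             -- cohort 3: cli1
    (4, 1), (4, 2), (4, 3), (4, 4),     -- cohort 4: all four
    (5, 1), (5, 4);                     -- cohort 5: acc1, cli2
INSERT INTO @test_cohort VALUES
    (N'acc1', 1), (N'acc2', 1);         -- TEST #1

SELECT c.cohort_id
FROM   (SELECT DISTINCT cohort_id FROM @cohort_member) AS c
WHERE  NOT EXISTS (
           -- Wanted entities missing from this cohort. The LEFT JOIN keeps
           -- unresolvable pairs as NULL, which no cohort can contain.
           SELECT e.entity_id
           FROM   @test_cohort AS tc
                  LEFT JOIN @entity AS e
                      ON  e.business_key   = tc.business_key
                      AND e.entity_type_id = tc.entity_type_id
           EXCEPT
           SELECT cm.entity_id
           FROM   @cohort_member AS cm
           WHERE  cm.cohort_id = c.cohort_id
       )
  AND  NOT EXISTS (
           -- Members of this cohort that were not asked for.
           SELECT cm.entity_id
           FROM   @cohort_member AS cm
           WHERE  cm.cohort_id = c.cohort_id
           EXCEPT
           SELECT e.entity_id
           FROM   @test_cohort AS tc
                  INNER JOIN @entity AS e
                      ON  e.business_key   = tc.business_key
                      AND e.entity_type_id = tc.entity_type_id
       );
```

With the assumed data this returns cohort_id 1 only. Because EXCEPT compares distinct values, a duplicated pair in the test set does not change the result.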

Answer 1 (score: 1)

I added a column i to the test_cohort table so that you can test all scenarios at once. Here is the DDL:

CREATE TABLE test_cohort (
i int,
business_key NVARCHAR(255),
entity_type_id INT
);

INSERT INTO test_cohort VALUES
(1, 'acc1', 1), (1, 'acc2', 1) -- TEST #1: should match against cohort 1
,(2, 'cli1', 2), (2, 'cli2', 2) -- TEST #2: should match against cohort 2
,(3, 'cli1', 2) -- TEST #3: should match against cohort 3
,(4, 'acc1', 1), (4, 'acc2', 1), (4, 'cli1', 2), (4, 'cli2', 2) -- TEST #4: should match against cohort 4
,(5, 'acc1', 1), (5, 'cli2', 2) -- TEST #5: should match against cohort 5
,(6, 'acc1', 3), (6, 'cli2', 3) -- TEST #6: should not match any cohort

Query:

select
    c.i, m.cohort_id
from
    (
        select 
            *, cnt = count(*) over (partition by i)
        from 
            test_cohort
    ) c
    join entity e on c.entity_type_id = e.entity_type_id and c.business_key = e.business_key
    join (
        select
            *, cnt = count(*) over (partition by cohort_id)
        from
            cohort_member
    ) m on e.entity_id = m.entity_id and c.cnt = m.cnt
group by m.cohort_id, c.cnt, c.i
having count(*) = c.cnt

Output

i   cohort_id
------------
1   1
2   2
3   3
4   4
5   5

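The idea is to count the rows of each test scenario and of each cohort before joining, and then keep only the combinations whose counts match exactly.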
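To try the counting idea without the fiddle, the sketch below rebuilds a minimal data set in table variables and runs the same logic with the two counted sets written as CTEs. The entity IDs and cohort memberships are assumptions inferred from the test comments in the DDL above, not copied from the fiddle, and the CTE names wanted and members are illustrative.

```sql
-- Hypothetical stand-ins for the real tables (data is assumed).
DECLARE @entity TABLE (
    entity_id INT NOT NULL PRIMARY KEY,
    business_key NVARCHAR(255) NOT NULL,
    entity_type_id INT NOT NULL
);
DECLARE @cohort_member TABLE (
    cohort_id INT NOT NULL,
    entity_id INT NOT NULL,
    PRIMARY KEY (cohort_id, entity_id)
);
DECLARE @test_cohort TABLE (
    i INT NOT NULL,
    business_key NVARCHAR(255) NOT NULL,
    entity_type_id INT NOT NULL
);

INSERT INTO @entity VALUES
    (1, N'acc1', 1), (2, N'acc2', 1), (3, N'cli1', 2), (4, N'cli2', 2);
INSERT INTO @cohort_member VALUES
    (1, 1), (1, 2), (2, 3), (2, 4), (3, 3),
    (4, 1), (4, 2), (4, 3), (4, 4), (5, 1), (5, 4);
INSERT INTO @test_cohort VALUES
    (1, N'acc1', 1), (1, N'acc2', 1),
    (2, N'cli1', 2), (2, N'cli2', 2),
    (3, N'cli1', 2),
    (4, N'acc1', 1), (4, N'acc2', 1), (4, N'cli1', 2), (4, N'cli2', 2),
    (5, N'acc1', 1), (5, N'cli2', 2),
    (6, N'acc1', 3), (6, N'cli2', 3);   -- no such entities: matches nothing

WITH wanted AS (
    -- Size of each test scenario, attached to every one of its rows.
    SELECT tc.i, tc.business_key, tc.entity_type_id,
           COUNT(*) OVER (PARTITION BY tc.i) AS cnt
    FROM   @test_cohort AS tc
),
members AS (
    -- Size of each cohort, attached to every one of its member rows.
    SELECT cm.cohort_id, cm.entity_id,
           COUNT(*) OVER (PARTITION BY cm.cohort_id) AS cnt
    FROM   @cohort_member AS cm
)
SELECT w.i, m.cohort_id
FROM   wanted AS w
       JOIN @entity AS e
           ON  e.entity_type_id = w.entity_type_id
           AND e.business_key   = w.business_key
       JOIN members AS m
           ON  m.entity_id = e.entity_id
           AND m.cnt       = w.cnt      -- only cohorts of the same size qualify
GROUP  BY w.i, m.cohort_id, w.cnt
HAVING COUNT(*) = w.cnt                 -- and every wanted entity must be present
ORDER  BY w.i;
```

With the assumed data this returns the pairs (1, 1), (2, 2), (3, 3), (4, 4) and (5, 5), and nothing for scenario 6. Cohort 5 shows why both checks are needed: it has the same size as scenarios 1 and 2 and shares one entity with each, but is rejected because only one of its two rows joins.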