递归SQL - 计算层次结构中后代的数量

时间:2016-04-16 09:02:43

标签: sql postgresql recursion common-table-expression hierarchical-data

考虑一个包含以下列的数据库表:

  • mathematician_id
  • 名称
  • advisor1
  • advisor2

数据库代表Math Genealogy Project的数据,每个数学家通常只有一个顾问,但有两个顾问的情况。

视觉辅助使事情更清晰:

enter image description here

如何计算每位数学家的后代数?

我应该使用公用表格表达式( WITH RECURSIVE ),但我现在几乎陷入困境。我发现的所有类似例子都涉及只有一个父级而不是两个父级的层次结构。

更新

我将Vladimir Baranov提供的SQL Server解决方案改编为也适用于PostgreSQL:

WITH RECURSIVE cte AS (
    SELECT m.id as start_id,
        m.id,
        m.name,
        m.advisor1,
        m.advisor2,
        1 AS level
    FROM public.mathematicians AS m

    UNION ALL

    SELECT cte.start_id,
        m.id,
        m.name,
        m.advisor1,
        m.advisor2,
        cte.level + 1 AS level
    FROM public.mathematicians AS m
    INNER JOIN cte ON cte.id = m.advisor1
               OR cte.id = m.advisor2
),
cte_distinct AS (
    SELECT DISTINCT start_id, id 
    FROM cte
)
SELECT cte_distinct.start_id,
    m.name,
    COUNT(*)-1 AS descendants_count
FROM cte_distinct
INNER JOIN public.mathematicians AS m ON m.id = cte_distinct.start_id
GROUP BY cte_distinct.start_id, m.name
ORDER BY cte_distinct.start_id

1 个答案:

答案 0 :(得分:4)

您没有说出您使用的DBMS。我将在此示例中使用SQL Server,但它也适用于支持递归查询的其他数据库。

示例数据

我从Euler开始只输入你树的右边部分。

最有趣的部分是Lagrange和Dirichlet之间的多条路径。

DECLARE @T TABLE (ID int, name nvarchar(50), Advisor1ID int, Advisor2ID int);

INSERT INTO @T (ID, name, Advisor1ID, Advisor2ID) VALUES
(1,  'Euler', NULL, NULL),
(2,  'Lagrange', 1, NULL),
(3,  'Laplace', NULL, NULL),
(4,  'Fourier', 2, NULL),
(5,  'Poisson', 2, 3),
(6,  'Dirichlet', 4, 5),
(7,  'Lipschitz', 6, NULL),
(8,  'Klein', NULL, 7),
(9,  'Lindemann', 8, NULL),
(10, 'Furtwangler', 8, NULL),
(11, 'Hilbert', 9, NULL),
(12, 'Taussky-Todd', 10, NULL);

这就是它的样子:

SELECT * FROM @T;

+----+--------------+------------+------------+
| ID |     name     | Advisor1ID | Advisor2ID |
+----+--------------+------------+------------+
|  1 | Euler        | NULL       | NULL       |
|  2 | Lagrange     | 1          | NULL       |
|  3 | Laplace      | NULL       | NULL       |
|  4 | Fourier      | 2          | NULL       |
|  5 | Poisson      | 2          | 3          |
|  6 | Dirichlet    | 4          | 5          |
|  7 | Lipschitz    | 6          | NULL       |
|  8 | Klein        | NULL       | 7          |
|  9 | Lindemann    | 8          | NULL       |
| 10 | Furtwangler  | 8          | NULL       |
| 11 | Hilbert      | 9          | NULL       |
| 12 | Taussky-Todd | 10         | NULL       |
+----+--------------+------------+------------+

<强>查询

这是一个经典的递归查询,有两个有趣的点。

1)CTE的递归部分使用Advisor1IDAdvisor2ID连接到锚点部分:

        INNER JOIN CTE 
            ON CTE.ID = T.Advisor1ID
            OR CTE.ID = T.Advisor2ID

2)由于可能有多个到后代的路径,递归查询可能会多次输出该节点。为了消除这些重复项,我在DISTINCT中使用了CTE_Distinct。有可能更有效地解决它。

更好地了解查询的工作原理,分别运行每个CTE并检查中间结果。

WITH
CTE
AS
(
    SELECT
        T.ID AS StartID
        ,T.ID
        ,T.name
        ,T.Advisor1ID
        ,T.Advisor2ID
        ,1 AS Lvl
    FROM @T AS T

    UNION ALL

    SELECT
        CTE.StartID
        ,T.ID
        ,T.name
        ,T.Advisor1ID
        ,T.Advisor2ID
        ,CTE.Lvl + 1 AS Lvl
    FROM
        @T AS T
        INNER JOIN CTE 
            ON CTE.ID = T.Advisor1ID
            OR CTE.ID = T.Advisor2ID
)
,CTE_Distinct
AS
(
    SELECT DISTINCT
        StartID
        ,ID
    FROM CTE
)
SELECT
    CTE_Distinct.StartID
    ,T.name
    ,COUNT(*) AS DescendantCount
FROM
    CTE_Distinct
    INNER JOIN @T AS T ON T.ID = CTE_Distinct.StartID
GROUP BY
    CTE_Distinct.StartID
    ,T.name
ORDER BY CTE_Distinct.StartID;

<强>结果

+---------+--------------+-----------------+
| StartID |     name     | DescendantCount |
+---------+--------------+-----------------+
|       1 | Euler        |              11 |
|       2 | Lagrange     |              10 |
|       3 | Laplace      |               9 |
|       4 | Fourier      |               8 |
|       5 | Poisson      |               8 |
|       6 | Dirichlet    |               7 |
|       7 | Lipschitz    |               6 |
|       8 | Klein        |               5 |
|       9 | Lindemann    |               2 |
|      10 | Furtwangler  |               2 |
|      11 | Hilbert      |               1 |
|      12 | Taussky-Todd |               1 |
+---------+--------------+-----------------+

此处DescendantCount将节点本身计为后代。如果要为叶节点看0而不是1,则可以从此结果中减去1。

这是SQL Fiddle

相关问题