Get details of linked rows

Date: 2017-03-14 16:21:37

Tags: sql-server tsql

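I'm trying to get a "bloodline" or similar, along with information about the first and last links (at least; everything would be fine), from a table with self-referential links between rows that have been "replaced" and the rows that replaced them. The table has this structure:

CREATE TABLE Thing (
    Id         INT PRIMARY KEY,
    TStamp     DATETIME,
    Replaces   INT NULL,
    ReplacedBy INT NULL
);

I'm stuck with that structure. :-) It's somewhat doubly linked (and yes, that's a bit silly): each row has a unique Id, and a row that has been "replaced" by another row has a non-NULL ReplacedBy giving the Id of the replacement row. The replacement row also has a link back to the row it replaced, in Replaces. So we can work with Replaces or ReplacedBy (or both) if we like.

Here's some sample data:

INSERT INTO Thing
(Id, TStamp,       Replaces, ReplacedBy)
VALUES
(1,  '2017-01-01', NULL,       11),
(2,  '2017-01-02', NULL,       12),
(3,  '2017-01-03', NULL,     NULL),
(4,  '2017-01-04', NULL,     NULL),
(11, '2017-01-11',    1,     NULL),
(12, '2017-01-12',    2,       22),
(22, '2017-01-22',   12,     NULL);

So 1 was replaced by 11, and 2 was replaced by 12, which was in turn replaced by 22.

I'd like to get the following information from this table, in a reasonable way, for each chain of links:

  • Details of the row that started the chain
  • Details of the last row in the chain
  • Details of the links in the middle of the chain, or at least how many links there are in the chain (in total)

...filtered by a date range applied to the last row in the chain.

In an ideal universe, I'd get something like this:

+---------+--------+----+-------+------------+
| FirstId | LastId | Id | Links |   TStamp   |
+---------+--------+----+-------+------------+
|       1 |     11 |  1 |     2 | 2017-01-01 |
|       1 |     11 | 11 |     2 | 2017-01-11 |
|       2 |     22 |  2 |     3 | 2017-01-02 |
|       2 |     22 | 12 |     3 | 2017-01-12 |
|       2 |     22 | 22 |     3 | 2017-01-22 |
+---------+--------+----+-------+------------+

So far, I have this query, which I could post-process to get the above:

WITH Data AS (
    SELECT  Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing
    UNION ALL
    SELECT  Thing.Id, Thing.TStamp, Thing.Replaces, Thing.ReplacedBy, Depth + 1
    FROM    Data
    JOIN    Thing
    ON      Thing.Replaces = Data.Id
)
SELECT  *
FROM    Data
WHERE   ReplacedBy IS NOT NULL OR Depth > 0
ORDER BY
        Id, Depth;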

That gives me:

+----+------------+----------+------------+-------+
| Id | TStamp     | Replaces | ReplacedBy | Depth |
+----+------------+----------+------------+-------+
|  1 | 2017-01-01 |     NULL |         11 |     0 |
|  2 | 2017-01-02 |     NULL |         12 |     0 |
| 11 | 2017-01-11 |        1 |       NULL |     1 |
| 12 | 2017-01-12 |        2 |         22 |     0 |
| 12 | 2017-01-12 |        2 |         22 |     1 |
| 22 | 2017-01-22 |       12 |       NULL |     1 |
| 22 | 2017-01-22 |       12 |       NULL |     2 |
+----+------------+----------+------------+-------+

I can work out (for instance) the last row of each chain using something like this:

WITH Data AS (
    SELECT  Id, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing
    UNION ALL
    SELECT  Thing.Id, Thing.Replaces, Thing.ReplacedBy, Depth + 1
    FROM    Data
    JOIN    Thing
    ON      Thing.Replaces = Data.Id
),
MaxData AS (
    SELECT  Data.Id, Data.Depth
    FROM    Data
    JOIN    (
        SELECT  Id, MAX(Depth) AS MaxDepth
        FROM    Data
        GROUP BY Id
    ) j ON data.Id = j.Id AND Data.Depth = j.MaxDepth
    WHERE   Depth > 0
)
SELECT  *
FROM    MaxData
ORDER BY
        Id;

...which gives me:

+----+-------+
| Id | Depth |
+----+-------+
| 11 |     1 |
| 12 |     1 |
| 22 |     2 |
+----+-------+

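...but then I've lost the starting point and the intermediate points.

I have a strong feeling I'm missing something fairly straightforward, but clever, that would let me get most of this from the query itself rather than from post-processing: some kind of join between "min" and "max" queries (but not like the one I have above). What would it be?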

The table doesn't have any indexes on Replaces or ReplacedBy, but we can add any that are needed. The table is only lightly used (roughly 300k rows, with perhaps only a few hundred updates/inserts a day).
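As a sketch of the indexes that might be added: the recursive step of a chain-walking query joins on Replaces (or on ReplacedBy when walking the other way), so something along these lines could help. The index names are invented for illustration, and whether the indexes pay off on a table of this size has not been measured.

```sql
-- Hypothetical index names; neither index is part of the original schema.

-- Supports a recursive join that follows Replaces (walking a chain forward).
CREATE NONCLUSTERED INDEX IX_Thing_Replaces
    ON Thing (Replaces)
    INCLUDE (TStamp, ReplacedBy);

-- Supports a recursive join that follows ReplacedBy (walking a chain backward).
CREATE NONCLUSTERED INDEX IX_Thing_ReplacedBy
    ON Thing (ReplacedBy)
    INCLUDE (TStamp, Replaces);
```

The INCLUDE columns are there so the recursive step can read TStamp and the other link from the index itself; Id comes along automatically as the clustered primary key.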

I'm limited to SQL Server 2008 features.

2 answers:

Answer 0 (score: 3)

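Inspired by Gordon Linoff's answer and by HABO's comment highlighting the important thing Gordon was doing, I:

  • Removed the SQL Server 2012+ FIRST_VALUE function, replacing it with a CROSS APPLY on an "overview" query of the data
  • Included the Links count in the overview query
  • Removed the reliance on t in Gordon's WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id), where (at least on SQL Server 2008) t isn't bound to anything
  • Filtered out rows that were never replaced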

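Below, I've also added the date filtering mentioned in the question:

  ...filtered by a date range applied to the last row in the chain.

...which Gordon didn't cover at all, and which changes our approach, but only in terms of the direction of time's arrow.

So, first, without the date criterion, sticking very close to Gordon's answer: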

WITH Data AS (
    SELECT  Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing
    WHERE   Replaces IS NULL AND ReplacedBy IS NOT NULL
    UNION ALL
    SELECT  d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
    FROM    Data d
    JOIN    Thing t ON t.Replaces = d.Id
),
Overview AS (
    SELECT  FirstId, MAX(Id) AS LastId, COUNT(*) AS Links
    FROM    Data
    GROUP BY
            FirstId
)
SELECT  d.FirstId, o.LastId, d.Id, o.Links, d.Depth, d.TStamp
FROM    Data d
CROSS APPLY (
    SELECT  LastId, Links
    FROM    Overview
    WHERE   FirstId = d.FirstId
) o
ORDER BY
        d.FirstId, d.Depth
;

The key part is grabbing the seed Id as FirstId:

SELECT  Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
FROM    Thing
WHERE   Replaces IS NULL AND ReplacedBy IS NOT NULL

...and then propagating it through the results of the recursive join:

SELECT  d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
FROM    Data d
JOIN    Thing t ON t.Replaces = d.Id

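Just adding that to my original query gets me most of what I wanted. Then we add a second query to get the LastId for each FirstId (Gordon did that as a FIRST_VALUE over a partition, but I can't do that in SQL Server 2008); using an overview query also lets me grab the number of links. We cross apply that on the basis of the FirstId value to get the overall result I wanted.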

The query above returns the following for the sample data:

+---------+--------+----+-------+-------+------------+
| FirstId | LastId | Id | Links | Depth | TStamp     |
+---------+--------+----+-------+-------+------------+
|       1 |     11 |  1 |     2 |     0 | 2017-01-01 |
|       1 |     11 | 11 |     2 |     1 | 2017-01-11 |
|       2 |     22 |  2 |     3 |     0 | 2017-01-02 |
|       2 |     22 | 12 |     3 |     1 | 2017-01-12 |
|       2 |     22 | 22 |     3 |     2 | 2017-01-22 |
+---------+--------+----+-------+-------+------------+

...which is exactly what I wanted, plus Depth if I want it (so I know the order of the intermediate links).
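One caveat on the Overview query above: MAX(Id) only finds the last link if replacement rows always get higher Ids than the rows they replace, which holds for the sample data but is an assumption about the real table. The following sketch avoids that assumption by ranking on Depth instead, using only ROW_NUMBER and a partitioned COUNT (both available on SQL Server 2008). The Ranked CTE and the rn column are names invented here.

```sql
WITH Data AS (
    -- Seed: rows that start a chain and have been replaced at least once.
    SELECT  Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing
    WHERE   Replaces IS NULL AND ReplacedBy IS NOT NULL
    UNION ALL
    -- Walk forward, carrying the seed Id along as FirstId.
    SELECT  d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
    FROM    Data d
    JOIN    Thing t ON t.Replaces = d.Id
),
Ranked AS (
    -- rn = 1 marks the deepest row of each chain, i.e. its last link.
    SELECT  FirstId, Id,
            ROW_NUMBER() OVER (PARTITION BY FirstId ORDER BY Depth DESC) AS rn,
            COUNT(*)     OVER (PARTITION BY FirstId) AS Links
    FROM    Data
)
SELECT  d.FirstId, r.Id AS LastId, d.Id, r.Links, d.Depth, d.TStamp
FROM    Data d
JOIN    Ranked r ON r.FirstId = d.FirstId AND r.rn = 1
ORDER BY
        d.FirstId, d.Depth;
```

On the sample data this returns the same five rows as the table above.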

If we wanted to include rows that were never replaced, we'd just change

WHERE   Replaces IS NULL AND ReplacedBy IS NOT NULL

to

WHERE   Replaces IS NULL

giving us:

+---------+--------+----+-------+-------+------------+
| FirstId | LastId | Id | Links | Depth | TStamp     |
+---------+--------+----+-------+-------+------------+
|       1 |     11 |  1 |     2 |     0 | 2017-01-01 |
|       1 |     11 | 11 |     2 |     1 | 2017-01-11 |
|       2 |     22 |  2 |     3 |     0 | 2017-01-02 |
|       2 |     22 | 12 |     3 |     1 | 2017-01-12 |
|       2 |     22 | 22 |     3 |     2 | 2017-01-22 |
|       3 |      3 |  3 |     1 |     0 | 2017-01-03 |
|       4 |      4 |  4 |     1 |     0 | 2017-01-04 |
+---------+--------+----+-------+-------+------------+

But we've ignored the date criterion that the question asks for:

  ...filtered by a date range applied to the last row in the chain.

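To do that without building a massive temporary result set, we have to work backward: instead of selecting the starting points (the first entries in chains, Replaces IS NULL), we need to select the ending points (the last entries in chains, ReplacedBy IS NULL) and then work back through the chain, reversing our logic. It's mostly a matter of:

  • Swapping FirstId and LastId
  • Swapping Replaces and ReplacedBy (handy that the table has both!)
  • Using MIN to get the first ID in the chain, rather than MAX to get the last
  • Using d.Depth - 1 rather than d.Depth + 1
  • Then, once we know Links in the final select, fixing up Depth based on it, to get those nice values where 0 = first link rather than some varying negative number: o.Links + d.Depth - 1 AS Depth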
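To make the Depth fix-up concrete, here is a small worked sketch for the 2, 12, 22 chain from the sample data. Walking backward from 22 produces raw depths of 0, -1 and -2, and the chain has 3 links. The inline VALUES table and the RawDepth column name are made up for this illustration.

```sql
-- Raw depths as produced by the backward walk of the 2 -> 12 -> 22 chain.
SELECT  v.Id,
        v.RawDepth,
        3 + v.RawDepth - 1 AS Depth   -- Links + raw depth - 1, with Links = 3
FROM    (VALUES (22, 0), (12, -1), (2, -2)) AS v (Id, RawDepth)
ORDER BY
        Depth;
-- Id 2  -> Depth 0 (first link)
-- Id 12 -> Depth 1
-- Id 22 -> Depth 2 (last link)
```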

All of which gives us:

WITH Data AS (
    SELECT  Id AS LastId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing
    WHERE   ReplacedBy IS NULL AND Replaces IS NOT NULL
    -- Filtering by date of last entry would go here
    UNION ALL
    SELECT  d.LastId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth - 1
    FROM    Data d
    JOIN    Thing t ON t.ReplacedBy = d.Id
),
Overview AS (
    SELECT  LastId, MIN(Id) AS FirstId, COUNT(*) AS Links
    FROM    Data
    GROUP BY
            LastId
)
SELECT  o.FirstId, d.LastId, d.Id, o.Links, o.Links + d.Depth - 1 AS Depth, d.TStamp
FROM    Data d
CROSS APPLY (
    SELECT  FirstId, Links
    FROM    Overview
    WHERE   LastId = d.LastId
) o
ORDER BY
        o.FirstId, d.Depth
;

For example, if we use

AND     TStamp BETWEEN '2017-01-12' AND '2017-02-01'

where I have

-- Filtering by date of last entry would go here

above, we get this result for the sample data:

+---------+--------+----+-------+-------+------------+
| FirstId | LastId | Id | Links | Depth |   TStamp   |
+---------+--------+----+-------+-------+------------+
|       2 |     22 |  2 |     3 |     0 | 2017-01-02 |
|       2 |     22 | 12 |     3 |     1 | 2017-01-12 |
|       2 |     22 | 22 |     3 |     2 | 2017-01-22 |
+---------+--------+----+-------+-------+------------+

...because the last link of the chain that starts at 1 (row 11, dated 2017-01-11) falls outside the date range, so that chain isn't included.
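Putting the date filter in place, the whole statement might look like the following sketch. The @RangeStart and @RangeEnd variables are names invented here. The half-open range (>= start, < end) is swapped in for BETWEEN so that a DATETIME value exactly at the end boundary is excluded; use BETWEEN instead if an inclusive end is what's wanted.

```sql
-- Hypothetical parameters; 'YYYYMMDD' literals avoid date-format ambiguity.
DECLARE @RangeStart DATETIME = '20170112';
DECLARE @RangeEnd   DATETIME = '20170201';

WITH Data AS (
    -- Seed: last rows of chains, filtered by their own timestamp.
    SELECT  Id AS LastId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing
    WHERE   ReplacedBy IS NULL AND Replaces IS NOT NULL
    AND     TStamp >= @RangeStart AND TStamp < @RangeEnd
    UNION ALL
    -- Walk backward toward the first row, counting depth downward.
    SELECT  d.LastId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth - 1
    FROM    Data d
    JOIN    Thing t ON t.ReplacedBy = d.Id
),
Overview AS (
    SELECT  LastId, MIN(Id) AS FirstId, COUNT(*) AS Links
    FROM    Data
    GROUP BY
            LastId
)
SELECT  o.FirstId, d.LastId, d.Id, o.Links, o.Links + d.Depth - 1 AS Depth, d.TStamp
FROM    Data d
CROSS APPLY (
    SELECT  FirstId, Links
    FROM    Overview
    WHERE   LastId = d.LastId
) o
ORDER BY
        o.FirstId, d.Depth;
```

Because the filter sits in the seed query, chains whose last row is out of range are never walked at all, which is the point of working backward.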

Answer 1 (score: 2)

This is a bit tricky. Arrange the CTE to start at the beginning of each list. That makes the subsequent processing easier:

WITH Data AS (
      SELECT Id as FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
      FROM Thing t
      WHERE NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.id)
      UNION ALL
      SELECT  d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
      FROM Data d JOIN
           Thing t
           ON t.Replaces = d.Id
     )
SELECT d.*,
       FIRST_VALUE(id) OVER (PARTITION BY FirstId ORDER BY Depth DESC) as LastId
FROM Data d;

Then you can use FIRST_VALUE() with a reverse sort to get the last value in the chain.

This will return chains with no links. You can add a filter to remove them.
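A sketch of such a filter, assuming "chains with no links" means rows that were never replaced: requiring a non-NULL ReplacedBy on the seed row drops the single-row chains. Note that FIRST_VALUE needs SQL Server 2012 or later.

```sql
WITH Data AS (
    SELECT  Id AS FirstId, Id, TStamp, Replaces, ReplacedBy, 0 AS Depth
    FROM    Thing t
    WHERE   NOT EXISTS (SELECT 1 FROM Thing t2 WHERE t2.ReplacedBy = t.Id)
    AND     t.ReplacedBy IS NOT NULL   -- added filter: skip rows never replaced
    UNION ALL
    SELECT  d.FirstId, t.Id, t.TStamp, t.Replaces, t.ReplacedBy, d.Depth + 1
    FROM    Data d
    JOIN    Thing t ON t.Replaces = d.Id
)
SELECT  d.*,
        FIRST_VALUE(Id) OVER (PARTITION BY FirstId ORDER BY Depth DESC) AS LastId
FROM    Data d;
```

With the sample data, only the chains starting at 1 and 2 remain; rows 3 and 4 no longer appear.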