Find rows with overlapping ranges

Time: 2017-02-16 22:56:43

Tags: postgresql

Suppose my data looks like this:

    create table tab(id smallint, nums int4range);
    insert into tab values (1, int4range(1,10)), (2, int4range(1,20)), (3,int4range(3,8)), (4,int4range(15,25)), (5,int4range(3,8));

Then select * from tab gives:

 id |  nums
----+---------
  1 | [1,10)
  2 | [1,20)
  3 | [3,8)
  4 | [15,25)
  5 | [3,8)

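I want a query that finds the ranges formed by the intersections of these ranges, along with the ids that belong to each of those sub-ranges. So the output would look something like this: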

  nums  | ids
--------+------------
[1,3)   | 1, 2
[3,8)   | 1, 2, 3, 5
[8,10)  | 1, 2
[10,15) | 2
[15,20) | 2, 4
[20,25) | 4

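I'm agnostic about the format of the 'ids' column: an array seems logical, but I would be perfectly happy with separate columns for the first, second, third ... nth id in a given range.

I know there will never be more than five ids with overlapping ranges, so a fixed number of columns, with nulls where needed, is completely fine. I also know, in case it matters, that there are no ranges without an id.

Thanks for any help you can offer.

3 Answers:

Answer 0: (score: 1)

Overlapping ranges

If you want the overlapping ranges: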

WITH all_intersections
AS
(
SELECT
    t1.id AS id1, 
    t2.id AS id2, 
    t1.nums * /* intersection */ t2.nums AS nums 
FROM
    tab t1 CROSS JOIN tab t2
WHERE
    t1.id <= t2.id  /* Need only 1/2 + diagonal */
),
unique_nums AS
(
SELECT DISTINCT
    nums
FROM
    all_intersections
WHERE 
    nums <> 'empty' 
)
SELECT 
    nums, 
    array(SELECT DISTINCT id1 AS id 
            FROM all_intersections a1 
           WHERE a1.nums = a0.nums
          UNION
          SELECT DISTINCT id2 AS id 
            FROM all_intersections a2 
           WHERE a2.nums = a0.nums
          ORDER BY id
         ) AS ids
FROM
    unique_nums a0 
ORDER BY
    nums ;

The result is:

|    nums |     ids |
|---------|---------|
|  [1,10) |     1,2 |
|  [1,20) |       2 |
|   [3,8) | 1,2,3,5 |
| [15,20) |     2,4 |
| [15,25) |       4 |

You can check it at http://sqlfiddle.com/#!15/f83d5/5/0
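To check the logic of the query above outside the database, here is a minimal Python sketch of the same idea. It is not part of the original answer: ranges are modelled as half-open (lo, hi) integer tuples, and the names TAB, intersect and overlapping_ranges are made up for this illustration.

```python
from itertools import combinations_with_replacement

# Sample rows from the question: id -> half-open integer range [lo, hi).
# (Hypothetical in-memory stand-in for the table "tab".)
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def intersect(a, b):
    """Return the intersection of two half-open ranges, or None if it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def overlapping_ranges(tab):
    """Map every non-empty pairwise intersection to the sorted ids that produce it."""
    result = {}
    # Each unordered pair once, plus the diagonal: the same pairs that the
    # filter t1.id <= t2.id keeps in the SQL version.
    for id1, id2 in combinations_with_replacement(sorted(tab), 2):
        rng = intersect(tab[id1], tab[id2])
        if rng is not None:
            result.setdefault(rng, set()).update((id1, id2))
    return {rng: sorted(ids) for rng, ids in sorted(result.items())}


if __name__ == "__main__":
    for rng, ids in overlapping_ranges(TAB).items():
        print(rng, ids)
```

For the sample data this yields the same five ranges and id lists as the result table above.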

Non-overlapping ranges

If you want to get non-overlapping ranges (as in your example), you can do it with the following CTEs:

WITH bounds AS         /* all bounds */
(
SELECT DISTINCT
    lower(nums) AS b
FROM
    tab
UNION
SELECT DISTINCT
    upper(nums) AS b
FROM 
    tab
),
range_bounds AS        /* pairs of consecutive bounds */
(
SELECT
    b, lead(b) OVER (ORDER BY b) AS next_b 
FROM
    bounds
),
ranges AS              /* convert the pairs to ranges */
(
SELECT
    int4range(b, next_b) AS nums
FROM
    range_bounds 
WHERE
    next_b is not null  -- ignore last
)
SELECT                 /* take every range and find intersection with originals */
    nums, 
    ARRAY
      (SELECT id 
        FROM tab
       WHERE tab.nums && ranges.nums
      ) AS ids
FROM 
    ranges
ORDER BY
    nums ;             -- without this the row order is not guaranteed

The result of executing it is:

|    nums |     ids |
|---------|---------|
|   [1,3) |     1,2 |
|   [3,8) | 1,2,3,5 |
|  [8,10) |     1,2 |
| [10,15) |       2 |
| [15,20) |     2,4 |
| [20,25) |       4 |

This matches the result in your example.

This assumes:

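  • All ranges are built with an inclusive lower bound [ and an exclusive upper bound ). [In any other case, it will not produce correct results.] PostgreSQL stores int4range values in this canonical form anyway, so the caveat mainly concerns continuous range types such as numrange.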

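The idea is:

  1. Take all the bounds of the ranges (both lower and upper)
  2. Sort them
  3. Build a range from every two consecutive bounds
  4. Look up which original ranges overlap each new range to build ids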
You can check it at http://sqlfiddle.com/#!15/f83d5/10/0
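The four steps above can be sketched in Python to see the logic without the SQL plumbing. This is only an illustration under the same assumption of half-open [lo, hi) integer ranges; TAB and split_ranges are names invented for the sketch, not part of the answer.

```python
# Sample rows from the question: id -> half-open integer range [lo, hi).
# (Hypothetical in-memory stand-in for the table "tab".)
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def split_ranges(tab):
    """Return [((lo, hi), ids), ...] for the ranges between consecutive bounds."""
    # Steps 1 and 2: collect every lower and upper bound, deduplicate, sort.
    bounds = sorted({b for rng in tab.values() for b in rng})
    result = []
    # Step 3: every pair of consecutive bounds becomes one half-open range.
    for lo, hi in zip(bounds, bounds[1:]):
        # Step 4: [lo, hi) overlaps [a, b) exactly when a < hi and lo < b.
        ids = sorted(i for i, (a, b) in tab.items() if a < hi and lo < b)
        result.append(((lo, hi), ids))
    return result


if __name__ == "__main__":
    for rng, ids in split_ranges(TAB):
        print(rng, ids)
```

In this sketch, a gap between the input ranges would appear as a range with an empty id list; the question states that such gaps do not occur in the data.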

Note: if you want to avoid CTEs, the query can be compressed further by pure substitution:

    SELECT 
        nums, ARRAY
              (SELECT id 
                 FROM tab
                WHERE tab.nums && ranges.nums
               ) AS ids
    FROM 
        (SELECT
            int4range(b, next_b) AS nums
        FROM
            (SELECT
                b, lead(b) OVER (ORDER BY b) AS next_b 
            FROM
                (SELECT DISTINCT lower(nums) AS b FROM tab
                 UNION
                 SELECT DISTINCT upper(nums) AS b FROM tab
                ) AS bounds
            ) AS range_bounds 
        WHERE
            next_b is not null
        ) AS ranges 
    ORDER BY
      nums ;
    

You can check it at http://sqlfiddle.com/#!15/f83d5/15/0

Answer 1: (score: 1)

SELECT uniquenums.nums, array_agg(id) ids
FROM (
        SELECT numsgroup, int4range(min(boundary), max(boundary)) nums
        FROM (
                SELECT boundary, row_number() OVER (ORDER BY boundary, seriesvalue) / 2 AS numsgroup
                FROM (
                        SELECT DISTINCT upper(nums) AS boundary FROM tab
                        UNION
                        SELECT DISTINCT lower(nums) AS boundary FROM tab
                ) AS A
                JOIN (
                        SELECT generate_series(1, 2) AS seriesvalue
                ) AS B ON true
        ) AS A
        GROUP BY numsgroup
        HAVING COUNT(*) > 1
) AS uniquenums
JOIN tab ON tab.nums && uniquenums.nums
GROUP BY uniquenums.nums
ORDER BY uniquenums.nums

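How does it work?

  1. Extract all distinct boundaries, both lower and upper
  2. Duplicate each boundary by joining with a helper table expression that has two rows
  3. Assign a group number to every resulting row, so that two consecutive boundaries get the same group number
  4. Group by these numbers and build new ranges from the consecutive boundaries
  5. Look up the ranges in tab that overlap the ranges just computed
  6. Aggregate the ids of the ranges found into an array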
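The duplicate-and-number trick is easier to follow with the intermediate rows written out. The following Python sketch mimics it under the assumption of half-open [lo, hi) integer ranges; TAB and split_by_duplication are hypothetical names for this illustration only.

```python
from itertools import groupby

# Sample rows from the question: id -> half-open integer range [lo, hi).
# (Hypothetical in-memory stand-in for the table "tab".)
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def split_by_duplication(tab):
    """Rebuild the sub-ranges by duplicating bounds and grouping on row_number / 2."""
    # Distinct bounds, sorted.
    bounds = sorted({b for rng in tab.values() for b in rng})
    # Duplicate every bound, like the join with the two-row helper.
    doubled = [b for b in bounds for _ in range(2)]
    # 1-based row number, integer-divided by 2: the second copy of one bound
    # and the first copy of the next bound land in the same group.
    numbered = [(row // 2, b) for row, b in enumerate(doubled, start=1)]
    result = []
    for _, rows in groupby(numbered, key=lambda r: r[0]):
        group = [b for _, b in rows]
        # The first and last groups hold a single row and are dropped,
        # which is what HAVING COUNT(*) > 1 does in the query.
        if len(group) > 1:
            lo, hi = min(group), max(group)
            ids = sorted(i for i, (a, b) in tab.items() if a < hi and lo < b)
            result.append(((lo, hi), ids))
    return result


if __name__ == "__main__":
    for rng, ids in split_by_duplication(TAB):
        print(rng, ids)
```

For the sample data the doubled bounds are 1, 1, 3, 3, 8, 8, ... and the groups pair up as (1, 3), (3, 8), (8, 10) and so on, giving the six ranges from the question.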

Answer 2: (score: 1)

select rng as nums, array_agg(id) as ids
from (  
    select int4range(n, lead(n) over (order by n)) as rng
    from (  
        select distinct lower(nums) n
        from tab
        union
        select distinct upper(nums) n
        from tab
        ) s
    ) s
join tab on rng && nums
group by 1
order by 1;

  nums   |    ids    
---------+-----------
 [1,3)   | {1,2}
 [3,8)   | {1,2,3,5}
 [8,10)  | {1,2}
 [10,15) | {2}
 [15,20) | {2,4}
 [20,25) | {4}
(6 rows)