Question

I'm stuck with an MSSQL database and have a SQL query like this:

SELECT id, type, start, stop, one, two, three, four
FROM a
UNION ALL
SELECT id, type, start, stop, one, two, three, four
FROM b
UNION ALL
SELECT id, type, start, stop, one, two, three, four
FROM c
ORDER BY type ASC

which results in:

row |  id  type  start       stop         one   two    three   four
----+--------------------------------------------------------------
 1  |  1   a     2010-01-01  2010-01-31   100   1000   1000    100
 2  |  1   a     2010-02-01  2010-12-31   100   500    500     50
 3  |  1   b     2010-01-01  2010-01-31   100   NULL   NULL    100
 4  |  1   b     2010-01-01  2010-12-31   100   NULL   NULL    100
 5  |  1   c     2010-01-01  2010-01-31   0     NULL   NULL    100
 6  |  1   c     2010-01-01  2010-12-31   0     NULL   NULL    100

However, I would prefer to get the following result:

row |  id  type  start       stop         one   two    three   four
----+--------------------------------------------------------------
 1  |  1   a     2010-01-01  2010-01-31   100   1000   1000    100
 2  |  1   a     2010-02-01  2010-12-31   100   500    500     50
 4  |  1   b     2010-01-01  2010-12-31   100   NULL   NULL    100
 6  |  1   c     2010-01-01  2010-12-31   0     NULL   NULL    100

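That is, rows 3 and 5 are eliminated because they duplicate rows 4 and 6 in every column except stop. Among such duplicates, the unfortunate row with the lowest value in the stop column is removed.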

How can I accomplish this? I've been thinking of something like:

SELECT * FROM (
    SELECT id, type, start, stop, one, two, three, four
    FROM a
    UNION ALL
    SELECT id, type, start, stop, one, two, three, four
    FROM b
    UNION ALL
    SELECT id, type, start, stop, one, two, three, four
    FROM c
    ORDER BY type ASC
) AS types
GROUP BY ... HAVING ???

I need guidance, please help.

(No, I can't change any of the conditions; I have to work with the situation as given.)

Answer 1

This should work:

SELECT
     T1.id,      -- columns are qualified with T1 because both T1 and T2 expose them
     T1.type,
     T1.start,
     T1.stop,
     T1.one,
     T1.two,
     T1.three,
     T1.four
FROM
     A T1
LEFT OUTER JOIN A T2 ON
     T2.id = T1.id AND
     T2.type = T1.type AND
     T2.start = T1.start AND
     T2.one = T1.one AND
     ...
     T2.stop > T1.stop
WHERE
     T2.id IS NULL     -- This must be a NOT NULL column for this to work

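This assumes that the value in the type column matches the table name. If duplicate rows can occur across tables, you'll need to apply the same logic to the subquery you already have instead of to A. If my assumption is correct, just replace each of the three queries in your UNION ALL with the one above, changing the table name.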

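The idea is that if a matching row exists with a later stop date, you don't want to include the current row in the results. With the LEFT OUTER JOIN, the only way T2.id can be NULL is if no such match exists, and in that case we can include the row in the result set (which is why id must be a NOT NULL column for this to work).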
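The anti-join idea above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MSSQL (an assumption made only so the example is runnable) and the sample rows from the question. The elided join conditions are not filled in: the sketch joins on id, type, start, one and four, and deliberately leaves out two and three, because those columns contain NULLs and NULL = NULL is never true in a join condition.

```python
import sqlite3

# Sample rows copied from the question. Dates are ISO-8601 text, so string
# comparison orders them chronologically in SQLite.
ROWS = {
    "a": [(1, "a", "2010-01-01", "2010-01-31", 100, 1000, 1000, 100),
          (1, "a", "2010-02-01", "2010-12-31", 100, 500, 500, 50)],
    "b": [(1, "b", "2010-01-01", "2010-01-31", 100, None, None, 100),
          (1, "b", "2010-01-01", "2010-12-31", 100, None, None, 100)],
    "c": [(1, "c", "2010-01-01", "2010-01-31", 0, None, None, 100),
          (1, "c", "2010-01-01", "2010-12-31", 0, None, None, 100)],
}

# Anti-join for one table. The choice of join columns is an assumption:
# nullable columns (two, three) are left out because "=" never matches NULLs.
ANTI_JOIN = """
SELECT T1.id, T1.type, T1.start, T1.stop, T1.one, T1.two, T1.three, T1.four
FROM {table} AS T1
LEFT OUTER JOIN {table} AS T2
  ON  T2.id = T1.id
  AND T2.type = T1.type
  AND T2.start = T1.start
  AND T2.one = T1.one
  AND T2.four = T1.four
  AND T2.stop > T1.stop
WHERE T2.id IS NULL
"""

conn = sqlite3.connect(":memory:")
for table, rows in ROWS.items():
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER NOT NULL, type TEXT, start TEXT, "
        "stop TEXT, one INTEGER, two INTEGER, three INTEGER, four INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# One anti-join per source table, glued together with UNION ALL.
query = " UNION ALL ".join(ANTI_JOIN.format(table=t) for t in ROWS)
query += " ORDER BY 2, 3"  # sort by type, then start
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

Only a row that has no partner with a later stop value survives the WHERE clause, so each of tables b and c contributes just its 2010-12-31 row, while both rows of table a are kept because their start values differ.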

Since you said you can't change the database, I'll spare you the "this is a bad design" lecture ;)

Answer 2

Similar questions have already been asked and answered. For example: Select uniques, and one of the doubles

Your case is even simpler (if I understand your problem description correctly):

select id, type, start, max(stop) as stop, one, two, three, four
    from (...) types
    group by id, type, start, one, two, three, four
    order by ...

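In place of (...), put your select from a, b, and c. Just omit the ORDER BY clause from that inner query, since SQL Server does not allow ORDER BY in a derived table unless TOP is also specified.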
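As a rough check of the GROUP BY approach, here is a minimal sketch that runs it against the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for MSSQL, which is an assumption made purely so the example is runnable; the SQL itself follows the query above with the UNION ALL filled in for (...).

```python
import sqlite3

COLUMNS = "id, type, start, stop, one, two, three, four"

conn = sqlite3.connect(":memory:")
for table in "abc":
    conn.execute(f"CREATE TABLE {table} ({COLUMNS})")

# Sample rows copied from the question (ISO-8601 text dates sort correctly).
insert = "INSERT INTO {} VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
conn.executemany(insert.format("a"), [
    (1, "a", "2010-01-01", "2010-01-31", 100, 1000, 1000, 100),
    (1, "a", "2010-02-01", "2010-12-31", 100, 500, 500, 50),
])
conn.executemany(insert.format("b"), [
    (1, "b", "2010-01-01", "2010-01-31", 100, None, None, 100),
    (1, "b", "2010-01-01", "2010-12-31", 100, None, None, 100),
])
conn.executemany(insert.format("c"), [
    (1, "c", "2010-01-01", "2010-01-31", 0, None, None, 100),
    (1, "c", "2010-01-01", "2010-12-31", 0, None, None, 100),
])

# Group on every column except stop and keep the latest stop per group.
# GROUP BY puts NULLs in the same group, so the NULLs in two/three are fine.
deduped = conn.execute(f"""
SELECT id, type, start, MAX(stop) AS stop, one, two, three, four
FROM (
    SELECT {COLUMNS} FROM a
    UNION ALL
    SELECT {COLUMNS} FROM b
    UNION ALL
    SELECT {COLUMNS} FROM c
) AS types
GROUP BY id, type, start, one, two, three, four
ORDER BY type, start
""").fetchall()

for row in deduped:
    print(row)
```

Unlike an equality join, GROUP BY treats NULL values as belonging to the same group, which is why rows 3/4 and 5/6 collapse here even though two and three are NULL.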

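Alternatively, if instead of (id, type, start) -> (one, two, three, four) you have (id, type, start, stop) -> (one, two, three, four), meaning the other columns can differ between rows that share id, type, and start, so you have to select the values that correspond to max(stop), this query usually produces reasonable execution plans: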

select id, type, start, stop, one, two, three, four
    from (...) types
    where stop = (select max(stop)
                  from (...) t2
                  where t2.id = types.id
                        and t2.type = types.type
                        and t2.start = types.start)

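But it depends on how the data is distributed across the source tables and which indexes exist. In some cases, the solutions from the question linked above may still perform better.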
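The difference from the GROUP BY version shows up only when the other columns vary with stop. The following minimal sketch illustrates that case under a few assumptions: it runs on Python's built-in sqlite3 module rather than MSSQL, a single table named types stands in for the (...) union, and the value 75 in column four is hypothetical data invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "types" stands in for the UNION ALL of a, b and c (an assumption that keeps
# the sketch short).
conn.execute("CREATE TABLE types (id, type, start, stop, one, two, three, four)")
conn.executemany(
    "INSERT INTO types VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "b", "2010-01-01", "2010-01-31", 100, None, None, 100),
        # Hypothetical: four differs from the row above, so grouping on all
        # non-stop columns would keep both rows.
        (1, "b", "2010-01-01", "2010-12-31", 100, None, None, 75),
        (1, "c", "2010-01-01", "2010-12-31", 0, None, None, 100),
    ],
)

# Keep only the row whose stop equals the latest stop for its (id, type, start).
latest = conn.execute("""
SELECT id, type, start, stop, one, two, three, four
FROM types
WHERE stop = (SELECT MAX(stop)
              FROM types AS t2
              WHERE t2.id = types.id
                AND t2.type = types.type
                AND t2.start = types.start)
ORDER BY type
""").fetchall()

for row in latest:
    print(row)
```

Here the surviving type b row carries the four value that belongs to the latest stop, which is the behavior the correlated subquery is meant to provide.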
