More efficient SQL than using "A UNION (B in A)"?

Time: 2010-01-22 21:27:04

Tags: sql union where

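Edit 1 (clarification): Thanks for the answers so far! The response has been gratifying. I'd like to clarify the question a bit, because based on the answers I think I did not describe one aspect of the problem correctly (and I'm sure that's my fault, since I had a hard time defining it even for myself).
Here's the catch: the result set should contain ONLY the records with tstamp BETWEEN '2010-01-03' AND '2010-01-09', AND the ONE record where tstamp IS NULL for each order_num in that first set (there will always be exactly one record with a null tstamp for each order_num).
The answers given so far appear to include ALL records for a given order_num if ANY of them has tstamp BETWEEN '2010-01-03' AND '2010-01-09'. For example, if there were another record with order_num = 2 and tstamp = 2010-01-12 00:00:00, it would be included in the result, which is not what I want.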

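Original question:
Consider an orders table containing id (unique), order_num, tstamp (a timestamp), and item_id (the single item included in an order). tstamp is null unless the order has been modified, in which case there is another record with the same order_num whose tstamp holds the time at which the change occurred.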

Example...

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 0          1                           100
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 3          3                           113
 4          4                           124
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
 8          6                           100
 9          6  2010-01-13 08:33:55      105

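What is the most efficient SQL statement to retrieve all orders (based on order_num) that have been modified one or more times during a certain date range? In other words, for each order we need all of the records with the same order_num (including the one with a NULL tstamp), for each order_num WHERE at least one record of that order_num has tstamp NOT NULL AND tstamp BETWEEN '2010-01-03' AND '2010-01-09'. It's the "WHERE at least one record of that order_num has tstamp NOT NULL" part that I'm having difficulty with.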

The result set should look like this:

id  order_num  tstamp               item_id
__  _________  ___________________  _______
 1          2                           101
 2          2  2010-01-05 12:34:56      102
 5          5                           135
 6          5  2010-01-07 01:23:45      136
 7          5  2010-01-07 02:46:00      137
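
For anyone who wants to experiment, the sample table can be reproduced in an in-memory SQLite database. This is only a sketch under my own assumptions: SQLite and Python's `sqlite3` module stand in for whatever database the question actually uses, timestamps are stored as text, and `make_orders` is a hypothetical helper name. The query is the plain "all rows of any order modified in the range" reading of the question.

```python
import sqlite3

# Sample rows from the question: (id, order_num, tstamp, item_id).
ROWS = [
    (0, 1, None, 100),
    (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113),
    (4, 4, None, 124),
    (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
]


def make_orders():
    """Hypothetical helper: build the sample orders table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_num INTEGER,"
        " tstamp TEXT, item_id INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Every row of each order that has at least one change inside the range.
QUERY = """
SELECT id, order_num, tstamp, item_id
FROM orders
WHERE order_num IN (SELECT order_num FROM orders
                    WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09')
ORDER BY id
"""

result = make_orders().execute(QUERY).fetchall()
for row in result:
    print(row)
```

On this sample data the query returns the five rows listed above (ids 1, 2, 5, 6, 7). Because the timestamps are text here, BETWEEN is a string comparison, which happens to order ISO-formatted timestamps correctly.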

The SQL I came up with is this, which is essentially "A UNION (B in A)", but it executes slowly and I'm hoping there is a more efficient solution:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
    AS history_orders
UNION
SELECT current_orders.order_id, current_orders.tstamp, current_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp IS NULL)
    AS current_orders
WHERE current_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

6 answers:

Answer 0: (score: 3)

Perhaps a subquery:

select * from orders o where o.order_num in (select distinct
  order_num from orders where tstamp between '2010-01-03' and '2010-01-09')

Answer 1: (score: 1)

Unless I've misunderstood, something like this should do the trick:

SELECT o1.id, o1.order_num, o1.tstamp, o1.item_id
FROM  orders o1
WHERE EXISTS(
    SELECT * FROM orders o2 
    WHERE o1.order_num = o2.order_num 
        AND o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09')

The advantage of using EXISTS is that it stops as soon as it finds the first match.
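
To see how an EXISTS query of this shape behaves with respect to Edit 1 of the question, here is a small sketch that runs it in an in-memory SQLite database. SQLite, Python's `sqlite3` module, and text-typed timestamps are my assumptions. The rows are the question's sample data plus the extra record Edit 1 describes (order_num = 2, tstamp = 2010-01-12 00:00:00); its id of 10 and item_id of 103 are made up for illustration.

```python
import sqlite3

# Sample rows from the question, plus the extra out-of-range record that
# Edit 1 describes (its id 10 and item_id 103 are made up for illustration).
ROWS = [
    (0, 1, None, 100),
    (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113),
    (4, 4, None, 124),
    (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
    (10, 2, "2010-01-12 00:00:00", 103),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_num INTEGER,"
    " tstamp TEXT, item_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)

# Keep every row whose order has at least one change inside the range.
EXISTS_QUERY = """
SELECT o1.id, o1.order_num, o1.tstamp, o1.item_id
FROM orders o1
WHERE EXISTS (
    SELECT 1 FROM orders o2
    WHERE o1.order_num = o2.order_num
      AND o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09')
ORDER BY o1.id
"""

exists_ids = [row[0] for row in conn.execute(EXISTS_QUERY)]
print(exists_ids)
```

On this data the list includes id 10: the EXISTS test only checks that the order has some change in the range, so every row of that order comes back. Matching Edit 1 would need an additional filter on the outer row's own tstamp.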

Answer 2: (score: 1)

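I know it's quite late to reply, but I just saw this thread and thought I should give it a try. How about this query? It is really small compared to all of the solutions above, and it serves the purpose.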

select * from orders_gc where order_num in 
    (select order_num
     from orders_gc 
     group by order_num 
     having count(id) > 1 and 
     MAX(tstamp) between '03-jan-2010' and '09-jan-2010')

Answer 3: (score: 0)

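Hopefully I understood your question correctly. This should return all rows of every order that was changed within the given timestamp range.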

SELECT o.order_id, o.tstamp, o.item_id
FROM orders o
JOIN ( SELECT DISTINCT o2.order_num
       FROM orders o2
       WHERE o2.tstamp BETWEEN '2010-01-03' AND '2010-01-09' ) o3
ON ( o3.order_num = o.order_num )

Answer 4: (score: 0)

You can join the table to itself. Simplified, this would look like:

select all_orders.order_id  -- qualified: order_id exists on both sides of the self-join
from orders all_orders
inner join orders not_null_orders
    on all_orders.order_id = not_null_orders.order_id
where
    not_null_orders.tstamp is not null
    and all_orders.tstamp between '2010-01-03' AND '2010-01-09'

Answer 5: (score: 0)

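Thanks again for all of the suggestions. I found three solutions that work, including my original. At the end I've added some performance results, which are not as good as I had hoped. If anyone can improve on this, I'd be thrilled!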

1) The best solution found so far seems to be:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE history_orders.order_id IN
   (SELECT orders.order_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

2) I also tried using EXISTS in place of IN, which requires an additional WHERE condition in the last SELECT:

SELECT history_orders.order_id, history_orders.tstamp, history_orders.item_id
FROM
   (SELECT orders.order_id, orders.tstamp, orders.item_id
    FROM orders
    WHERE orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
    OR orders.tstamp IS NULL)
    AS history_orders
WHERE EXISTS
   (SELECT orders.order_id
    FROM orders
    WHERE history_orders.order_id = orders.order_id
    AND orders.tstamp BETWEEN '2010-01-03' AND '2010-01-09');

3) And finally there is the original solution, using UNION.
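
As a sanity check that the three variants agree, here is a sketch that runs simplified versions of them against the question's sample data in an in-memory SQLite database. Several things here are my assumptions rather than the real setup: SQLite via Python's `sqlite3`, text-typed timestamps, the sample column name order_num where the posted queries use order_id, flattened derived tables, and the extra out-of-range row from Edit 1 (its id 10 and item_id 103 are made up).

```python
import sqlite3

# Sample rows from the question plus the hypothetical out-of-range row
# described in Edit 1 (id 10 and item_id 103 are made up for illustration).
ROWS = [
    (0, 1, None, 100),
    (1, 2, None, 101),
    (2, 2, "2010-01-05 12:34:56", 102),
    (3, 3, None, 113),
    (4, 4, None, 124),
    (5, 5, None, 135),
    (6, 5, "2010-01-07 01:23:45", 136),
    (7, 5, "2010-01-07 02:46:00", 137),
    (8, 6, None, 100),
    (9, 6, "2010-01-13 08:33:55", 105),
    (10, 2, "2010-01-12 00:00:00", 103),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_num INTEGER,"
    " tstamp TEXT, item_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)

VARIANTS = {
    # 1) In-range or NULL rows, restricted with IN.
    "1 IN": """
        SELECT id FROM orders
        WHERE (tstamp BETWEEN '2010-01-03' AND '2010-01-09'
               OR tstamp IS NULL)
          AND order_num IN (
              SELECT order_num FROM orders
              WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09')""",
    # 2) Same filter, restricted with a correlated EXISTS.
    "2 EXISTS": """
        SELECT h.id FROM orders h
        WHERE (h.tstamp BETWEEN '2010-01-03' AND '2010-01-09'
               OR h.tstamp IS NULL)
          AND EXISTS (
              SELECT 1 FROM orders o
              WHERE o.order_num = h.order_num
                AND o.tstamp BETWEEN '2010-01-03' AND '2010-01-09')""",
    # 3) Original shape: in-range rows UNION the matching NULL rows.
    "3 UNION": """
        SELECT id FROM orders
        WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09'
        UNION
        SELECT id FROM orders
        WHERE tstamp IS NULL
          AND order_num IN (
              SELECT order_num FROM orders
              WHERE tstamp BETWEEN '2010-01-03' AND '2010-01-09')""",
}

# Sort in Python so the comparison does not depend on each query's row order.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in VARIANTS.items()
}
for name, ids in results.items():
    print(name, ids)
```

All three return the same ids on this data, and none of them returns the out-of-range row with id 10. This says nothing about relative speed on the real tables.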

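Comments:
Regarding table size: my actual "real world" problem involves 4 tables (connected with inner joins) containing 98, 2189, 43897, and 785656 records respectively.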

Performance - I ran each solution three times, and here are my real-world results:
1: 52, 51, 51 s
2: 54, 54, 53 s
3: 56, 56, 56 s