Subquery returns incorrect value

Posted: 2013-08-13 23:54:08

Tags: sql oracle

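Let me start by saying that I'm a manager and, as you'll see, I haven't done this kind of work in a long time. For various reasons, though, I've had to pick up some SQL programming until I get more headcount. And yes, I'll say up front that I'm an incompetent idiot at this.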

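What I have is a very, very long SQL statement with lots of selects from various tables and various subqueries; the query is about 400 lines. It ran fine until I tried to add one particular subquery, which returns the wrong value. When I break that subquery into a few shorter test queries for troubleshooting, they return the correct values. It's the combination that doesn't work. I'm sure it has something to do with the way I'm joining.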

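I'm trying to get a total revenue amount that is stored in two tables: one holds the current values and the other holds historical values. The values are at the customer level, and the customer table is one-to-many with both of the others. The two revenue tables have the same structure and share no records; one is a historical archive of the other. All I want to do is sum the revenue values from both tables at the customer level.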

Here is the subquery that doesn't work. It should return the sum of current_revenue and historical_revenue:

select c.id1, c.id2,
(select (sum(oe.revenue1)+sum(oe.revenue2)+sum(h.revenue1)+sum(h.revenue2))*.01 
     from order_entry oe, order_history h
     where c.id1 = oe.id1
     and c.id2 = oe.id2
     and c.id1 = h.id1
     and c.id2 = h.id2
     and oe.order_type in ('01','02','03','04')
     and oe.order_status = 'CLOSED'
     and h.order_type in ('01','02','03','04')
     and h.order_status = 'CLOSED') as total_revenue
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
--query incorrectly returns $4460      
--this query is adding the $1500 in twice (see below)
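Why the total inflates can be shown without a database. This is a minimal sketch, assuming the matching rows for customer 1234/5678 are the qualifying sample rows posted in the edit below (values in raw units, before the *.01 scaling). The comma join pairs every order_entry row with every order_history row, so each side's amounts are repeated once per row on the other side before SUM ever sees them.

```python
from itertools import product

# Qualifying sample rows for customer 1234/5678 (order_type '01'-'04',
# status 'CLOSED'), taken from the question's inserts: (revenue1, revenue2).
current_rows = [(1220, 0), (0, 240)]   # order_entry
history_rows = [(750, 0), (0, 750)]    # order_history

def side_total(rows):
    """revenue1 + revenue2 summed over a list of rows."""
    return sum(r1 + r2 for r1, r2 in rows)

# What "from order_entry oe, order_history h" hands to SUM():
# every current row paired with every history row.
joined = list(product(current_rows, history_rows))

combined = sum(side_total([oe]) + side_total([h]) for oe, h in joined)
correct = side_total(current_rows) + side_total(history_rows)

print(len(joined), combined, correct)  # 4 5920 2960
```

With these sample rows both sides are doubled. The $4460 reported from the real data would correspond, for example, to two current rows meeting one history row: 1460 + 2 × 1500.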

Here are the two test queries that do work. They are identical except for the table names:

select c.id1, c.id2,
(select (sum(oe.revenue1)+sum(oe.revenue2))*.01
     from order_entry oe
     where c.id1 = oe.id1
     and c.id2 = oe.id2
     and oe.order_type in ('01','02','03','04')
     and oe.order_status = 'CLOSED') as current_revenue
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
--query correctly returns $1460


select c.id1, c.id2,
(select (sum(h.revenue1)+sum(h.revenue2))*.01
     from order_history h
     where c.id1 = h.id1
     and c.id2 = h.id2
     and h.order_type in ('01','02','03','04')
     and h.order_status = 'CLOSED') as historical_revenue
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
--query correctly returns $1500

/*
these will be subqueries in another query which needs to return
total revenue = current_revenue + historical_revenue = 1460 + 1500 = 2960
*/

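Can someone tell me why the combined subquery doesn't work? Again, I freely admit my stupidity. I'm sure I'll feel like a complete idiot later, but I just need some help. Thanks.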

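Edit: sample table creates & inserts. The table design is poor, and the tables are very large, hence the sample. Also note that the SQL statement I'm building is so large because I'm pulling 10MM records in the select for a data feed, and that turned out to be faster than breaking it up and running updates. Partitioning the work into multiple tables that could be combined with a UNION at the end didn't make sense either. I've tried all sorts of things, but the huge select turned out to be the fastest. As you've noticed, I'm not one of those people who are good at SQL tuning, optimizer hints included.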

Thanks, Clockwork-Muse, for the help... I'll test your solution soon. Also, there is no dedicated reporting tool.

create table customer (id1 varchar2(4),id2 varchar2(4), 
first_name varchar2(30),last_name varchar2(30));

insert into customer values ('1234','5678','DAVID','HOOVER');
insert into customer values ('0676','3724','JOHN','BOWER');
insert into customer values ('7281','1766','ANNA','VALENZUELA');
insert into customer values ('1458','0076','MARK','JACKSON');
insert into customer values ('0003','9783','JESSICA','BURNETT');

create table order_entry (id1 varchar2(4),id2 varchar2(4),
order_no number,order_type varchar2(2),order_status varchar2(10), 
revenue1 number(10),revenue2 number(10));

insert into order_entry values ('1234','5678',238347,'02','CLOSED',1220,0);
insert into order_entry values ('1234','5678',238347,'02','CLOSED',0,240);
insert into order_entry values ('1234','5678',238529,'05','CANCEL',500,700);
insert into order_entry values ('1234','5678',238529,'04','PENDING',871,0);
insert into order_entry values ('0003','9783',198293,'33','CLOSED',870,50);
insert into order_entry values ('0676','3724',219972,'02','CLOSED',375,0);
insert into order_entry values ('0676','3724',219972,'02','PENDING',175,59);
insert into order_entry values ('7281','1766',248221,'04','PENDING',0,999);
insert into order_entry values ('1458','0076',218578,'04','CLOSED',0,99);
insert into order_entry values ('1458','0076',218578,'02','CLOSED',399,0);


create table order_history (id1 varchar2(4),id2 varchar2(4),
order_no number,order_type varchar2(2),order_status varchar2(10), 
revenue1 number(10),revenue2 number(10));

insert into order_history values ('1234','5678',192832,'01','CLOSED',750,0);
insert into order_history values ('1234','5678',192991,'02','CLOSED',0,750);
insert into order_history values ('0003','9783',138982,'01','CLOSED',299,0);
insert into order_history values ('0676','3724',112729,'01','CLOSED',350,0);
insert into order_history values ('1458','0076',185573,'01','CANCEL',1299,199);
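To reproduce the problem outside Oracle, here is a sketch that loads the sample tables above into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in here (it accepts the varchar2/number type names but is not Oracle), and the *.01 scaling is dropped so the totals stay exact integers.

```python
import sqlite3

# Sample rows from the question's inserts.
CUSTOMERS = [
    ("1234", "5678", "DAVID", "HOOVER"),
    ("0676", "3724", "JOHN", "BOWER"),
    ("7281", "1766", "ANNA", "VALENZUELA"),
    ("1458", "0076", "MARK", "JACKSON"),
    ("0003", "9783", "JESSICA", "BURNETT"),
]
ORDER_ENTRY = [
    ("1234", "5678", 238347, "02", "CLOSED", 1220, 0),
    ("1234", "5678", 238347, "02", "CLOSED", 0, 240),
    ("1234", "5678", 238529, "05", "CANCEL", 500, 700),
    ("1234", "5678", 238529, "04", "PENDING", 871, 0),
    ("0003", "9783", 198293, "33", "CLOSED", 870, 50),
    ("0676", "3724", 219972, "02", "CLOSED", 375, 0),
    ("0676", "3724", 219972, "02", "PENDING", 175, 59),
    ("7281", "1766", 248221, "04", "PENDING", 0, 999),
    ("1458", "0076", 218578, "04", "CLOSED", 0, 99),
    ("1458", "0076", 218578, "02", "CLOSED", 399, 0),
]
ORDER_HISTORY = [
    ("1234", "5678", 192832, "01", "CLOSED", 750, 0),
    ("1234", "5678", 192991, "02", "CLOSED", 0, 750),
    ("0003", "9783", 138982, "01", "CLOSED", 299, 0),
    ("0676", "3724", 112729, "01", "CLOSED", 350, 0),
    ("1458", "0076", 185573, "01", "CANCEL", 1299, 199),
]

ORDER_COLS = ("id1 varchar2(4), id2 varchar2(4), order_no number, "
              "order_type varchar2(2), order_status varchar2(10), "
              "revenue1 number(10), revenue2 number(10)")

db = sqlite3.connect(":memory:")
db.execute("create table customer (id1 varchar2(4), id2 varchar2(4), "
           "first_name varchar2(30), last_name varchar2(30))")
db.execute(f"create table order_entry ({ORDER_COLS})")
db.execute(f"create table order_history ({ORDER_COLS})")
db.executemany("insert into customer values (?, ?, ?, ?)", CUSTOMERS)
db.executemany("insert into order_entry values (?, ?, ?, ?, ?, ?, ?)", ORDER_ENTRY)
db.executemany("insert into order_history values (?, ?, ?, ?, ?, ?, ?)", ORDER_HISTORY)

# The question's combined subquery: a comma join of both revenue tables.
BROKEN = """
select (select sum(oe.revenue1) + sum(oe.revenue2) + sum(h.revenue1) + sum(h.revenue2)
          from order_entry oe, order_history h
         where c.id1 = oe.id1 and c.id2 = oe.id2
           and c.id1 = h.id1 and c.id2 = h.id2
           and oe.order_type in ('01','02','03','04') and oe.order_status = 'CLOSED'
           and h.order_type in ('01','02','03','04') and h.order_status = 'CLOSED')
  from customer c
 where c.id1 = '1234' and c.id2 = '5678'
"""

def single_table(table):
    """One of the question's working test queries, for the given table."""
    return f"""
select (select sum(t.revenue1) + sum(t.revenue2)
          from {table} t
         where c.id1 = t.id1 and c.id2 = t.id2
           and t.order_type in ('01','02','03','04') and t.order_status = 'CLOSED')
  from customer c
 where c.id1 = '1234' and c.id2 = '5678'
"""

broken = db.execute(BROKEN).fetchone()[0]
current = db.execute(single_table("order_entry")).fetchone()[0]
history = db.execute(single_table("order_history")).fetchone()[0]
print(broken, current, history)  # 5920 1460 1500
```

The two single-table queries agree with the question's comments, while the combined one counts each side twice because two qualifying rows on each side produce four joined rows.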

2 answers:

Answer 0 (score: 0)

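First, you should write your joins explicitly instead of using the implicit-join syntax (a comma-separated FROM clause). This won't actually solve your problem, but it may make future work easier, especially since anything other than a "normal" inner join gets harder, or at least a bit strange, with the implicit syntax.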
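To illustrate that the syntax change alone does not fix the double counting, here is a sketch using SQLite as a stand-in for Oracle, with a trimmed schema (order_no and the name columns omitted) holding only customer 1234/5678's qualifying sample rows. The question's combined subquery is rewritten with an explicit JOIN ... ON and still returns the inflated figure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Trimmed schema; only the qualifying sample rows for customer 1234/5678.
db.executescript("""
create table customer (id1 text, id2 text);
create table order_entry (id1 text, id2 text, order_type text, order_status text,
                          revenue1 integer, revenue2 integer);
create table order_history (id1 text, id2 text, order_type text, order_status text,
                            revenue1 integer, revenue2 integer);
insert into customer values ('1234', '5678');
insert into order_entry values
  ('1234','5678','02','CLOSED',1220,0), ('1234','5678','02','CLOSED',0,240);
insert into order_history values
  ('1234','5678','01','CLOSED',750,0), ('1234','5678','02','CLOSED',0,750);
""")

# Same logic as the question's subquery, written with an explicit join.
EXPLICIT_JOIN = """
select c.id1, c.id2,
       (select sum(oe.revenue1) + sum(oe.revenue2) + sum(h.revenue1) + sum(h.revenue2)
          from order_entry oe
          join order_history h
            on h.id1 = oe.id1 and h.id2 = oe.id2
         where oe.id1 = c.id1 and oe.id2 = c.id2
           and oe.order_type in ('01','02','03','04') and oe.order_status = 'CLOSED'
           and h.order_type in ('01','02','03','04') and h.order_status = 'CLOSED') as total_revenue
  from customer c
 where c.id1 = '1234' and c.id2 = '5678'
"""

row = db.execute(EXPLICIT_JOIN).fetchone()
print(row)  # ('1234', '5678', 5920)
```

The join still produces one row per (current, history) pair, so the fix has to change what is joined, not how the join is spelled.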

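As @Nikola mentioned, the problem is that you're getting "duplicate" rows. You have two options:

  1. Add conditions to the join until there are no more duplicate rows (note that this may be hard or impossible if the tables share no uniquely matching information!).
  2. Perform the aggregation before the join, which guarantees a single row per customer on each side of the join.

Either option may perform better or worse, depending on many factors.

Without more information about your data, it's impossible to say whether additional conditions could make the rows properly "unique" (it might have something to do with the order_type column, but I'm not sure it's even possible). So here is a pre-aggregated version (untested):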

-- Derived-table aliases have no AS: Oracle rejects AS before a table alias.
SELECT c.id1, c.id2, (current_revenue.revenue + historical_revenue.revenue) * .01 as total_revenue
FROM Customer c
JOIN (SELECT id1, id2, SUM(revenue1 + revenue2) as revenue
      FROM Order_Entry
      WHERE order_type in ('01', '02', '03', '04')
      AND order_status = 'CLOSED'
      GROUP BY id1, id2) current_revenue
ON current_revenue.id1 = c.id1
   AND current_revenue.id2 = c.id2
JOIN (SELECT id1, id2, SUM(revenue1 + revenue2) as revenue
      FROM Order_History
      WHERE order_type in ('01', '02', '03', '04')
      AND order_status = 'CLOSED'
      GROUP BY id1, id2) historical_revenue
ON historical_revenue.id1 = c.id1
   AND historical_revenue.id2 = c.id2
WHERE c.id1 = '1234'
      AND c.id2 = '5678'
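A quick check of the pre-aggregated join, again using SQLite as a stand-in for Oracle, with a trimmed schema and a subset of the sample rows. The customer filter is dropped so every customer is listed. Because both derived tables are inner-joined, a customer with qualifying revenue in only one table disappears from the result; the LEFT JOIN plus COALESCE variant below is a hypothetical adjustment, not part of the answer.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Trimmed schema. Customer 1234/5678 has qualifying rows in both tables;
# customer 1458/0076 has qualifying current rows but only a CANCEL history row.
db.executescript("""
create table customer (id1 text, id2 text);
create table order_entry (id1 text, id2 text, order_type text, order_status text,
                          revenue1 integer, revenue2 integer);
create table order_history (id1 text, id2 text, order_type text, order_status text,
                            revenue1 integer, revenue2 integer);
insert into customer values ('1234','5678'), ('1458','0076');
insert into order_entry values
  ('1234','5678','02','CLOSED',1220,0), ('1234','5678','02','CLOSED',0,240),
  ('1234','5678','05','CANCEL',500,700), ('1458','0076','04','CLOSED',0,99),
  ('1458','0076','02','CLOSED',399,0);
insert into order_history values
  ('1234','5678','01','CLOSED',750,0), ('1234','5678','02','CLOSED',0,750),
  ('1458','0076','01','CANCEL',1299,199);
""")

# One aggregated row per customer, as in the answer's derived tables.
AGG = """select id1, id2, sum(revenue1 + revenue2) as revenue
           from {table}
          where order_type in ('01','02','03','04') and order_status = 'CLOSED'
          group by id1, id2"""

def pre_aggregated(join_kind, total_expr):
    """Join customer to both pre-aggregated tables with the given join type."""
    sql = f"""
select c.id1, c.id2, {total_expr} as total_revenue
  from customer c
  {join_kind} ({AGG.format(table='order_entry')}) cur
    on cur.id1 = c.id1 and cur.id2 = c.id2
  {join_kind} ({AGG.format(table='order_history')}) hist
    on hist.id1 = c.id1 and hist.id2 = c.id2
 order by c.id1, c.id2
"""
    return db.execute(sql).fetchall()

# The answer's shape: inner joins on both derived tables.
inner = pre_aggregated("join", "cur.revenue + hist.revenue")
# Hypothetical variant: outer joins keep customers present on only one side.
outer = pre_aggregated("left join",
                       "coalesce(cur.revenue, 0) + coalesce(hist.revenue, 0)")
print(inner)  # [('1234', '5678', 2960)]
print(outer)  # [('1234', '5678', 2960), ('1458', '0076', 498)]
```

Customer 1234/5678 gets the expected 2960 either way; whether dropping one-sided customers is acceptable depends on what the data feed needs.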
    

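Note that I'm not sure whether Oracle is smart enough to apply the customer ID restriction before performing the aggregation; that is, the RDBMS may perform the aggregation over the entire table, not just the rows for that customer. There are a couple of ways around this: either move the subqueries into the SELECT clause, or add the customer ID restriction to the subselects.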
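The second workaround, repeating the customer ID restriction inside each subselect, can be sketched the same way (SQLite stand-in, trimmed schema, two customers from the sample data). This only checks that both placements of the filter return the same row; whether Oracle pushes the outer predicate into the aggregates on its own is a question for its execution plan, which this sketch does not examine.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Trimmed schema with two customers, so the filter has rows to exclude.
db.executescript("""
create table customer (id1 text, id2 text);
create table order_entry (id1 text, id2 text, order_type text, order_status text,
                          revenue1 integer, revenue2 integer);
create table order_history (id1 text, id2 text, order_type text, order_status text,
                            revenue1 integer, revenue2 integer);
insert into customer values ('1234','5678'), ('0676','3724');
insert into order_entry values
  ('1234','5678','02','CLOSED',1220,0), ('1234','5678','02','CLOSED',0,240),
  ('0676','3724','02','CLOSED',375,0);
insert into order_history values
  ('1234','5678','01','CLOSED',750,0), ('1234','5678','02','CLOSED',0,750),
  ('0676','3724','01','CLOSED',350,0);
""")

def side(table, extra=""):
    """Aggregated derived table; `extra` is appended to its WHERE clause."""
    return f"""(select id1, id2, sum(revenue1 + revenue2) as revenue
          from {table}
         where order_type in ('01','02','03','04') and order_status = 'CLOSED' {extra}
         group by id1, id2)"""

def total_query(extra=""):
    return f"""
select c.id1, c.id2, cur.revenue + hist.revenue as total_revenue
  from customer c
  join {side('order_entry', extra)} cur on cur.id1 = c.id1 and cur.id2 = c.id2
  join {side('order_history', extra)} hist on hist.id1 = c.id1 and hist.id2 = c.id2
 where c.id1 = '1234' and c.id2 = '5678'
"""

# Filter only in the outer query, as in the answer's version.
outside = db.execute(total_query()).fetchall()
# Customer restriction repeated inside each subselect.
inside = db.execute(total_query("and id1 = '1234' and id2 = '5678'")).fetchall()
print(outside, inside)  # [('1234', '5678', 2960)] [('1234', '5678', 2960)]
```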

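...Also, 400 lines is really long. Are you sure it wouldn't be better to break it up somehow, or to invest in something like a dedicated reporting tool?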

Answer 1 (score: 0)

The simplest solution: if you've already calculated the 2 correct values, just add them together:

SQLFiddle for test

select c.id1, c.id2,
  (
    (select (sum(oe.revenue1)+sum(oe.revenue2)) * 0.01  -- current_revenue
         from order_entry oe
         where c.id1 = oe.id1
         and c.id2 = oe.id2
         and oe.order_type in ('01','02','03','04')
         and oe.order_status = 'CLOSED'
    ) 
    +
    (select (sum(h.revenue1)+sum(h.revenue2))*.01    -- historical_revenue
         from order_history h
         where c.id1 = h.id1
         and c.id2 = h.id2
         and h.order_type in ('01','02','03','04')
         and h.order_status = 'CLOSED'
    )
  ) as total_revenue  
from customer c
where c.id1 = '1234'
and c.id2 = '5678'
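One caveat with adding the two scalar subqueries: SUM over zero rows returns NULL, and NULL plus a number is NULL, so a customer with qualifying revenue in only one table gets a NULL total. A sketch with SQLite as a stand-in for Oracle, a trimmed schema, and three customers from the sample data; wrapping each subquery in COALESCE is a suggested adjustment, not part of the answer, and COALESCE (or NVL) behaves the same way in Oracle.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Trimmed schema. 0003/9783 qualifies only in history (its current row is type '33'),
# 1458/0076 qualifies only in order_entry (its history row is CANCEL).
db.executescript("""
create table customer (id1 text, id2 text);
create table order_entry (id1 text, id2 text, order_type text, order_status text,
                          revenue1 integer, revenue2 integer);
create table order_history (id1 text, id2 text, order_type text, order_status text,
                            revenue1 integer, revenue2 integer);
insert into customer values ('0003','9783'), ('1234','5678'), ('1458','0076');
insert into order_entry values
  ('0003','9783','33','CLOSED',870,50),
  ('1234','5678','02','CLOSED',1220,0), ('1234','5678','02','CLOSED',0,240),
  ('1458','0076','04','CLOSED',0,99), ('1458','0076','02','CLOSED',399,0);
insert into order_history values
  ('0003','9783','01','CLOSED',299,0),
  ('1234','5678','01','CLOSED',750,0), ('1234','5678','02','CLOSED',0,750),
  ('1458','0076','01','CANCEL',1299,199);
""")

# One correlated scalar subquery per revenue table, as in the answer.
SIDE = """(select sum(t.revenue1) + sum(t.revenue2)
             from {table} t
            where t.id1 = c.id1 and t.id2 = c.id2
              and t.order_type in ('01','02','03','04') and t.order_status = 'CLOSED')"""
cur = SIDE.format(table="order_entry")
hist = SIDE.format(table="order_history")

as_answered = db.execute(
    f"select c.id1, {cur} + {hist} from customer c order by c.id1").fetchall()
with_coalesce = db.execute(
    f"select c.id1, coalesce({cur}, 0) + coalesce({hist}, 0) "
    f"from customer c order by c.id1").fetchall()
print(as_answered)    # [('0003', None), ('1234', 2960), ('1458', None)]
print(with_coalesce)  # [('0003', 299), ('1234', 2960), ('1458', 498)]
```

For customer 1234/5678 in the question, both forms return 2960.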

Of course, without more information about the data it's impossible to guarantee the best performance, but it works.