UNION ALL query takes too long

Date: 2012-01-29 12:00:17

Tags: mysql sql performance union

I'm sure this question has been asked many times, but every case is different.

I have MySQL set up on a strong machine with 2 GB of RAM that is not doing much else, so the hardware should be sufficient.

The following query has been built as a view:

create view view_orders as

select distinct 

    tbl_orders_order.order_date AS sort_col,
    tbl_orders_order.order_id AS order_id,
_utf8'website' AS src,tbl_order_users.company AS company,
tbl_order_users.phone AS phone,
tbl_order_users.full_name AS full_name,
time_format(tbl_orders_order.order_date,_utf8'%H:%i') AS c_time,
date_format(tbl_orders_order.order_date,_utf8'%d/%m/%Y') AS c_date,
tbl_orders_order.comments AS comments,
tbl_orders_order.tmp_cname AS tmp_cname,
tbl_orders_order.tmp_pname AS tmp_pname,
count(tbl_order_docfiles.docfile_id) AS number_of_files,
(case tbl_orders_order.status when 1 then _utf8'completed' when 2 then _utf8'hc' when 0 then _utf8'not-completed' when 3 then _utf8'hc-canceled' end) AS status,

tbl_orders_order.employee_name AS employee_name,
tbl_orders_order.status_date AS status_date,
tbl_orders_order.cancel_reason AS cancel_reason

 from 
        tbl_orders_order left join tbl_order_users on tbl_orders_order.user_id = tbl_order_users.user_id 
    left join 
        tbl_order_docfiles on tbl_order_docfiles.order_id = tbl_orders_order.order_id

    group by 
        tbl_orders_order.order_id 



union all

select distinct tbl_h.h_date AS sort_col,
(case tbl_h.sub_oid when 0 then tbl_h.order_number else concat(tbl_h.order_number,_utf8'-',tbl_h.sub_oid) end) AS order_id,
(case tbl_h.type when 1 then _utf8'פקס' when 2 then _utf8'email' end) AS src,_utf8'' AS company,
_utf8'' AS phone,_utf8'' AS full_name,time_format(tbl_h.h_date,_utf8'%H:%i') AS c_time,
date_format(tbl_h.h_date,_utf8'%d/%m/%Y') AS c_date,_utf8'' AS comments,tbl_h.client_name AS tmp_cname,
tbl_h.project_name AS tmp_pname,
tbl_h.quantity AS number_of_files,
_utf8'completed' AS status,
tbl_h.computer_name AS employee_name,
_utf8'' AS status_date,
_utf8'' AS cancel_reason 

from tbl_h;

The query originally used UNION; after reading an article about UNION ALL, I now use that instead.

Executing the query on its own takes about 3 seconds (with UNION it took 4.5-5.5 seconds). Each part, run separately, finishes in seconds.

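The application sorts and selects on this view, which makes processing even longer: about 6 seconds when the query is cached, and about 12 seconds or more if the data has changed.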

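I don't see any other way to combine these two result sets, since both have to be shown to the user sorted together, so I assume I am doing something wrong.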

Both tables have primary keys, of course.

UPDATE !!!!

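It didn't help. I moved the utf8 / case / date_format out of the union query and removed the DISTINCT, and the query now takes 4 seconds (or even longer). The query without case / date / utf8 (only the union) is down to 2.3 seconds (a 0.3-second improvement).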

create view view_orders as

select *,
    (CASE src
        WHEN 1 THEN
             _utf8'fax' 
        WHEN 2 THEN 
            _utf8'mail' 
        WHEN 3 THEN
            _utf8'website'
    END) AS src,

    time_format(order_date,'%H:%i') AS c_time,
    date_format(order_date,'%d/%m/%Y') AS c_date,

    (CASE status 
        WHEN 1 THEN 
            _utf8'completed' 
        WHEN 2 THEN
             _utf8'hc handling' 
        WHEN 0 THEN
             _utf8'not completed' 
        WHEN 3 THEN
             _utf8'canceled'
    END) AS status

FROM
(
select 

    o.order_date AS sort_col,
    o.order_id,
    3 AS src,

    u.company,
    u.phone,
    u.full_name,

    o.order_date,

    o.comments,
    o.tmp_cname,
    o.tmp_pname,
    count(doc.docfile_id) AS number_of_files,

    o.status,

    o.employee_name,
    o.status_date,
    o.cancel_reason

 from 
        tbl_orders_order o
    LEFT JOIN
        tbl_order_users u ON u.user_id = o.user_id
    LEFT JOIN
        tbl_order_docfiles doc ON doc.order_id = o.order_id

    GROUP BY
        o.order_id 

union all

select 

    h.h_date AS sort_col,
    (case h.sub_oid when 0 then h.order_number else concat(h.order_number,'-',h.sub_oid) end) AS order_id,
    h.type as src,

    '' AS company,
    '' AS phone,
    '' AS full_name,

    h.h_date,

    '' AS comments,
    h.client_name AS tmp_cname,
    h.project_name AS tmp_pname,
    h.quantity AS number_of_files,
    1 AS status,

    h.computer_name AS employee_name,
    '' AS status_date,
    '' AS cancel_reason 

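from tbl_h h
) AS combined; -- closing parenthesis and alias were missing; MySQL requires an alias for a derived table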

3 answers:

Answer 0 (score: 4)

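Reconsider your use of the UNION and DISTINCT keywords. Can your queries really produce duplicate rows? If so, the best query for removing duplicates is probably of this form: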

SELECT ... -- No "DISTINCT" here
UNION
SELECT ... -- No "DISTINCT" here

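DISTINCT is probably unnecessary in the two subqueries. If duplicates are impossible anyway, try this form instead. It should be the fastest way to execute the query (without further optimising the subqueries):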

SELECT ... -- No "DISTINCT" here
UNION ALL
SELECT ... -- No "DISTINCT" here

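Rationale: both UNION and DISTINCT apply a "UNIQUE SORT" operation to your intermediate result sets. Depending on how much data your subqueries return, this can be very expensive. That is one reason why omitting DISTINCT and replacing UNION with UNION ALL is much faster.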
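To see the semantic difference in isolation, here is a minimal sketch that needs no tables at all. The literal values are made up purely for illustration; only the row counts matter.

```sql
-- UNION removes duplicates, so the two identical rows collapse into one.
SELECT 1 AS n
UNION
SELECT 1;

-- UNION ALL skips the de-duplication step, so both rows come back.
SELECT 1 AS n
UNION ALL
SELECT 1;
```

The first statement returns one row and the second returns two. The extra work UNION does to get there is what the rationale above refers to.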

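UPDATE: another idea, if you do have to remove duplicates: remove them first in an inner query, and format dates and codes only in the outer query. This speeds up the "UNIQUE SORT" operation, because comparing 32/64-bit integers is cheaper than comparing varchars: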

SELECT a, b, date_format(c, '%d/%m/%Y'), case d when 1 then 'completed' else '...' end
FROM (
  SELECT a, b, c, d ... -- No date format here
  UNION
  SELECT a, b, c, d ... -- No date format here
) AS t -- MySQL requires an alias for a derived table
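A self-contained sketch of the same pattern, using made-up literal rows rather than the tables from the question (the column names id, d and status and the alias t are assumptions for illustration only):

```sql
-- The inner query de-duplicates on raw values (integers and a DATE);
-- the outer query formats only the rows that survive.
SELECT t.id,
       DATE_FORMAT(t.d, '%d/%m/%Y') AS c_date,
       CASE t.status WHEN 1 THEN 'completed' ELSE 'other' END AS status
FROM (
  SELECT 1 AS id, DATE('2012-01-29') AS d, 1 AS status
  UNION
  SELECT 1, DATE('2012-01-29'), 1   -- exact duplicate, removed by UNION
  UNION
  SELECT 2, DATE('2012-01-28'), 0
) AS t;
```

This returns two rows, one per distinct id. The formatting functions run once per surviving row instead of once per input row.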

Answer 1 (score: 0)

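It may have to do with the UNION triggering a character set conversion. For example, cancel_reason is defined as utf8 in one query but left unspecified in the other.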

Check whether there is a very high CPU spike while this query runs; that would indicate a conversion.

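Personally, I would first do the union on the raw data and then apply the case and conversion statements. I'm not sure that would affect performance, though.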
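One way to check whether the two halves of the union actually disagree on character set is to ask MySQL directly. This is only a sketch: it assumes the table names from the question, and the results depend on your server and connection settings.

```sql
-- Character set of a string literal with and without the _utf8 introducer,
-- plus the collation the connection assigns to a plain literal.
SELECT CHARSET(_utf8'') AS with_introducer,
       CHARSET('')      AS without_introducer,
       COLLATION('')    AS literal_collation;

-- The Collation column shows what each table column is declared as.
SHOW FULL COLUMNS FROM tbl_orders_order;
SHOW FULL COLUMNS FROM tbl_h;
```

If the column collations and the literal's character set all match, a conversion is unlikely to be the bottleneck.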

Answer 2 (score: 0)

Can you try this:

SELECT 
    o.order_date AS sort_col,
    o.order_id AS order_id,
    _utf8'website' AS src,
    u.company AS company,
    u.phone AS phone,
    u.full_name AS full_name,
    time_format(o.order_date,_utf8'%H:%i') AS c_time,
    date_format(o.order_date,_utf8'%d/%m/%Y') AS c_date,
    o.comments AS comments,
    o.tmp_cname AS tmp_cname,
    o.tmp_pname AS tmp_pname,
    COALESCE(d.number_of_files, 0) AS number_of_files,
    ( CASE o.status WHEN 1 THEN _utf8'completed'
                    WHEN 2 THEN _utf8'hc' 
                    WHEN 0 THEN _utf8'not-completed' 
                    WHEN 3 THEN _utf8'hc-canceled' 
      END ) AS status, 
    o.employee_name AS employee_name,
    o.status_date AS status_date,
    o.cancel_reason AS cancel_reason    
 FROM 
       tbl_orders_order AS o
   LEFT JOIN 
       tbl_order_users AS u
     ON o.user_id = u.user_id 
   LEFT JOIN
       ( SELECT order_id
              , COUNT(*) AS number_of_files
         FROM tbl_order_docfiles
         GROUP BY order_id 
       ) AS d
     ON d.order_id = o.order_id

UNION ALL

SELECT 
    tbl_h.h_date AS sort_col,
  ...

FROM tbl_h