MySQL query with multiple UNION and GROUP BY

Time: 2016-08-14 17:54:36

Tags: mysql performance query-optimization

My database schema looks like this:

desc SUB:

+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(11)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(30) | NO   |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+

desc Cs:

+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| other_detail | varchar(255) | NO   |     | NULL    |                |
| created_at   | datetime     | NO   |     | NULL    |                |
| sub_id       | int(11)      | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

desc Ap:

+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| other_detail | varchar(255) | NO   |     | NULL    |                |
| created_at   | datetime     | NO   |     | NULL    |                |
| sub_id       | int(11)      | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

desc U:

+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| id         | int(11)     | NO   | PRI | NULL    |       |
| type       | varchar(30) | NO   |     | NULL    |       |
| created_at | datetime    | NO   |     | NULL    |       |
| sub_id     | int(11)     | NO   | MUL | NULL    |       |
+------------+-------------+------+-----+---------+-------+

desc TR:

+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| other_detail | varchar(255) | NO   |     | NULL    |                |
| created_at   | datetime     | NO   |     | NULL    |                |
| sub_id       | int(11)      | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

desc PR:

+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| other_detail | varchar(255) | NO   |     | NULL    |                |
| created_at   | datetime     | NO   |     | NULL    |                |
| sub_id       | int(11)      | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

desc ID:

+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| other_detail | varchar(255) | NO   |     | NULL    |                |
| created_at   | datetime     | NO   |     | NULL    |                |
| cs_id        | int(11)      | NO   | MUL | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

My requirement is to find every sub_id whose total row count across the combined tables ID (reached through a foreign key: ID has cs_id, and Cs has sub_id), PR, TR, and Ap is less than N, counting rows whose created_at is within one month of now, and only for subs with U.type = 'fixed_value'.

My approach is to find the count in each table, grouped by sub_id:

SELECT  count(*) as action_count, SUB.id
    FROM  Ap
    INNER JOIN  SUB  ON Ap.sub_id = SUB.id
    INNER JOIN  U  ON U.sub_id = SUB.id
    WHERE  (Ap.created_at >= '2016-07-14')
      AND  (Ap.created_at <= '2016-08-15')
      AND  (U.type = "Customer")
      AND  (U.created_at <= '2016-07-14')
    GROUP BY  SUB.id
    HAVING  action_count > 0;

SELECT  count(*) as action_count, SUB.id
    FROM  PR
    INNER JOIN  SUB  ON PR.sub_id = SUB.id
    INNER JOIN  U  ON U.sub_id = SUB.id
    WHERE  (PR.created_at >= '2016-07-14')
      AND  (PR.created_at <= '2016-08-15')
      AND  (U.type = "Customer")
      AND  (U.created_at <= '2016-07-14')
    GROUP BY  SUB.id
    HAVING  action_count > 0;

SELECT  count(*) as action_count, SUB.id
    FROM  TR
    INNER JOIN  SUB  ON TR.sub_id = SUB.id
    INNER JOIN  U  ON U.sub_id = SUB.id
    WHERE  (TR.created_at >= '2016-07-14')
      AND  (TR.created_at <= '2016-08-15')
      AND  (U.type = "Customer")
      AND  (U.created_at <= '2016-07-14')
    GROUP BY  SUB.id
    HAVING  action_count > 0;

SELECT  count(*) as action_count, SUB.id
    FROM  ID
    INNER JOIN  Cs  ON ID.cs_id = Cs.id
    INNER JOIN  SUB  ON Cs.sub_id = SUB.id
    INNER JOIN  U  ON U.sub_id = SUB.id
    WHERE  (ID.created_at >= '2016-07-14')
      AND  (ID.created_at <= '2016-08-15')
      AND  (U.type = "Customer")
      AND  (U.created_at <= '2016-07-14')
    GROUP BY  SUB.id
    HAVING  action_count > 0;

Then, in the backend, I group the combined results by SUB.id again, sum action_count for each SUB.id, and keep only the SUB.id values whose total is < N.

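How can I optimize this? We know that 80% of the sub_id values fall into the action_count > N category, so fetching everything into the backend and filtering afterwards is very wasteful.

I cannot filter on count < N inside each individual query, because a sub_id may have a count < N in one query, a count > N in another, and 0 in the rest. It would then be treated as having a count < N, which is wrong. For example, with N = 30, a sub_id with 5 rows in Ap and 40 rows in PR would pass the filter in the Ap query and look like a total of 5, although its real total is 45.

Would it help in this case to UNION ALL the per-table counts and do the grouping in the database?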

SELECT  sum(action_count) as count, sub_id
    FROM  (
        SELECT  count(*) as action_count, SUB.id as sub_id
            FROM  Ap
            INNER JOIN  SUB  ON Ap.sub_id = SUB.id
            INNER JOIN  U  ON U.sub_id = SUB.id
            WHERE  (Ap.created_at >= '2016-07-14')
              AND  (Ap.created_at <= '2016-08-15')
              AND  (U.type = "Customer")
              AND  (U.created_at <= '2016-07-14')
            GROUP BY  SUB.id
            HAVING  action_count > 0
        UNION ALL
        SELECT  count(*) as action_count, SUB.id as sub_id
            FROM  PR
            INNER JOIN  SUB  ON PR.sub_id = SUB.id
            INNER JOIN  U  ON U.sub_id = SUB.id
            WHERE  (PR.created_at >= '2016-07-14')
              AND  (PR.created_at <= '2016-08-15')
              AND  (U.type = "Customer")
              AND  (U.created_at <= '2016-07-14')
            GROUP BY  SUB.id
            HAVING  action_count > 0
        UNION ALL
        SELECT  count(*) as action_count, SUB.id as sub_id
            FROM  TR
            INNER JOIN  SUB  ON TR.sub_id = SUB.id
            INNER JOIN  U  ON U.sub_id = SUB.id
            WHERE  (TR.created_at >= '2016-07-14')
              AND  (TR.created_at <= '2016-08-15')
              AND  (U.type = "Customer")
              AND  (U.created_at <= '2016-07-14')
            GROUP BY  SUB.id
            HAVING  action_count > 0
        UNION ALL
        SELECT  count(*) as action_count, SUB.id as sub_id
            FROM  ID
            INNER JOIN  Cs  ON ID.cs_id = Cs.id
            INNER JOIN  SUB  ON Cs.sub_id = SUB.id
            INNER JOIN  U  ON U.sub_id = SUB.id
            WHERE  (ID.created_at >= '2016-07-14')
              AND  (ID.created_at <= '2016-08-15')
              AND  (U.type = "Customer")
              AND  (U.created_at <= '2016-07-14')
            GROUP BY  SUB.id
            HAVING  action_count > 0
          ) A
    GROUP BY  sub_id
    HAVING  count < 30

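However, this still does the filtering only at the final stage. How can I optimize it further?

Note: I have already asked this question on https://codereview.stackexchange.com/, but since I could not get a solution there, I am reposting it here. Sorry.

1 answer:

Answer 0: (score: 1)

First, see whether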

SELECT  count(*) as action_count, SUB.id
    FROM  Ap
    INNER JOIN  SUB  ON Ap.sub_id = SUB.id
    INNER JOIN  U  ON U.sub_id = SUB.id
    WHERE  (Ap.created_at >= '2016-07-14')
      AND  (Ap.created_at <= '2016-08-15')
      AND  (U.type = "Customer")
      AND  (U.created_at <= '2016-07-14')
    GROUP BY  SUB.id
    HAVING  action_count > 0 

can be written like this and still give the same answer:

SELECT  count(*) as action_count, SUB.id
    FROM  SUB
    WHERE  EXISTS(
        SELECT  *
            FROM  Ap
            WHERE  sub_id = SUB.id
              AND  created_at >= '2016-07-14'
              AND  created_at <= '2016-08-15' 
                 )
      AND  EXISTS(
        SELECT  *
            FROM  U
            WHERE  sub_id = SUB.id
              AND  type = "Customer"
              AND  created_at <= '2016-07-14' 
                 )

For speed, add the following indexes:

U: INDEX(sub_id, type, created_at)
AP: INDEX(sub_id, created_at)
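
As a rough sketch, the two indexes above could be created with statements like the following. The index names (idx_u_sub_type_created, idx_ap_sub_created) are made up for illustration; only the column lists come from the answer. The EXPLAIN at the end is just one way to check whether the optimizer picks the new indexes up, and it selects only SUB.id to keep the check simple.

```sql
-- Hypothetical index names; the column order follows the answer.
ALTER TABLE U  ADD INDEX idx_u_sub_type_created (sub_id, type, created_at);
ALTER TABLE Ap ADD INDEX idx_ap_sub_created (sub_id, created_at);

-- Inspect the plan of the EXISTS form to see whether both
-- subqueries are resolved through the new indexes.
EXPLAIN
SELECT  SUB.id
    FROM  SUB
    WHERE  EXISTS(
        SELECT  *
            FROM  Ap
            WHERE  sub_id = SUB.id
              AND  created_at >= '2016-07-14'
              AND  created_at <= '2016-08-15'
                 )
      AND  EXISTS(
        SELECT  *
            FROM  U
            WHERE  sub_id = SUB.id
              AND  type = 'Customer'
              AND  created_at <= '2016-07-14'
                 );
```

If PR and TR are queried the same way, an index on (sub_id, created_at) for each of them would presumably follow the same pattern, but that is an assumption and not something the answer states.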

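Note that the GROUP BY has been removed, which gives some speedup, and that the HAVING has been eliminated, which leads to a smaller result set.

Now build the UNION ALL out of queries like this.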
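
One possible shape for such a UNION ALL is sketched below. It is not necessarily what the answerer had in mind. It rests on three assumptions: every sub_id in the action tables refers to an existing SUB row (so the join to SUB is dropped), the U condition is checked once with EXISTS instead of a join (so several matching U rows no longer multiply the counts, which can change the numbers compared with the original query), and N is 30 as in the question. Each branch keeps its own GROUP BY because the question needs a count per sub_id. The alias total_count is made up for illustration.

```sql
-- Sketch only: each branch counts one action table per sub_id,
-- the outer query sums the branches and applies the "< N" filter.
SELECT  A.sub_id, SUM(A.action_count) AS total_count
    FROM  (
        SELECT  sub_id, COUNT(*) AS action_count
            FROM  Ap
            WHERE  created_at >= '2016-07-14'
              AND  created_at <= '2016-08-15'
            GROUP BY  sub_id
        UNION ALL
        SELECT  sub_id, COUNT(*)
            FROM  PR
            WHERE  created_at >= '2016-07-14'
              AND  created_at <= '2016-08-15'
            GROUP BY  sub_id
        UNION ALL
        SELECT  sub_id, COUNT(*)
            FROM  TR
            WHERE  created_at >= '2016-07-14'
              AND  created_at <= '2016-08-15'
            GROUP BY  sub_id
        UNION ALL
        -- ID has no sub_id of its own, so it goes through Cs.
        SELECT  Cs.sub_id, COUNT(*)
            FROM  ID
            INNER JOIN  Cs  ON ID.cs_id = Cs.id
            WHERE  ID.created_at >= '2016-07-14'
              AND  ID.created_at <= '2016-08-15'
            GROUP BY  Cs.sub_id
          ) AS A
    -- The U check runs once per branch row, not once per action row.
    WHERE  EXISTS(
        SELECT  *
            FROM  U
            WHERE  U.sub_id = A.sub_id
              AND  U.type = 'Customer'
              AND  U.created_at <= '2016-07-14'
                 )
    GROUP BY  A.sub_id
    HAVING  total_count < 30;
```

A sub_id with no rows in any of the four tables does not appear in the result at all, which matches the HAVING action_count > 0 behaviour of the original queries.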