仅使用GROUP BY对某些行进行分组

时间:2016-03-15 12:10:47

标签: mysql sql group-by greatest-n-per-group

SCHEMA

我在MySQL数据库中进行了以下设置:

CREATE TABLE items (
  id SERIAL,
  name VARCHAR(100),
  group_id INT,
  price DECIMAL(10,2),
  KEY items_group_id_idx (group_id),
  PRIMARY KEY (id)
);

INSERT INTO items VALUES 
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1,    40),
(5, 'Item E', 2,    50),
(6, 'Item F', 2,    60),
(7, 'Item G', 2,    70);

问题

  

我需要选择:

     
      
  • group_id的所有项目,其值为NULL
  •   
  • group_id标识的每个组中的一个项具有最低价格。
  •   

预期成果

+----+--------+----------+-------+
| id | name   | group_id | price |
+----+--------+----------+-------+
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 
+----+--------+----------+-------+

可能的解决方案1: UNION ALL

的两个查询
SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;

/* EXPLAIN */
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type  | table      | type | possible_keys      | key                | key_len | ref   | rows | Extra                                        |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
|  1 | PRIMARY      | items      | ref  | items_group_id_idx | items_group_id_idx | 5       | const |    3 | Using where                                  | 
|  2 | UNION        | items      | ALL  | items_group_id_idx | NULL               | NULL    | NULL  |    7 | Using where; Using temporary; Using filesort | 
| NULL | UNION RESULT | <union1,2> | ALL  | NULL               | NULL               | NULL    | NULL  | NULL |                                              | 
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+

然而,由于WHERE子句中存在更复杂的条件,因此不希望有两个查询,我需要对最终结果进行排序。

可能的解决方案2: GROUP BY表达式(reference

SELECT id, name, group_id, MIN(price) FROM items
GROUP BY CASE WHEN group_id IS NOT NULL THEN group_id ELSE RAND() END;

/* EXPLAIN */
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | items | ALL  | NULL          | NULL | NULL    | NULL |    7 | Using temporary; Using filesort | 
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

解决方案2似乎更快更简单易用,但我想知道在性能方面是否有更好的方法。

更新

根据@axiac引用的文档,此查询在SQL92及更早版本中是非法的,并且可能仅适用于MySQL。

3 个答案:

答案 0 :(得分:1)

this answer根据@axiac,在兼容性和性能方面提供了更好的解决方案。

SQL Antipatterns book, Chapter 15: Ambiguous Groups也解释了这一点。

为了提高性能,还为(group_id, price, id)添加了组合索引。

  

<强>解

SELECT a.id, a.name, a.group_id, a.price
FROM items a
LEFT JOIN items b 
ON a.group_id = b.group_id 
AND (a.price > b.price OR (a.price = b.price and a.id > b.id))
WHERE b.price is NULL;

有关详细信息,请参阅explanation on how it works

作为副作用的意外情况,此查询适用于我需要包含所有记录且group_id等于NULL AND 每个组中一个项目,价格最低。

  

<强> RESULT

+----+--------+----------+-------+
| id | name   | group_id | price |
+----+--------+----------+-------+
|  1 | Item A |     NULL | 10.00 | 
|  2 | Item B |     NULL | 20.00 | 
|  3 | Item C |     NULL | 30.00 | 
|  4 | Item D |        1 | 40.00 | 
|  5 | Item E |        2 | 50.00 | 
+----+--------+----------+-------+
  

<强> EXPLAIN

+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys                 | key                | key_len | ref                        | rows | Extra                    |
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+
|  1 | SIMPLE      | a     | ALL  | NULL                          | NULL               | NULL    | NULL                       |    7 |                          | 
|  1 | SIMPLE      | b     | ref  | PRIMARY,id,items_group_id_idx | items_group_id_idx | 5       | agi_development.a.group_id |    1 | Using where; Using index | 
+----+-------------+-------+------+-------------------------------+--------------------+---------+----------------------------+------+--------------------------+

答案 1 :(得分:0)

您可以使用where条件执行此操作:

SQLFiddle Demo

select t.*
from t
where t.group_id is null or
      t.price = (select min(t2.price)
                 from t t2
                 where t2.group_id = t.group_id
                );

请注意,如果给定组有多个行,则返回所有具有最低价格的行。

编辑:

我相信以下解决了多行的问题:

select t.*
from t
where t.group_id is null or
      t.id = (select t2.id
              from t t2
              where t2.group_id = t.group_id
              order by t2.price asc
              limit 1
             );

不幸的是,SQL Fiddle现在不适合我,所以我无法测试它。

答案 2 :(得分:0)

如果group_id始终为正值,则可以在不GUID / RAND的情况下简化它:

SELECT id, name, min(price) FROM items
GROUP BY COALESCE(group_id, -id); -- id is already unique

但是如果您更改插入顺序,两个查询都会返回正确的结果,我会在它再次工作时添加一个小提琴......

戈登的查询应该按预期工作,或者您使用旧技巧获取MIN的另一列:搭载

将多个列连接为固定长度字符串,MIN列为#1,并在此字符串上应用MIN。在下一步中,您将使用匹配的SUBSTRING

再次提取列
SELECT
   CASE WHEN grp > 0 THEN grp ELSE NULL END AS group_id
   ,CAST(SUBSTRING(x FROM 1 FOR 13) AS DECIMAL(10,2)) AS price
   ,SUBSTRING(x FROM 24) AS NAME
FROM
 (
   SELECT COALESCE(group_id, -id) AS grp
      -- results in a string like this
      -- '        50.00         5Item E'
      ,MIN(LPAD(CAST(price AS VARCHAR(13)),13) 
           || LPAD(CAST(id AS VARCHAR(10)),10)
           || NAME) AS x
   FROM items
   GROUP BY grp
 ) AS dt;