SQL: Understanding how SELECT DISTINCT eliminates duplicates

Time: 2019-04-19 01:49:36

Tags: mysql sql

I have the following table called genkeyword:

---------------------------------------------------------------------------
|  id     |      title           |   genre       | keyword      |    year |
----------------------------------------------------------------------------
| 315     |  Harry Potter        |   drama       | magic        |   2011  |
| 315     |  Harry Potter        |   mystery     | magic        |   2011  |
| 315     |  Harry Potter        |   adventure   | magic        |   2011  |
| 315     |  Harry Potter        |   fantasy     | magic        |   2011  |
| 315     |  Harry Potter        |   drama       | witch        |   2011  |
| 315     |  Harry Potter        |   mystery     | witch        |   2011  |
| 315     |  Harry Potter        |   adventure   | witch        |   2011  |
| 315     |  Harry Potter        |   fantasy     | witch        |   2011  |
| 407     |  Cinderella          |   fantasy     | prince       |   2015  |
| 407     |  Cinderella          |   drama       | prince       |   2015  |
| 407     |  Cinderella          |   fantasy     | prince       |   2015  |
| 407     |  Cinderella          |   drama       | prince       |   2015  |
| 826     |  The Shape of Water  |   horror      | scientist    |   2017  |
| 826     |  The Shape of Water  |   adventure   | scientist    |   2017  |
| 826     |  The Shape of Water  |   thriller    | scientist    |   2017  |
| 826     |  The Shape of Water  |   drama       | scientist    |   2017  |
| 826     |  The Shape of Water  |   horror      | friendship   |   2017  |
| 826     |  The Shape of Water  |   adventure   | friendship   |   2017  |
| 826     |  The Shape of Water  |   thriller    | friendship   |   2017  |
| 826     |  The Shape of Water  |   drama       | friendship   |   2017  |
---------------------------------------------------------------------------

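I have the following query, which returns, for each other movie in the table above, the number of genres it has in common with Harry Potter: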

select title, year, count(distinct genre) as genre_freq
from genkeyword
where genre in (select genre from genkeyword where title = 'Harry Potter')
  and title <> 'Harry Potter'
group by title, year
order by genre_freq desc;

The output should be:

--------------------------------------------------
| title                |    year   |    genre_freq |
---------------------------------------------------
| Cinderella           |    2015   |      2        |
| The Shape of Water   |    2017   |      2        |
----------------------------------------------------
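To check the expected output, the table and query can be reproduced in a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the names `MOVIES` and `build_db` are helpers introduced here, not part of the original setup. A `title` tie-breaker is added to the ORDER BY (also an addition) because both movies have the same genre_freq and their relative order is otherwise unspecified.

```python
import sqlite3

# Sample data from the question: (id, title, year) -> (genres, keywords).
# Cinderella lists 'prince' twice because the table repeats its two rows.
MOVIES = {
    (315, "Harry Potter", 2011): (["drama", "mystery", "adventure", "fantasy"], ["magic", "witch"]),
    (407, "Cinderella", 2015): (["fantasy", "drama"], ["prince", "prince"]),
    (826, "The Shape of Water", 2017): (["horror", "adventure", "thriller", "drama"], ["scientist", "friendship"]),
}

def build_db():
    """Create an in-memory genkeyword table holding the 20 sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genkeyword (id INTEGER, title TEXT, genre TEXT, keyword TEXT, year INTEGER)"
    )
    for (movie_id, title, year), (genres, keywords) in MOVIES.items():
        for keyword in keywords:
            for genre in genres:
                conn.execute(
                    "INSERT INTO genkeyword VALUES (?, ?, ?, ?, ?)",
                    (movie_id, title, genre, keyword, year),
                )
    return conn

conn = build_db()
rows = conn.execute("""
    SELECT title, year, COUNT(DISTINCT genre) AS genre_freq
    FROM genkeyword
    WHERE genre IN (SELECT genre FROM genkeyword WHERE title = 'Harry Potter')
      AND title <> 'Harry Potter'
    GROUP BY title, year
    ORDER BY genre_freq DESC, title  -- title is an added tie-breaker
""").fetchall()

for row in rows:
    print(row)
```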

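However, I am having trouble understanding exactly how count(distinct genre) works in this query. I know that SELECT DISTINCT returns only distinct values and eliminates duplicate records from the result. What I am not sure about is when count(distinct genre) actually removes the duplicates. I would really like to understand what the query does behind the scenes.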

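What I know so far:

For each tuple in genkeyword:

  • "where genre in (select genre from genkeyword where title='Harry Potter')" keeps the tuples whose genre value is one of Harry Potter's genres.
  • If the genre of the tuple under consideration is in the result set returned by the subquery, it is counted by count(distinct genre). In addition, the title of the tuple under consideration must not be Harry Potter, otherwise the tuple is not counted.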

But when does count(distinct genre) actually remove the duplicates? Any insight is appreciated.

1 answer:

Answer 0: (score: 1)

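In short, COUNT(DISTINCT column) applies DISTINCT to remove duplicate values of that column before COUNT runs. This happens separately inside each group, after the WHERE clause has filtered the rows and GROUP BY has grouped them.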
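The difference is easiest to see by putting COUNT(*) next to COUNT(DISTINCT genre) for the same groups. The sketch below assumes SQLite (through Python's sqlite3 module) in place of MySQL; `MOVIES` and `build_db` are illustrative helpers that rebuild the sample table and are not part of the original question.

```python
import sqlite3

# Sample data from the question: (id, title, year) -> (genres, keywords).
MOVIES = {
    (315, "Harry Potter", 2011): (["drama", "mystery", "adventure", "fantasy"], ["magic", "witch"]),
    (407, "Cinderella", 2015): (["fantasy", "drama"], ["prince", "prince"]),
    (826, "The Shape of Water", 2017): (["horror", "adventure", "thriller", "drama"], ["scientist", "friendship"]),
}

def build_db():
    """Create an in-memory genkeyword table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genkeyword (id INTEGER, title TEXT, genre TEXT, keyword TEXT, year INTEGER)"
    )
    for (movie_id, title, year), (genres, keywords) in MOVIES.items():
        for keyword in keywords:
            for genre in genres:
                conn.execute(
                    "INSERT INTO genkeyword VALUES (?, ?, ?, ?, ?)",
                    (movie_id, title, genre, keyword, year),
                )
    return conn

conn = build_db()
counts = conn.execute("""
    SELECT title,
           COUNT(*)              AS all_rows,        -- every row left in the group
           COUNT(DISTINCT genre) AS distinct_genres  -- duplicates dropped first
    FROM genkeyword
    WHERE genre IN (SELECT genre FROM genkeyword WHERE title = 'Harry Potter')
      AND title <> 'Harry Potter'
    GROUP BY title, year
    ORDER BY title
""").fetchall()

for row in counts:
    print(row)
```

Each group contains four rows but only two different genres, so the two counts differ.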

With your sample data and query conditions, these are the rows that remain after the WHERE clause:

| title              | genre     | year |
| ------------------ | --------- | ---- |
| Cinderella         | fantasy   | 2015 |
| Cinderella         | drama     | 2015 |
| Cinderella         | fantasy   | 2015 |
| Cinderella         | drama     | 2015 |
| The Shape of Water | adventure | 2017 |
| The Shape of Water | drama     | 2017 |
| The Shape of Water | adventure | 2017 |
| The Shape of Water | drama     | 2017 |
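The filtered rows above can be reproduced by running only the WHERE part of the query, without GROUP BY or COUNT. As a sketch, this uses SQLite via Python's sqlite3 module rather than MySQL, with `MOVIES` and `build_db` as helper names made up for this example. The rows are sorted in Python for display because the query itself has no ORDER BY.

```python
import sqlite3

# Sample data from the question: (id, title, year) -> (genres, keywords).
MOVIES = {
    (315, "Harry Potter", 2011): (["drama", "mystery", "adventure", "fantasy"], ["magic", "witch"]),
    (407, "Cinderella", 2015): (["fantasy", "drama"], ["prince", "prince"]),
    (826, "The Shape of Water", 2017): (["horror", "adventure", "thriller", "drama"], ["scientist", "friendship"]),
}

def build_db():
    """Create an in-memory genkeyword table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genkeyword (id INTEGER, title TEXT, genre TEXT, keyword TEXT, year INTEGER)"
    )
    for (movie_id, title, year), (genres, keywords) in MOVIES.items():
        for keyword in keywords:
            for genre in genres:
                conn.execute(
                    "INSERT INTO genkeyword VALUES (?, ?, ?, ?, ?)",
                    (movie_id, title, genre, keyword, year),
                )
    return conn

conn = build_db()

# Only the filtering step: no grouping and no aggregation yet.
filtered = conn.execute("""
    SELECT title, genre, year
    FROM genkeyword
    WHERE genre IN (SELECT genre FROM genkeyword WHERE title = 'Harry Potter')
      AND title <> 'Harry Potter'
""").fetchall()

print(len(filtered), "rows survive the WHERE clause")
for row in sorted(filtered):
    print(row)
```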

When you use count(distinct genre), the duplicate genre values within each (title, year) group are removed.

So the rows that count effectively sees look like this:

| title              | year | genre     |
| ------------------ | ---- | --------- |
| Cinderella         | 2015 | fantasy   |
| Cinderella         | 2015 | drama     |
| The Shape of Water | 2017 | adventure |
| The Shape of Water | 2017 | drama     |

So your query returns this result:

| title              | year | genre_freq |
| ------------------ | ---- | ---------- |
| Cinderella         | 2015 | 2          |
| The Shape of Water | 2017 | 2          |
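One way to convince yourself of this is to write the de-duplication out by hand: first SELECT DISTINCT the (title, year, genre) rows, then COUNT(*) per group. The sketch below compares that rewrite with the original query. It assumes SQLite through Python's sqlite3 module instead of MySQL, and `MOVIES`, `build_db` and the `title` tie-breaker in ORDER BY are additions made for illustration. The two forms agree here because genre is never NULL; COUNT(DISTINCT genre) skips NULLs, whereas the rewrite would count a NULL genre as one row.

```python
import sqlite3

# Sample data from the question: (id, title, year) -> (genres, keywords).
MOVIES = {
    (315, "Harry Potter", 2011): (["drama", "mystery", "adventure", "fantasy"], ["magic", "witch"]),
    (407, "Cinderella", 2015): (["fantasy", "drama"], ["prince", "prince"]),
    (826, "The Shape of Water", 2017): (["horror", "adventure", "thriller", "drama"], ["scientist", "friendship"]),
}

def build_db():
    """Create an in-memory genkeyword table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genkeyword (id INTEGER, title TEXT, genre TEXT, keyword TEXT, year INTEGER)"
    )
    for (movie_id, title, year), (genres, keywords) in MOVIES.items():
        for keyword in keywords:
            for genre in genres:
                conn.execute(
                    "INSERT INTO genkeyword VALUES (?, ?, ?, ?, ?)",
                    (movie_id, title, genre, keyword, year),
                )
    return conn

conn = build_db()

# The query from the question (plus a title tie-breaker for stable output).
original = conn.execute("""
    SELECT title, year, COUNT(DISTINCT genre) AS genre_freq
    FROM genkeyword
    WHERE genre IN (SELECT genre FROM genkeyword WHERE title = 'Harry Potter')
      AND title <> 'Harry Potter'
    GROUP BY title, year
    ORDER BY genre_freq DESC, title
""").fetchall()

# The same logic with the de-duplication spelled out in a derived table.
rewritten = conn.execute("""
    SELECT title, year, COUNT(*) AS genre_freq
    FROM (
        SELECT DISTINCT title, year, genre
        FROM genkeyword
        WHERE genre IN (SELECT genre FROM genkeyword WHERE title = 'Harry Potter')
          AND title <> 'Harry Potter'
    ) AS deduplicated
    GROUP BY title, year
    ORDER BY genre_freq DESC, title
""").fetchall()

print(original)
print(rewritten)
```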