Postgres query for the most common value

Time: 2016-02-06 19:43:15

Tags: sql postgresql greatest-n-per-group

I'm trying to figure out how to build a few queries and I'm a bit lost.

Tables:

CREATE TABLE dv_customer(
   customer_id INTEGER PRIMARY KEY,
   first_name VARCHAR(50),
   last_name VARCHAR(50),
   email VARCHAR(50),
   address_id INTEGER,
   active BOOLEAN
);

CREATE TABLE dv_address(
    address_id INTEGER PRIMARY KEY,
    address VARCHAR(50),
    address2 VARCHAR(50),
    district VARCHAR(50),
    city_id INTEGER,
    postal_code VARCHAR(50),
    phone VARCHAR(50)
);

CREATE TYPE MPAA_RATING AS ENUM(
'G',
'PG',
'PG-13',
'R',
'NC-17'
);

CREATE TABLE dv_film(
    film_id INTEGER PRIMARY KEY,
    title VARCHAR(50),
    description TEXT,
    length SMALLINT,
    rating MPAA_RATING,
    release_year SMALLINT
);

CREATE TABLE cb_customers(
    last_name VARCHAR(50),
    first_name VARCHAR(50),
    PRIMARY KEY (last_name, first_name)
);

CREATE TABLE cb_books(
    title VARCHAR(50),
    author_id INTEGER,
    edition SMALLINT,
    publisher VARCHAR(50),
    PRIMARY KEY (title, author_id, edition)
);

CREATE TABLE cb_authors(
    author_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);

CREATE TABLE mg_customers(
    customer_id INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(50),
    address_id INTEGER,
    active BOOLEAN
);

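I need to figure out the following queries:

What are the first and last names of all customers who live in the district that has the most customers?

So far: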

SELECT x.first_name, x.last_name
    FROM dv_customer x, dv_address y    
    WHERE x.address_id = y.address_id 
    AND (SELECT count(district)
    FROM dv_address >= SELECT count(district) FROM dv_address
   );

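What are the first and last names of the top 10 authors, ranked by the number of books each has written? I want the author names and book counts listed in descending order of the number of books.

So far: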

SELECT x.first_name, x.last_name, count(y.title)
    FROM cb_authors x, cb_books y
    GROUP BY first_name, last_name
    ORDER BY count(*) DESC
    LIMIT 10;

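I know these are a bit messy, but they are the only queries I can't seem to figure out. Any help would be appreciated. I'm a Postgres noob and just want to understand how it works.

2 Answers:

Answer 0: (score: 1)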

  

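What are the first and last names of the top 10 authors, ranked by the number of books each has written

This kind of query is typically done using window functions: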

select first_name, last_name, num_books
from (
  SELECT x.first_name, x.last_name, 
         dense_rank() over (order by count(y.title) desc) as rnk, 
         count(*) as num_books
  FROM cb_authors x
    join cb_books y on x.author_id = y.author_id
  GROUP BY x.author_id
) t
where rnk <= 10
order by num_books desc;  -- the question asks for descending order of book count

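Your FROM clause, FROM cb_authors x, cb_books y, is missing a join condition, so it creates a cartesian product (cross join) between the two tables. This is a good example of why implicit joins in the WHERE clause are a bad thing. If you get used to the explicit JOIN operator, which requires an ON or USING clause, you will not forget the join condition.

The above also groups by x.author_id alone. That is enough because it is the primary key of the table, so all other (non-grouped) columns in the select list are functionally dependent on it.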
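To make the cartesian product concrete, here is a minimal sketch. The demo_* tables and their rows are hypothetical, invented only for this illustration, and are not part of the schema in the question.

```sql
-- Hypothetical demo tables and rows (assumption: not part of the question's schema).
CREATE TEMP TABLE demo_authors (author_id INTEGER PRIMARY KEY, last_name VARCHAR(50));
CREATE TEMP TABLE demo_books (title VARCHAR(50), author_id INTEGER);

INSERT INTO demo_authors VALUES (1, 'Adams'), (2, 'Baker');
INSERT INTO demo_books VALUES ('Book A', 1), ('Book B', 1), ('Book C', 2);

-- Implicit join with no condition: every author is paired with every book.
SELECT count(*) FROM demo_authors x, demo_books y;   -- 2 authors * 3 books = 6

-- Explicit join: each book is paired only with its own author.
SELECT count(*)
FROM demo_authors x
  JOIN demo_books y ON x.author_id = y.author_id;    -- 3
```

With the implicit form, every author ends up with the same inflated book count, which is why the original attempt cannot rank authors correctly.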
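One thing to be aware of with the window-function query above: dense_rank() gives tied authors the same rank, so rnk <= 10 can return more than 10 rows. The sketch below compares the three ranking functions on made-up data; the VALUES rows are hypothetical and only serve to show the tie behavior.

```sql
-- Hypothetical sample: authors A and B are tied on 5 books each.
SELECT author,
       num_books,
       row_number() OVER (ORDER BY num_books DESC, author) AS row_num,   -- 1, 2, 3, 4
       rank()       OVER (ORDER BY num_books DESC)         AS rnk,       -- 1, 1, 3, 4
       dense_rank() OVER (ORDER BY num_books DESC)         AS dense_rnk  -- 1, 1, 2, 3
FROM (VALUES ('A', 5), ('B', 5), ('C', 3), ('D', 1)) AS t(author, num_books)
ORDER BY num_books DESC, author;
```

If exactly 10 rows are required, row_number() with a tie-breaking column is one option; if all authors tied at the cutoff should be kept, rank() or dense_rank() may be the better fit.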

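Answer 1: (score: 0)

The following query gives you the district with the most customers (it counts rows in dv_address, so it assumes one customer per address)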

select district
from dv_address
group by district
order by count(*) desc
limit 1

Then you can use a subquery to select all customers who live in that district

select c.* from dv_customer c
join dv_address a on c.address_id = a.address_id
where a.district = (
    select district
    from dv_address
    group by district
    order by count(*) desc
    limit 1
)
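The subquery above counts rows in dv_address, so it ranks districts by number of addresses, and LIMIT 1 picks a single district arbitrarily if several are tied. A possible variant, sketched here under the assumption that customers should be counted through dv_customer and that all tied districts should be kept, could look like this:

```sql
-- Sketch: count customers per district (not addresses) and keep every tied top district.
SELECT c.first_name, c.last_name
FROM dv_customer c
  JOIN dv_address a ON a.address_id = c.address_id
WHERE a.district IN (
    SELECT d.district
    FROM (
        SELECT a2.district,
               rank() OVER (ORDER BY count(*) DESC) AS rnk  -- window function over the aggregate
        FROM dv_customer c2
          JOIN dv_address a2 ON a2.address_id = c2.address_id
        GROUP BY a2.district
    ) d
    WHERE d.rnk = 1
);
```

It also returns only first_name and last_name, which is what the question asks for.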

Similarly, you can get the top 10 author_ids with this query.

select author_id 
from cb_books
group by author_id
order by count(*) desc
limit 10

Again, using a derived table

select a.*, t.cnt from cb_authors a
join (
    select author_id, count(*) cnt
    from cb_books
    group by author_id
    order by count(*) desc
    limit 10
) t on t.author_id = a.author_id
order by t.cnt desc
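Note that LIMIT 10 cuts off arbitrarily among authors tied for tenth place. As a hedged alternative, assuming PostgreSQL 13 or later (where FETCH FIRST ... WITH TIES is available), the same result can be written without a derived table while keeping the tied rows:

```sql
-- Sketch, assuming PostgreSQL 13+: keep all authors tied with the 10th row.
SELECT a.first_name, a.last_name, count(*) AS num_books
FROM cb_authors a
  JOIN cb_books b ON b.author_id = a.author_id
GROUP BY a.author_id            -- primary key, so first_name/last_name need not be listed
ORDER BY num_books DESC
FETCH FIRST 10 ROWS WITH TIES;  -- WITH TIES requires an ORDER BY clause
```

On older versions, a rank() filter in a subquery gives the same tie handling.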