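Select rows grouped by a column having the maximum aggregate

Time: 2017-12-11 04:15:27

Tags: mysql group-by having

Given the following data set, how can I find the email addresses that are references for the most ApplicationIDs with an "Accepted" decision?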

CREATE TABLE IF NOT EXISTS `EmailReferences` (
  `ApplicationID` INT NOT NULL,
  `Email` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10@test.org'), (1, 'ref11@test.org'), (1, 'ref12@test.org'),
(2, 'ref20@test.org'), (2, 'ref21@test.org'), (2, 'ref22@test.org'),
(3, 'ref11@test.org'), (3, 'ref31@test.org'), (3, 'ref32@test.org'),
(4, 'ref40@test.org'), (4, 'ref41@test.org'), (4, 'ref42@test.org'),
(5, 'ref50@test.org'), (5, 'ref51@test.org'), (5, 'ref52@test.org'),
(6, 'ref60@test.org'), (6, 'ref11@test.org'), (6, 'ref62@test.org'),
(7, 'ref70@test.org'), (7, 'ref71@test.org'), (7, 'ref72@test.org'),
(8, 'ref10@test.org'), (8, 'ref81@test.org'), (8, 'ref82@test.org')
;

CREATE TABLE IF NOT EXISTS `FinalDecision` (
  `ApplicationID` INT NOT NULL,
  `Decision` ENUM('Accepted', 'Denied') NOT NULL,
  PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'),   (6, 'Denied'),
(7, 'Denied'),   (8, 'Accepted')
;

Fiddle: http://sqlfiddle.com/#!9/03bcf2/1

Initially, I was using ORDER BY CountDecision DESC together with LIMIT 1, like so:

SELECT  er.email, COUNT(fd.Decision) AS CountDecision
FROM    EmailReferences AS er
JOIN    FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE   fd.Decision = 'Accepted'
GROUP   BY er.email
ORDER   BY CountDecision DESC
LIMIT   1
;

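However, it occurred to me that several email addresses could be tied for the most "Accepted" decisions, and all but one of them would be filtered out (is that the right wording?) by the LIMIT keyword.

I then tried a variation of the query above, replacing the ORDER BY and LIMIT lines with:

HAVING MAX(CountDecision)

But I realized that this is only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.

Any pointers would be much appreciated. Thanks!

Note: this is for a homework assignment.

Update: To be clear, I'm trying to find the Email values from EmailReferences together with their counts. However, I only want rows where FinalDecision.Decision = 'Accepted' (matched on ApplicationID). Based on my data, the result should be: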

Email          | CountDecision
---------------+--------------
ref10@test.org | 2
ref11@test.org | 2

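3 Answers:

Answer 0: (score: 0)

Basically you need to do two things: first, find out what the max count is, and then find the records that have that max count.

You can either combine these two steps in a single nested query, or store the result in a variable and use it in a second query. I personally try to avoid inner queries, since they can cause performance problems and are harder to read, so I'm using the variable option here: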

-- Find out what max count is and store it in a variable
SELECT  @maxcount := COUNT(fd.Decision) AS CountDecision
FROM    EmailReferences AS er
JOIN    FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE   fd.Decision = 'Accepted'
GROUP   BY er.email
ORDER   BY CountDecision DESC
LIMIT   1;

-- get emails with @maxcount
SELECT  er.Email, COUNT(fd.Decision) AS CountDecision
FROM    EmailReferences AS er
JOIN    FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE   fd.Decision = 'Accepted'
GROUP   BY er.email
HAVING  COUNT(fd.Decision) = @maxcount;
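
For comparison, the single nested query mentioned above could look roughly like the sketch below. It is not part of the original answer; the aliases er2, fd2, per_email and cnt are made up for illustration. The inner derived table counts accepted applications per email, MAX() picks the highest of those counts, and the outer HAVING keeps every email whose count equals it, so ties survive.

```sql
-- Sketch only: the nested-query alternative, assuming the tables from the question.
SELECT  er.Email, COUNT(fd.Decision) AS CountDecision
FROM    EmailReferences AS er
JOIN    FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE   fd.Decision = 'Accepted'
GROUP   BY er.Email
HAVING  COUNT(fd.Decision) = (
            -- highest per-email count of accepted applications
            SELECT  MAX(per_email.cnt)
            FROM (
                SELECT  COUNT(*) AS cnt
                FROM    EmailReferences AS er2
                JOIN    FinalDecision AS fd2 ON er2.ApplicationID = fd2.ApplicationID
                WHERE   fd2.Decision = 'Accepted'
                GROUP   BY er2.Email
            ) AS per_email
        );
```

This is also the "something" that MAX needs to be compared with in the question: the per-group count on one side, the maximum over all groups on the other.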

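Answer 1: (score: 0)

MySQL still lacks window functions, but this will get easier once version 8 is ready. So for future reference, or for databases such as MariaDB that already have window functions: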

CREATE TABLE IF NOT EXISTS `EmailReferences` (
  `ApplicationID` INT NOT NULL,
  `Email` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`ApplicationID`, `Email`)
);
     

INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10@test.org'), (1, 'ref11@test.org'), (1, 'ref12@test.org'),
(2, 'ref20@test.org'), (2, 'ref21@test.org'), (2, 'ref22@test.org'),
(3, 'ref30@test.org'), (3, 'ref31@test.org'), (3, 'ref32@test.org'),
(4, 'ref40@test.org'), (4, 'ref41@test.org'), (4, 'ref42@test.org'),
(5, 'ref50@test.org'), (5, 'ref51@test.org'), (5, 'ref52@test.org'),
(6, 'ref60@test.org'), (6, 'ref11@test.org'), (6, 'ref62@test.org'),
(7, 'ref70@test.org'), (7, 'ref71@test.org'), (7, 'ref72@test.org'),
(8, 'ref10@test.org'), (8, 'ref81@test.org'), (8, 'ref82@test.org')
;
     

CREATE TABLE IF NOT EXISTS `FinalDecision` (
  `ApplicationID` INT NOT NULL,
  `Decision` ENUM('Accepted', 'Denied') NOT NULL,
  PRIMARY KEY (`ApplicationID`)
);
     

INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'),   (6, 'Denied'),
(7, 'Denied'),   (8, 'Accepted')
;
     

select email, CountDecision
from (
     SELECT   er.email, COUNT(fd.Decision) AS CountDecision
            , max(COUNT(fd.Decision)) over() maxCountDecision
     FROM EmailReferences AS er
     JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
     WHERE    fd.Decision = 'Accepted'
     GROUP    BY er.email
     ) d
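where CountDecision = maxCountDecision;

email          | CountDecision
:------------- | ------------:
ref10@test.org |             2

Only ref10@test.org is returned here because this copy of the sample data lists ref30@test.org rather than ref11@test.org for application 3, which leaves ref11@test.org with a single accepted application.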

dbfiddle here
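
A closely related variant, offered here only as a sketch and not taken from the answer, ranks the groups instead of carrying the maximum alongside each row. It assumes a server with window function support (MySQL 8.0 or a recent MariaDB); the names rnk and ranked are illustrative. RANK() gives every tied group the same rank, so filtering on rank 1 returns all emails that share the top count.

```sql
-- Sketch only: rank the per-email counts and keep everything ranked first.
SELECT  Email, CountDecision
FROM (
    SELECT  er.Email,
            COUNT(*) AS CountDecision,
            -- tied counts receive the same rank
            RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM    EmailReferences AS er
    JOIN    FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
    WHERE   fd.Decision = 'Accepted'
    GROUP   BY er.Email
) AS ranked
WHERE   rnk = 1;
```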

Answer 2: (score: 0)

For example...

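-- a: number of accepted applications per email
-- b: the single highest such count
-- Joining a to b on total keeps every email tied for the maximum.
-- Table names are written as in the CREATE TABLE statements, since
-- table names can be case-sensitive depending on the server setup.
SELECT a.*
  FROM
     ( SELECT x.email
            , COUNT(*) total
         FROM EmailReferences x
         JOIN FinalDecision y
           ON y.applicationid = x.applicationid
        WHERE y.decision = 'Accepted'
        GROUP
           BY x.email
     ) a
  JOIN
     ( SELECT COUNT(*) total
         FROM EmailReferences x
         JOIN FinalDecision y
           ON y.applicationid = x.applicationid
        WHERE y.decision = 'Accepted'
        GROUP
           BY x.email
        ORDER
           BY total DESC
        LIMIT 1
     ) b
    ON b.total = a.total;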