How to optimize a query with an inner SELECT in the HAVING clause

Time: 2021-01-09 19:56:30

Tags: mysql

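I have a query that selects the top three basketball players by their average points per game. A player is eligible for this top three only if he has played in more than 50% of his team's games: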

SELECT games_stats.player, AVG(games_stats.points) AS points_avg
FROM games_stats
WHERE EXISTS (
SELECT *
FROM players
WHERE games_stats.player = players.id AND status = 'active') AND season = 28293
GROUP BY games_stats.player
HAVING COUNT(games_stats.game) >= ((
SELECT COUNT(*)
FROM games
WHERE home IN (
SELECT team
FROM teams_players
WHERE season='28293' AND player=games_stats.player) AND season='28293' AND (STATUS='finished' OR STATUS='complete')) + (
SELECT COUNT(*)
FROM games
WHERE away IN (
SELECT team
FROM teams_players
WHERE season='28293' AND player=games_stats.player) AND season='28293' AND (STATUS='finished' OR STATUS='complete'))) / 2
ORDER BY points_avg DESC
LIMIT 3

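The problem is that this query is very expensive in server resources and run time: it takes as long as 0.54 seconds to execute, and the server of my basketball site is often overloaded because of it and sometimes even crashes. Caching the query result is not enough and helps only a little, because basketball games are played almost every day and the stats are updated almost daily.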

I have an idea that I hope will reduce the execution time: I am trying to eliminate this repeated subquery:

SELECT team
FROM teams_players
WHERE season='28293' AND player=games_stats.player

That means I would like to turn my query into something like this:

SELECT games_stats.player, AVG(games_stats.points) AS points_avg, CONCAT(SELECT team FROM teams_players WHERE season=28293 AND teams_players.player=games_stats.player) AS ids_of_teams
FROM games_stats
WHERE EXISTS (
SELECT *
FROM players
WHERE games_stats.player = players.id AND status = 'active') AND season = 28293
GROUP BY games_stats.player
HAVING COUNT(games_stats.game) >= ((
SELECT COUNT(*)
FROM games
WHERE home IN ids_of_teams AND season='28293' AND (STATUS='finished' OR STATUS='complete')) + (
SELECT COUNT(*)
FROM games
WHERE away IN ids_of_teams AND season='28293' AND (STATUS='finished' OR STATUS='complete'))) / 2
ORDER BY points_avg DESC
LIMIT 3

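Unfortunately, CONCAT() returns a string of concatenated team IDs (and I need an array). So, the main question: how can I reduce/optimize this repeated subquery? How can I set up a "field" that "stores" the set of IDs returned by the repeated subquery?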
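One possible direction, sketched here under assumptions rather than tested against the real data: MySQL has no array type that a column alias could hold, but the set of team IDs does not need to be stored at all if `teams_players` is joined to `games` directly. The query below counts the finished team games once per player. It assumes that `teams_players` holds at most one row per (team, player, season) and that `games.status` is the column written as `STATUS` in the query above.

```sql
-- Sketch: finished games of every team a player belonged to in the season,
-- computed once per player instead of inside two correlated subqueries.
SELECT tp.player, COUNT(*) AS team_games
FROM teams_players AS tp
JOIN games AS g
  ON g.season = tp.season
 AND (g.home = tp.team OR g.away = tp.team)
WHERE tp.season = 28293
  AND g.status IN ('finished', 'complete')
GROUP BY tp.player;
```

Under those assumptions `team_games` should match the sum of the two `COUNT(*)` subqueries in the HAVING clause, since a game is counted once for each of the player's teams that took part in it.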

EDIT: I now see that my question was wrong; the problem seems to lie elsewhere. My question is whether I can optimize a HAVING clause that contains inner SELECT queries.

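By the way, do you have any other ideas for writing a more efficient query for the top players and their stats? Note that I have to select the top 3 players who have played in more than 50% of their team's games.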


Database structure

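Notes on the database structure: the table "players" stores data about every player in the basketball league. A player can change teams in the next season or during the current one, so the pivot table "teams_players" describes the teams a player has appeared for over his career.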

The pivot table "teams_players" has the foreign keys "team", "player" and "season", which reference the IDs of the tables "teams", "players" and "seasons".

The table "games" stores data about games; the fields "home" and "away" store the IDs of the two teams playing each other in the game.

The table "games_stats" stores each player's stats per game. It has the foreign key "game", referencing games.id, and the foreign key "player", referencing players.id.
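For readers who want to reproduce the problem, the description above can be approximated with the table definitions below. This is only a sketch: the column types, the primary keys, and the `season`, `status` and `points` columns are inferred from the query in the question rather than taken from the real schema, and the "teams" and "seasons" tables are left out.

```sql
-- Hypothetical reconstruction of the described tables; types are assumptions.
CREATE TABLE players (
    id     INT PRIMARY KEY,
    status VARCHAR(20) NOT NULL          -- e.g. 'active'
);

CREATE TABLE teams_players (
    team   INT NOT NULL,                 -- references teams.id
    player INT NOT NULL,                 -- references players.id
    season INT NOT NULL,                 -- references seasons.id
    PRIMARY KEY (team, player, season)
);

CREATE TABLE games (
    id     INT PRIMARY KEY,
    season INT NOT NULL,
    home   INT NOT NULL,                 -- ID of the home team
    away   INT NOT NULL,                 -- ID of the away team
    status VARCHAR(20) NOT NULL          -- e.g. 'finished', 'complete'
);

CREATE TABLE games_stats (
    game   INT NOT NULL,                 -- references games.id
    player INT NOT NULL,                 -- references players.id
    season INT NOT NULL,
    points INT NOT NULL
);
```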

EDIT: Output of EXPLAIN:

Query Explain output

2 answers:

Answer 0: (score: 0)

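I think this should be simpler and more efficient with the help of temporary tables, although it is hard to say for certain without an actual fiddle to test against: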

SET @var_season := 28293;

DROP TEMPORARY TABLE IF EXISTS tmp_team_games_played;
CREATE TEMPORARY TABLE tmp_team_games_played
    (PRIMARY KEY (id))
SELECT t.id, COUNT(0) AS Count
FROM games g
JOIN teams t ON t.id IN (g.home, g.away)
WHERE TRUE
    AND g.season = @var_season AND (g.STATUS IN ('finished', 'complete'))
GROUP BY t.id
;

DROP TEMPORARY TABLE IF EXISTS tmp_player_team_points;
CREATE TEMPORARY TABLE tmp_player_team_points
    (PRIMARY KEY (player, team))
SELECT gs.player, gs.team,
       SUM(gs.points) AS points_in_team_games,
       COUNT(gs.game) AS games_played_for_team
FROM games_stats gs
JOIN players p ON gs.player = p.id AND p.status = 'active' -- reorder based on index
WHERE TRUE
    AND gs.season = @var_season
GROUP BY gs.player, gs.team
;

-- Average points per game = total points / total games played,
-- summed over every team the player appeared for in the season.
SELECT tptp.player,
       SUM(tptp.points_in_team_games) / SUM(tptp.games_played_for_team) AS points_avg
FROM tmp_player_team_points tptp
JOIN tmp_team_games_played tgp ON tptp.team = tgp.id
GROUP BY tptp.player
-- I took the liberty of requiring a player to play in more than half
-- of his teams' games, as opposed to greater than or equal to half
HAVING SUM(tptp.games_played_for_team) > SUM(tgp.Count) / 2
ORDER BY points_avg DESC
LIMIT 3
;

Answer 1: (score: 0)

The (overly complex) HAVING clause makes me think this query could be written another way.

Could you check this and comment on the results?

SELECT 
  games_stats.player, 
  AVG(games_stats.points) AS points_avg
FROM games_stats
INNER JOIN (
  select team
  from games
  inner join teams_players on (teams_players.team=home OR teams_players.team=away)
                           and teams_players.season=games.season
  where games.season=28293
    and (games.status='finished' or games.status='complete')
  ) x on x.team=games_stats.team
WHERE 
  season=28293
  and EXISTS (
    SELECT *
    FROM players
    WHERE games_stats.player = players.id AND status = 'active') 
GROUP BY games_stats.player
ORDER BY points_avg DESC
LIMIT 3;
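
The query above does not enforce the "more than 50% of the team's games" rule from the question. A variant that keeps the rule while moving the correlated subqueries out of HAVING might look like the sketch below. It is untested against the real data and assumes that `teams_players` holds at most one row per (team, player, season); the alias `tg` and the column `team_games` are names introduced here for illustration.

```sql
SELECT gs.player, AVG(gs.points) AS points_avg
FROM games_stats AS gs
JOIN players AS p
  ON p.id = gs.player AND p.status = 'active'
JOIN (
    -- Finished games of each player's team(s), computed once per player.
    SELECT tp.player, COUNT(*) AS team_games
    FROM teams_players AS tp
    JOIN games AS g
      ON g.season = tp.season
     AND (g.home = tp.team OR g.away = tp.team)
    WHERE tp.season = 28293
      AND g.status IN ('finished', 'complete')
    GROUP BY tp.player
) AS tg ON tg.player = gs.player
WHERE gs.season = 28293
GROUP BY gs.player, tg.team_games
-- Same threshold as the original query: at least half of the team games.
HAVING COUNT(gs.game) >= tg.team_games / 2
ORDER BY points_avg DESC
LIMIT 3;
```

One behavioural difference to be aware of: because of the inner join, a player whose teams have no finished games in the season is dropped here, whereas the original HAVING clause would let him through with a threshold of zero.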