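Improving SQL query performance

Time: 2011-01-14 10:30:45

Tags: mysql

I have three tables that store the actual person data (person), teams (team) and entries (athlete). The schema of the three tables is: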

Database schema

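Each team may have two or more athletes.

I am trying to write a query that produces the most frequent pairs, meaning people who play together in teams of two. I came up with the following query: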

SELECT p1.surname, p1.name, p2.surname, p2.name, COUNT(*) AS freq
FROM person p1, athlete a1, person p2, athlete a2
WHERE
    p1.id = a1.person_id AND
    p2.id = a2.person_id AND
    a1.team_id = a2.team_id AND
    a1.team_id IN
          ( SELECT team.id
            FROM team, athlete
            WHERE team.id = athlete.team_id
            GROUP BY team.id
            HAVING COUNT(*) = 2 )
GROUP BY p1.id
ORDER BY freq DESC

Obviously this is a resource-consuming query. Is there a way to improve it?

5 answers:

Answer 0 (score: 4)

SELECT team.id
FROM team, athlete
WHERE team.id = athlete.team_id
GROUP BY team.id
HAVING COUNT(*) = 2

Performance tip 1: you only need the athlete table here.
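
As a rough check of that tip, the sketch below runs both forms of the subquery and compares the results. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names (team.id, athlete.person_id, athlete.team_id) are assumptions taken from the question's query, since the schema itself is not reproduced here. The two forms only agree if every athlete.team_id refers to an existing team.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY);
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    INSERT INTO team (id) VALUES (1), (2), (3);
    INSERT INTO athlete (person_id, team_id) VALUES
        (1, 1), (2, 1), (3, 1),   -- team 1: three athletes
        (3, 2), (4, 2),           -- team 2: two athletes
        (1, 3), (5, 3);           -- team 3: two athletes
""")

# Subquery as written in the question: joins team to athlete.
with_join = conn.execute("""
    SELECT team.id
    FROM team, athlete
    WHERE team.id = athlete.team_id
    GROUP BY team.id
    HAVING COUNT(*) = 2
    ORDER BY team.id
""").fetchall()

# Same result from the athlete table alone.
athlete_only = conn.execute("""
    SELECT team_id
    FROM athlete
    GROUP BY team_id
    HAVING COUNT(*) = 2
    ORDER BY team_id
""").fetchall()

print(with_join, athlete_only)
```

Both queries return teams 2 and 3 for this sample data; the second one avoids touching the team table at all.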

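Answer 1 (score: 2)

You might consider the following approach, which uses triggers to maintain counters in your team and person tables so that you can easily find out which teams have 2 or more athletes and which persons are in 2 or more teams.

(Note: I have removed the surrogate id key from your athlete table in favour of a composite key, which enforces data integrity better. I have also renamed athlete to team_athlete.)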

drop table if exists person;
create table person
(
person_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
team_count smallint unsigned not null default 0
)
engine=innodb;

drop table if exists team;
create table team 
(
team_id int unsigned not null auto_increment primary key,
name varchar(255) not null,
athlete_count smallint unsigned not null default 0,
key (athlete_count) 
)
engine=innodb;

drop table if exists team_athlete;
create table team_athlete
(
team_id int unsigned not null,
person_id int unsigned not null,
primary key (team_id, person_id), -- note clustered composite PK
key person(person_id) -- added index
)
engine=innodb;

delimiter #

create trigger team_athlete_after_ins_trig after insert on team_athlete
for each row
begin
  update team set athlete_count = athlete_count+1 where team_id = new.team_id;
  update person set team_count = team_count+1 where person_id = new.person_id;
end#

delimiter ;

insert into person (name) values ('p1'),('p2'),('p3'),('p4'),('p5');
insert into team (name) values ('t1'),('t2'),('t3'),('t4');

insert into team_athlete (team_id, person_id) values
(1,1),(1,2),(1,3),
(2,3),(2,4),
(3,1),(3,5);

select * from team_athlete;
select * from person;
select * from team;

select * from team where athlete_count >= 2;
select * from person where team_count >= 2;

EDIT

Added the following because I initially misunderstood the question:

Create a view that only includes teams of exactly 2 persons.

drop view if exists teams_with_2_players_view;

create view teams_with_2_players_view as
select
 t.team_id,
 ta.person_id,
 p.name as person_name
from
 team t
inner join team_athlete ta on t.team_id = ta.team_id
inner join person p on ta.person_id = p.person_id
where
 t.athlete_count = 2;

Now use the view to find the most frequently occurring person pairs.

select 
 p1.person_id as p1_person_id,
 p1.person_name as p1_person_name,
 p2.person_id as p2_person_id,
 p2.person_name as p2_person_name,
 count(*) as counter
from
 teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on 
  p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
 p1.person_id, p2.person_id
order by
 counter desc;

Hope this helps :)

EDIT 2: checking performance

select count(*) as counter from person;

+---------+
| counter |
+---------+
|   10000 |
+---------+
1 row in set (0.00 sec)

select count(*) as counter from team;

+---------+
| counter |
+---------+
|  450000 |
+---------+
1 row in set (0.08 sec)

select count(*) as counter from team where athlete_count = 2;

+---------+
| counter |
+---------+
|  112644 |
+---------+
1 row in set (0.03 sec)

select count(*) as counter from team_athlete;

+---------+
| counter |
+---------+
| 1124772 |
+---------+
1 row in set (0.21 sec)

explain
select 
 p1.person_id as p1_person_id,
 p1.person_name as p1_person_name,
 p2.person_id as p2_person_id,
 p2.person_name as p2_person_name,
 count(*) as counter
from
 teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on 
  p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
 p1.person_id, p2.person_id
order by
 counter desc
limit 10;

+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
| id | select_type | table | type   | possible_keys       | key         | key_len | ref                 | rows  | Extra                                        |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | t     | ref    | PRIMARY,t_count_idx | t_count_idx | 2       | const               | 86588 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,t_count_idx | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using where                                  |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using index                                  |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4       | foo_db.ta.person_id |     1 |                                              |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using where; Using index                     |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4       | foo_db.ta.person_id |     1 |                                              |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+

6 rows in set (0.00 sec)

select 
 p1.person_id as p1_person_id,
 p1.person_name as p1_person_name,
 p2.person_id as p2_person_id,
 p2.person_name as p2_person_name,
 count(*) as counter
from
 teams_with_2_players_view p1
inner join teams_with_2_players_view p2 on 
  p2.team_id = p1.team_id and p2.person_id > p1.person_id
group by
 p1.person_id, p2.person_id
order by
 counter desc
limit 10;

+--------------+----------------+--------------+----------------+---------+
| p1_person_id | p1_person_name | p2_person_id | p2_person_name | counter |
+--------------+----------------+--------------+----------------+---------+
|          221 | person 221     |          739 | person 739     |       5 |
|          129 | person 129     |          249 | person 249     |       5 |
|          874 | person 874     |          877 | person 877     |       4 |
|          717 | person 717     |          949 | person 949     |       4 |
|          395 | person 395     |          976 | person 976     |       4 |
|          415 | person 415     |          828 | person 828     |       4 |
|          287 | person 287     |          470 | person 470     |       4 |
|          455 | person 455     |          860 | person 860     |       4 |
|           13 | person 13      |           29 | person 29      |       4 |
|            1 | person 1       |          743 | person 743     |       4 |
+--------------+----------------+--------------+----------------+---------+
10 rows in set (2.02 sec)

Answer 2 (score: 0)

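Shouldn't there be an additional constraint a1.person_id != a2.person_id, to avoid pairing a player with themselves? This may not affect the final ordering of the results, but it does affect the accuracy of the counts.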

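If possible, you could add an indexed column called athlete_count to the team table and update it whenever a player is added to or removed from a team. That avoids the subquery that has to scan the whole athlete table to find the two-player teams.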
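
A minimal sketch of such a counter column, kept current by one trigger for inserts and one for deletes. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, so the trigger syntax differs slightly from what MySQL expects; the trigger and index names are made up for illustration, and the column names follow the question's query.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (
        id INTEGER PRIMARY KEY,
        athlete_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX team_athlete_count_idx ON team (athlete_count);

    CREATE TABLE athlete (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL,
        team_id INTEGER NOT NULL REFERENCES team (id)
    );

    -- Hypothetical trigger names; each keeps team.athlete_count in step.
    CREATE TRIGGER athlete_after_insert AFTER INSERT ON athlete
    BEGIN
        UPDATE team SET athlete_count = athlete_count + 1 WHERE id = NEW.team_id;
    END;

    CREATE TRIGGER athlete_after_delete AFTER DELETE ON athlete
    BEGIN
        UPDATE team SET athlete_count = athlete_count - 1 WHERE id = OLD.team_id;
    END;

    INSERT INTO team (id) VALUES (1), (2);
    INSERT INTO athlete (person_id, team_id) VALUES
        (1, 1), (2, 1), (3, 1),
        (3, 2), (4, 2);
    -- Team 1 drops from three athletes to two.
    DELETE FROM athlete WHERE person_id = 3 AND team_id = 1;
""")

# Two-player teams are now found by an indexed lookup, no aggregation needed.
two_player_teams = conn.execute(
    "SELECT id FROM team WHERE athlete_count = 2 ORDER BY id"
).fetchall()
print(two_player_teams)
```

An update that moves an athlete from one team to another would need a third trigger; it is left out here to keep the sketch short.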

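UPDATE1: Also, if I understand the original query correctly, grouping by p1.id only gives you the number of times a player played in a two-player team, not the number of times the pair itself played together. You may have to GROUP BY p1.id, p2.id.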
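
Both points can be sketched together as follows, again on SQLite via Python's sqlite3 as a stand-in for MySQL, with table and column names assumed from the question's query and invented sample rows. Note one deliberate variation: the sketch uses a1.person_id < a2.person_id instead of !=, because != removes self-pairs but would still count each pair twice, once as (A, B) and once as (B, A).

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    INSERT INTO person (id, name) VALUES
        (1, 'p1'), (2, 'p2'), (3, 'p3'), (4, 'p4'), (5, 'p5');
    INSERT INTO athlete (person_id, team_id) VALUES
        (1, 1), (2, 1), (3, 1),   -- team 1: three athletes, excluded
        (3, 2), (4, 2),           -- team 2: p3 + p4
        (1, 3), (5, 3),           -- team 3: p1 + p5
        (3, 4), (4, 4);           -- team 4: p3 + p4 again
""")

rows = conn.execute("""
    SELECT p1.name, p2.name, COUNT(*) AS freq
    FROM athlete a1
    JOIN athlete a2 ON a1.team_id = a2.team_id
                   AND a1.person_id < a2.person_id  -- no self-pairs, no mirrored pairs
    JOIN person p1 ON p1.id = a1.person_id
    JOIN person p2 ON p2.id = a2.person_id
    WHERE a1.team_id IN (SELECT team_id
                         FROM athlete
                         GROUP BY team_id
                         HAVING COUNT(*) = 2)
    GROUP BY p1.id, p2.id                           -- count per pair, not per player
    ORDER BY freq DESC, p1.id, p2.id
""").fetchall()
print(rows)
```

For this sample data the pair p3/p4 is reported with a frequency of 2 and p1/p5 with a frequency of 1.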

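Answer 3 (score: 0)

REVISED for teams of exactly two

By pre-aggregating teams of exactly two people in the innermost query, I can use MIN() and MAX() to put PersonA and PersonB of each team into a single row. That way the person IDs are always in low-high order, so the same pair can be matched across teams. Then I can COUNT by the common Mate1 and Mate2 across all teams and fetch their names directly.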

SELECT STRAIGHT_JOIN
      p1.surname, 
      p1.name, 
      p2.surname, 
      p2.name, 
      TeamAggregates.CommonTeams
   from 
     ( select PreQueryTeams.Mate1,
              PreQueryTeams.Mate2,
              count(*) CommonTeams
           from
              ( SELECT team_id, 
                       min( person_id ) mate1, 
                       max( person_id ) mate2
                   FROM 
                       athlete 
                   group by 
                       team_id 
                   having count(*) = 2 ) PreQueryTeams
           group by
              PreQueryTeams.Mate1,
              PreQueryTeams.Mate2  ) TeamAggregates,
      person p1,
      person p2
   where
          TeamAggregates.Mate1 = p1.id
      and TeamAggregates.Mate2 = p2.id
   order by 
      TeamAggregates.CommonTeams DESC
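
To see the MIN()/MAX() pre-aggregation in isolation, here is a small sketch on SQLite via Python's sqlite3 (a stand-in for MySQL). The sample rows are invented, the athlete columns are assumed from the question's query, and the join to person is left out so that only the pairing step is shown.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    INSERT INTO athlete (person_id, team_id) VALUES
        (4, 1), (3, 1),           -- team 1: persons 3 and 4, higher id inserted first
        (3, 2), (4, 2),           -- team 2: the same two persons
        (1, 3), (5, 3),           -- team 3: persons 1 and 5
        (1, 4), (2, 4), (3, 4);   -- team 4: three persons, filtered out by HAVING
""")

pairs = conn.execute("""
    SELECT mate1, mate2, COUNT(*) AS common_teams
    FROM (SELECT team_id,
                 MIN(person_id) AS mate1,   -- lower id of the two team members
                 MAX(person_id) AS mate2    -- higher id of the two team members
          FROM athlete
          GROUP BY team_id
          HAVING COUNT(*) = 2) AS pre_query_teams
    GROUP BY mate1, mate2
    ORDER BY common_teams DESC, mate1, mate2
""").fetchall()
print(pairs)
```

Because each two-person team collapses to one (low, high) row, no self-join on athlete is needed; the insert order of the two members does not matter.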

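ORIGINAL ANSWER for teams of any size

I would do the following. The inner pre-query first joins all possible combinations of people on each team, but requiring person1 < person2 eliminates rows where person1 and person2 are the same person. It also prevents reversed pairs, because the lower person ID always comes first. For example: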

athlete   person   team
1         1        1   
2         2        1
3         3        1
4         4        1
5         1        2
6         3        2
7         4        2
8         1        3
9         4        3

So, from team 1 you would get person pairs of
1,2    1,3   1,4      2,3     2,4    3,4
and NOT get reversed duplicates such as 
2,1    3,1   4,1      3,2     4,2    4,3
nor same person
1,1    2,2   3,3   4,4 


Then from team 2, you would have pairs of
1,3   1,4   3,4

Finally in team 3 the single pair of 
1,4

Thus teammates 1,4 have occurred in 3 common teams.

SELECT STRAIGHT_JOIN
      p1.surname, 
      p1.name, 
      p2.surname, 
      p2.name, 
      PreQuery.CommonTeams
   from 
      ( select
            a1.Person_ID Person_ID1,
            a2.Person_ID Person_ID2,
            count(*) CommonTeams
         from 
            athlete a1,
            athlete a2
         where
                a1.Team_ID = a2.Team_ID
            and a1.Person_ID < a2.Person_ID
         group by 
            1, 2
         having CommonTeams > 1 ) PreQuery,
      person p1,
      person p2
   where 
          PreQuery.Person_ID1 = p1.id
      and PreQuery.Person_ID2 = p2.id
   order by 
      PreQuery.CommonTeams DESC
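
The worked example above can be replayed with the same nine athlete rows. This is a sketch on SQLite via Python's sqlite3 as a stand-in for MySQL; it runs only the inner pre-query (without the join to person) and assumes a person appears at most once per team.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    -- Same rows as the athlete / person / team listing in the example.
    INSERT INTO athlete (id, person_id, team_id) VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1),
        (5, 1, 2), (6, 3, 2), (7, 4, 2),
        (8, 1, 3), (9, 4, 3);
""")

common = conn.execute("""
    SELECT a1.person_id AS person_id1,
           a2.person_id AS person_id2,
           COUNT(*)     AS common_teams
    FROM athlete a1
    JOIN athlete a2 ON a1.team_id = a2.team_id
                   AND a1.person_id < a2.person_id  -- skips self-pairs and reversed pairs
    GROUP BY a1.person_id, a2.person_id
    HAVING COUNT(*) > 1                             -- pairs sharing more than one team
    ORDER BY common_teams DESC, person_id1, person_id2
""").fetchall()
print(common)
```

Persons 1 and 4 come out on top with 3 common teams, matching the hand-worked pairs; pairs 1,3 and 3,4 follow with 2 each, and pairs that share only one team are dropped by the HAVING clause.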

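Answer 4 (score: 0)

Here are some tips for improving the performance of SQL select queries:

  • Use SET NOCOUNT ON; it reduces network traffic and so improves performance.
  • Use fully qualified procedure names (e.g. database.schema.objectname).
  • Use sp_executesql instead of execute for dynamic queries.
  • Don't use select *; use select column1, column2, ... in IF EXISTS or SELECT operations.
  • Avoid naming user stored procedures like sp_procedureName, because if a stored procedure name starts with sp_, SQL Server searches the master db first, which can reduce query performance.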