MySQL COUNT(*), GROUP BY and INNER JOIN

Time: 2011-07-13 12:04:18

Tags: mysql performance join count group-by

I have a very slow query on MySQL 5.1. Here are simplified versions of the two tables I am joining:

CREATE TABLE  `jobs` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL
) ENGINE = MYISAM ;

and

CREATE TABLE `jobsCategories` (
 `jobID` int(11) NOT NULL,
 `industryID` int(11) NOT NULL,
 KEY `jobID` (`jobID`),
 KEY `industryID` (`industryID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

The query is simple:

SELECT count(*) as nb,industryID 
FROM  jobs J 
INNER JOIN jobsCategories C ON C.jobID=J.id 
GROUP BY industryID 
ORDER BY nb DESC;

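I have about 150,000 records in the jobs table and 350,000 records in the jobsCategories table, and there are 30 industries.

The query takes about 50 seconds to execute!

Do you know why it takes so long? How can I optimize the structure of this database? Profiling the query shows that 99% of the execution time is spent copying to the tmp table.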

EXPLAIN <query> gives me:


*************************** 1. row ***************************
       id: 1
select_type: SIMPLE
    table: J
     type: index
possible_keys: PRIMARY
      key: PRIMARY
  key_len: 4
      ref: NULL
     rows: 178950
    Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
       id: 1
 select_type: SIMPLE
    table: C
     type: ref
possible_keys: jobID
      key: jobID
  key_len: 8
      ref: J.id
     rows: 1
    Extra: Using where
2 rows in set (0.00 sec)

About memory:

free -m:

             total       used       free     shared    buffers     cached
Mem:          2011       1516        494          0          8       1075
-/+ buffers/cache:        433       1578
Swap:         5898        126       5772

Using FORCE INDEX as suggested below:

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

SHOW PROFILE;

gives me:

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000095 |
| Opening tables       | 0.000014 |
| System lock          | 0.000008 |
| Table lock           | 0.000007 |
| init                 | 0.000032 |
| optimizing           | 0.000011 |
| statistics           | 0.000032 |
| preparing            | 0.000016 |
| Creating tmp table   | 0.000031 |
| executing            | 0.000003 |
| Copying to tmp table | 3.301305 |
| Sorting result       | 0.000028 |
| Sending data         | 0.000024 |
| end                  | 0.000003 |
| removing tmp table   | 0.000009 |
| end                  | 0.000004 |
| query end            | 0.000003 |
| freeing items        | 0.000029 |
| logging slow query   | 0.000003 |
| cleaning up          | 0.000003 |
+----------------------+----------+

I guess my RAM (2 GB) is not large enough. How can I confirm whether that is the case?

3 answers:

Answer 0 (score: 4)

First, I don't think you need to join the jobs table to get the same result (unless you have some garbage data in the jobsCategories table):

select count(*) as nb, industryID 
from jobsCategories
group by industryID 
order by nb DESC;
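Whether the join can safely be dropped depends on there being no "garbage" rows, that is, rows in jobsCategories whose jobID has no matching row in jobs. A minimal sketch of one way to check this, using only the tables from the question (the alias `orphans` is just an illustrative name):

```sql
-- Count rows in jobsCategories that point at a job which does not exist.
SELECT COUNT(*) AS orphans
FROM jobsCategories C
LEFT JOIN jobs J ON J.id = C.jobID
WHERE J.id IS NULL;
```

If this returns 0, the query without the join should give the same counts as the original, because jobs.id is the primary key and each jobsCategories row then matches exactly one job.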

Otherwise, you can try forcing the index on industryID:

select count(*) as nb, industryID 
from 
    jobs J 
    inner join jobsCategories C force index (industryID) on (C.jobID = J.id )
group by industryID 
order by nb DESC;

Answer 1 (score: 0)

Change your tables to InnoDB =) InnoDB handles large tables very well, and COUNT(*) is handled much faster:

http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/
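For reference, a sketch of what the engine change suggested here could look like, assuming the two tables from the question and enough free disk space for the rebuild. Each ALTER TABLE rebuilds the table, so it may take a while on large tables, and it is worth trying on a copy first:

```sql
-- Convert both tables from MyISAM to InnoDB (each statement rebuilds the table).
ALTER TABLE jobs ENGINE = InnoDB;
ALTER TABLE jobsCategories ENGINE = InnoDB;

-- Check the Engine column afterwards; the pattern matches both tables.
SHOW TABLE STATUS LIKE 'jobs%';
```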

Good luck

Edit: After testing, it seems that MyISAM is faster than InnoDB for COUNT(*) when there is no WHERE clause:

http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

Anyway, I tested your exact query on MyISAM tables that simulate yours (150k jobs and 300k jobsCategories) and it took 1.5 seconds, so maybe your problem is elsewhere. That's all I can tell you =P
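One way to look for the problem elsewhere, and to test the question's guess about RAM, is to see whether the temporary table built for the GROUP BY is spilling to disk. This is only a sketch: it uses the standard `tmp_table_size` and `max_heap_table_size` variables and the `Created_tmp%` status counters, and it assumes you run all the statements in the same session.

```sql
-- In-memory temporary tables are limited by the smaller of these two values.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Note the counters before running the slow query.
SHOW SESSION STATUS LIKE 'Created_tmp%';

SELECT COUNT(*) AS nb, industryID
FROM jobs J
INNER JOIN jobsCategories C ON C.jobID = J.id
GROUP BY industryID
ORDER BY nb DESC;

-- Compare with the earlier values: if Created_tmp_disk_tables went up,
-- a temporary table was written to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```

If Created_tmp_disk_tables does not increase, the temporary table fit in memory, and the time reported as "Copying to tmp table" is more likely spent reading rows for the join than waiting on memory.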

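Answer 2 (score: 0)

Hopefully I'm not misreading this, but from what I can see you don't need any join. Since you are grouping by how many jobs are in each industry, and all of that is in your jobsCategories table, why join to the actual jobs table that holds the job title when nothing from it is even returned?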

select industryID,
       count(*) JobsPerIndustry
   from jobsCategories
   group by industryID;

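Edit per comment/feedback...

That definitely makes a difference. If you are adding a condition that is tied to the jobs table, make sure the jobs table has an index on each column you want to filter by, and then use a query similar to your original one. In this case, make sure your jobs table has an index on countryID.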

SELECT
      count(*) as nb,
      industryID 
   FROM  jobs J 
      JOIN jobsCategories C 
         ON J.ID = C.jobID
   WHERE 
      J.countryID=1234
   GROUP BY 
      industryID 
   ORDER BY 
      nb DESC;
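The countryID index mentioned above could be added roughly like this. This is a sketch that assumes the real jobs table has a countryID column (it is not part of the simplified schema in the question), and `idx_countryID` is just an illustrative index name:

```sql
-- Assumption: jobs has a countryID column in the real schema.
ALTER TABLE jobs ADD INDEX idx_countryID (countryID);

-- Check that the optimizer now uses the new index for the filter.
EXPLAIN SELECT COUNT(*) AS nb, industryID
FROM jobs J
JOIN jobsCategories C ON J.id = C.jobID
WHERE J.countryID = 1234
GROUP BY industryID
ORDER BY nb DESC;
```

In the EXPLAIN output, the row for J should list the new index under key if it is being used for the countryID filter.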