Best-performing query for "select max in group"?

Date: 2008-09-18 19:13:01

Tags: sql database

I have a simple table comments (id INT, revision INT, comment VARCHAR(140)) with the following content:

1|1|hallo1|
1|2|hallo2|
1|3|hallo3|
2|1|hallo1|
2|2|hallo2|

I'm looking for an SQL statement that returns each comment with its highest revision:

1|3|hallo3|
2|2|hallo2|

I came up with this solution:

select id, revision, comment 
  from comments 
  where revision = (
      select max(revision) 
        from comments as f 
        where f.id = comments.id
  );
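As a quick check of what this query returns, here is a minimal sketch that runs it unchanged (plus an ORDER BY for stable output) against the sample rows. It assumes Python's built-in `sqlite3` module as a stand-in database, since the question doesn't name a DBMS; `ROWS` and `latest_revisions` are illustrative names only.

```python
import sqlite3

# The sample rows from the question: (id, revision, comment).
ROWS = [(1, 1, "hallo1"), (1, 2, "hallo2"), (1, 3, "hallo3"),
        (2, 1, "hallo1"), (2, 2, "hallo2")]


def latest_revisions(conn):
    """Run the question's correlated-subquery query (ORDER BY added for stable output)."""
    sql = """
        SELECT id, revision, comment
          FROM comments
         WHERE revision = (
               SELECT MAX(revision)
                 FROM comments AS f
                WHERE f.id = comments.id)
         ORDER BY id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", ROWS)
print(latest_revisions(conn))  # [(1, 3, 'hallo3'), (2, 2, 'hallo2')]
```

The subquery is correlated with each outer row. Without a suitable index on (id, revision), the database may have to rescan the table for every row, which would explain the slowdown on large data sets.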

But it is very slow on large data sets. Is there a better query to achieve this?

8 answers:

Answer 0 (score: 11)

Here is one way to do it. With appropriate indexes it won't be too slow, and it doesn't use a subselect:

SELECT comments.ID, comments.revision, comments.comment FROM comments 
LEFT OUTER JOIN comments AS maxcomments 
ON maxcomments.ID= comments.ID
AND maxcomments.revision > comments.revision
WHERE maxcomments.revision IS NULL

Adapted from the query here: http://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/

(From a Google search for: sql max group)
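The exclusion join keeps a row only when no other row with the same id has a higher revision. Here is a minimal sketch of that idea, again assuming Python's `sqlite3` as a stand-in. The answer doesn't say which index is "appropriate", so a composite index on (id, revision) is an assumption, and its name is hypothetical.

```python
import sqlite3

ROWS = [(1, 1, "hallo1"), (1, 2, "hallo2"), (1, 3, "hallo3"),
        (2, 1, "hallo1"), (2, 2, "hallo2")]

# The answer's self-join: the outer join looks for a row with the same id and
# a higher revision; when none exists, maxcomments.* is NULL and the row is kept.
ANTI_JOIN = """
    SELECT comments.id, comments.revision, comments.comment
      FROM comments
      LEFT OUTER JOIN comments AS maxcomments
        ON maxcomments.id = comments.id
       AND maxcomments.revision > comments.revision
     WHERE maxcomments.revision IS NULL
     ORDER BY comments.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
# Assumed "appropriate index" (hypothetical name): composite on (id, revision),
# so the join can find rows for an id and compare revisions via the index.
conn.execute("CREATE INDEX idx_comments_id_revision ON comments (id, revision)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", ROWS)

result = conn.execute(ANTI_JOIN).fetchall()
print(result)  # [(1, 3, 'hallo3'), (2, 2, 'hallo2')]
```

One caveat: for an id with k revisions, the join pairs each row with every higher revision, which is k*(k-1)/2 matches before the IS NULL filter. This shape therefore tends to suit groups with few revisions each.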

Answer 1 (score: 6)

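  1. Make sure your indexes are set up properly. Indexing id and revision would be good.

  2. Here is a different take on your query. I haven't checked its execution plan, but it should help if you set up the indexes well: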

    SELECT c.* 
      FROM comments c
      INNER JOIN (
            SELECT id,max(revision) AS maxrev 
              FROM comments 
              GROUP BY id
      ) b
        ON c.id=b.id AND c.revision=b.maxrev
    
  3. Edited to add:

    1. If you're using SQL Server, you might also want to look at indexed views:
      http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx
    2. Edited again to add some information:

      Subquery:
      25157 records
      2 seconds
      Execution plan includes an Index Seek (82%) base and a Segment (17%)
      
      Left Outer Join:
      25160 records
      3 seconds
      Execution plan includes two Index Scans @ 22% each with a Right Outer Merge at 45% and a Filter at 11%
      

      I'd still go with the subquery.
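Point 2 computes each id's maximum revision once in a derived table and then joins back to fetch the full row. Here is a minimal sketch with the index from point 1, assuming Python's `sqlite3` as a stand-in. A single composite index (hypothetical name) stands in for "indexing id and revision", and it is assumed to help both the GROUP BY and the join.

```python
import sqlite3

ROWS = [(1, 1, "hallo1"), (1, 2, "hallo2"), (1, 3, "hallo3"),
        (2, 1, "hallo1"), (2, 2, "hallo2")]

# The answer's query: derived table b holds MAX(revision) per id,
# and the join keeps only the rows of c that match that maximum.
JOIN_TO_MAX = """
    SELECT c.*
      FROM comments c
     INNER JOIN (
           SELECT id, MAX(revision) AS maxrev
             FROM comments
            GROUP BY id
     ) b
        ON c.id = b.id AND c.revision = b.maxrev
     ORDER BY c.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
# Point 1 (assumed composite form, hypothetical name): index id and revision together.
conn.execute("CREATE INDEX idx_comments_id_revision ON comments (id, revision)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", ROWS)

rows = conn.execute(JOIN_TO_MAX).fetchall()
print(rows)  # [(1, 3, 'hallo3'), (2, 2, 'hallo2')]
```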

Answer 2 (score: 4)

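Tested against one of our tables, which has nearly 1 million rows in total. Indexes exist on both FIELD2 and FIELD3. The query returned 83953 rows in 3 seconds on our dev box.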

select
FIELD1, FIELD2, FIELD3
from
OURTABLE T1 (nolock)
WHERE FIELD3 = 
(
SELECT MAX(FIELD3) FROM 
OURTABLE T2 (nolock)
WHERE T1.FIELD2=T2.FIELD2
)
ORDER BY FIELD2 DESC
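This is the same correlated-subquery pattern as the question's. Mapped onto the comments table, FIELD2 plays the role of id and FIELD3 the role of revision (an assumption based on the shape of the query). One behaviour to be aware of is that `= MAX(...)` filters on a value, not a row, so every row that ties for the maximum is returned. Here is a minimal sketch in Python's `sqlite3`, with one hypothetical duplicate row that is not in the question's data. The `(nolock)` hints are omitted because they are SQL Server specific.

```python
import sqlite3

# The question's rows plus one hypothetical extra row tying with
# (2, 2, 'hallo2') on the maximum revision for id 2.
ROWS = [(1, 1, "hallo1"), (1, 2, "hallo2"), (1, 3, "hallo3"),
        (2, 1, "hallo1"), (2, 2, "hallo2"), (2, 2, "hallo2-dup")]

CORRELATED = """
    SELECT T1.id, T1.revision, T1.comment
      FROM comments T1
     WHERE T1.revision = (SELECT MAX(T2.revision)
                            FROM comments T2
                           WHERE T1.id = T2.id)
     ORDER BY T1.id, T1.comment
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", ROWS)

rows = conn.execute(CORRELATED).fetchall()
# Both tied rows for id 2 come back.
print(rows)  # [(1, 3, 'hallo3'), (2, 2, 'hallo2'), (2, 2, 'hallo2-dup')]
```

The other max-based queries on this page behave the same way. If (id, revision) is meant to be unique, declaring it as a primary key or unique constraint rules this case out.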

Answer 3 (score: 1)

My suggestion would be analytic functions.

select id, max_revision, comment
from (select c.id, c.comment, c.revision, max(c.revision) over (partition by c.id) as max_revision
      from comments c) t
where revision = max_revision;
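The analytic `MAX(...) OVER (PARTITION BY ...)` attaches each id's highest revision to every row without a self-join, and the outer filter then keeps the matching rows. Here is a minimal sketch of the answer's query in Python's `sqlite3`. This assumes SQLite 3.25.0 or newer, the first version with window functions. The derived-table alias `t` is included because several databases require one.

```python
import sqlite3

ROWS = [(1, 1, "hallo1"), (1, 2, "hallo2"), (1, 3, "hallo3"),
        (2, 1, "hallo1"), (2, 2, "hallo2")]

# Every row carries max_revision for its id; the outer WHERE keeps the rows
# whose own revision equals that per-id maximum.
WINDOWED = """
    SELECT id, max_revision, comment
      FROM (SELECT c.id, c.comment, c.revision,
                   MAX(c.revision) OVER (PARTITION BY c.id) AS max_revision
              FROM comments c) t
     WHERE revision = max_revision
     ORDER BY id
"""

# Window functions were added in SQLite 3.25.0.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite >= 3.25 for window functions")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", ROWS)

latest = conn.execute(WINDOWED).fetchall()
print(latest)  # [(1, 3, 'hallo3'), (2, 2, 'hallo2')]
```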

Answer 4 (score: 0)

An idea from left field, but how about adding an extra field to the table:

CurrentRevision bit not null

Then, whenever you make a change, set the flag on the new revision and clear it from all previous revisions.

Your query then becomes:

select  Id,
        Comment
from    Comments
where   CurrentRevision = 1

This is much easier for the database, and therefore much faster.
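The cost moves to the write side: each change must clear the old flag and set the new one atomically. Here is a minimal sketch of that maintenance step in Python's `sqlite3`, with several assumptions. SQLite has no `bit` type, so `CurrentRevision` is an INTEGER limited to 0/1. The primary key on (id, revision), the partial unique index `one_current_per_id` (allowing at most one current row per id), and the helper `add_revision` are all hypothetical additions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        id              INTEGER NOT NULL,
        revision        INTEGER NOT NULL,
        comment         VARCHAR(140),
        CurrentRevision INTEGER NOT NULL DEFAULT 0 CHECK (CurrentRevision IN (0, 1)),
        PRIMARY KEY (id, revision)
    )""")
# Hypothetical partial unique index: at most one current revision per id.
conn.execute("CREATE UNIQUE INDEX one_current_per_id ON comments (id) WHERE CurrentRevision = 1")


def add_revision(conn, comment_id, text):
    """Hypothetical helper: store a new revision and move the flag onto it."""
    with conn:  # one transaction: commits on success, rolls back on error
        (next_rev,) = conn.execute(
            "SELECT COALESCE(MAX(revision), 0) + 1 FROM comments WHERE id = ?",
            (comment_id,)).fetchone()
        # Clear the flag on the previous current revision first...
        conn.execute("UPDATE comments SET CurrentRevision = 0 "
                     "WHERE id = ? AND CurrentRevision = 1", (comment_id,))
        # ...then insert the new revision as the current one.
        conn.execute("INSERT INTO comments (id, revision, comment, CurrentRevision) "
                     "VALUES (?, ?, ?, 1)", (comment_id, next_rev, text))


# Rebuild the question's sample data through the helper.
for cid, text in [(1, "hallo1"), (1, "hallo2"), (1, "hallo3"), (2, "hallo1"), (2, "hallo2")]:
    add_revision(conn, cid, text)

current = conn.execute(
    "SELECT id, revision, comment FROM comments WHERE CurrentRevision = 1 ORDER BY id").fetchall()
print(current)  # [(1, 3, 'hallo3'), (2, 2, 'hallo2')]
```

Clearing the old flag before inserting the new row keeps the partial unique index satisfied at every step. With concurrent writers on a server database, the read of MAX(revision) and the two writes would also need suitable locking. In this sketch, the primary key rejects duplicate revision numbers.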

Answer 5 (score: 0)

A very clean way to do a "latest x by id" type of query is the following. It should also be easy to index properly.

SELECT id, revision, comment 
FROM comments
WHERE (id, revision) IN (
  SELECT id, MAX(revision)
  FROM comments
  -- WHERE clause comes here if needed
  GROUP BY id
)
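This form compares the (id, revision) pair as a single row value against the per-id maxima. Not every database accepts a multi-column IN; SQL Server, for example, does not. Here is a minimal sketch in Python's `sqlite3`, which assumes SQLite 3.15.0 or newer, the first version with row values.

```python
import sqlite3

ROWS = [(1, 1, "hallo1"), (1, 2, "hallo2"), (1, 3, "hallo3"),
        (2, 1, "hallo1"), (2, 2, "hallo2")]

# The pair (id, revision) must appear among the (id, MAX(revision)) pairs.
ROW_VALUE_IN = """
    SELECT id, revision, comment
      FROM comments
     WHERE (id, revision) IN (
           SELECT id, MAX(revision)
             FROM comments
            GROUP BY id)
     ORDER BY id
"""

# Row-value comparisons were added in SQLite 3.15.0.
if sqlite3.sqlite_version_info < (3, 15, 0):
    raise RuntimeError("this sketch needs SQLite >= 3.15 for row values")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", ROWS)

latest = conn.execute(ROW_VALUE_IN).fetchall()
print(latest)  # [(1, 3, 'hallo3'), (2, 2, 'hallo2')]
```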

Answer 6 (score: 0)

For large tables, I've found that this solution can perform better:

    SELECT c1.id, 
           c1.revision, 
           c1.comment 
      FROM comments c1 
INNER JOIN ( SELECT id, 
                max(revision) AS max_revision
               FROM comments 
           GROUP BY id ) c2
        ON c1.id = c2.id
       AND c1.revision = c2.max_revision

Answer 7 (score: 0)

Without subselects (or temp tables):

SELECT c1.ID, c1.revision, c1.comment 
FROM comments AS c1
LEFT JOIN comments AS c2 
    ON c1.ID = c2.ID
    AND c1.revision < c2.revision
WHERE c2.revision IS NULL
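Several answers claim that one form is faster than another. When (id, revision) is unique, the five distinct query shapes on this page should return the same rows. Here is a minimal sketch that checks they agree and prints a rough timing, assuming Python's `sqlite3` (3.25+ for the window-function variant) and randomly generated data. The data generator, table sizes, and index name are hypothetical.

```python
import random
import sqlite3
import time

# The distinct approaches from this page, rewritten for SQLite.
QUERIES = {
    "correlated subquery": """
        SELECT id, revision, comment FROM comments
         WHERE revision = (SELECT MAX(revision) FROM comments AS f
                            WHERE f.id = comments.id)""",
    "join to grouped max": """
        SELECT c1.id, c1.revision, c1.comment FROM comments c1
         INNER JOIN (SELECT id, MAX(revision) AS max_revision
                       FROM comments GROUP BY id) c2
            ON c1.id = c2.id AND c1.revision = c2.max_revision""",
    "left join anti-join": """
        SELECT c1.id, c1.revision, c1.comment FROM comments AS c1
          LEFT JOIN comments AS c2
            ON c1.id = c2.id AND c1.revision < c2.revision
         WHERE c2.revision IS NULL""",
    "row-value IN": """
        SELECT id, revision, comment FROM comments
         WHERE (id, revision) IN (SELECT id, MAX(revision)
                                    FROM comments GROUP BY id)""",
    "window function": """
        SELECT id, revision, comment
          FROM (SELECT id, revision, comment,
                       MAX(revision) OVER (PARTITION BY id) AS max_revision
                  FROM comments) t
         WHERE revision = max_revision""",
}


def build_db(n_ids=2000, max_revisions=20, seed=42):
    """Hypothetical test data: each id gets between 1 and max_revisions revisions."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER, revision INTEGER, comment VARCHAR(140))")
    conn.execute("CREATE INDEX idx_comments_id_revision ON comments (id, revision)")
    rows = [(i, r, f"comment {i} rev {r}")
            for i in range(1, n_ids + 1)
            for r in range(1, rng.randint(1, max_revisions) + 1)]
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    return conn


def run_all(conn):
    """Time each query, then sort its rows so results compare regardless of order."""
    results, timings = {}, {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        timings[name] = time.perf_counter() - start
        results[name] = sorted(rows)
    return results, timings


conn = build_db()
results, timings = run_all(conn)
reference = results["correlated subquery"]
for name, rows in results.items():
    print(f"{name:22} {len(rows)} rows  same result: {rows == reference}  {timings[name]:.4f}s")
```

These timings come from SQLite on in-memory data and may say little about SQL Server, MySQL, or Oracle, whose planners can treat these shapes differently. Comparing execution plans on the real table, as Answer 1 did, is the more reliable test.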