Efficiently selecting the most recent answers

Time: 2013-01-25 14:59:27

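Tags: sql sql-server sql-server-2008 tsql

SQL Fiddle: http://sqlfiddle.com/#!3/9b459/6

I have a table that holds the answers to the question "Will you attend this event?". Each user may respond several times, and every answer is stored in the table. Usually we are only interested in the latest answer, and I am trying to build an efficient query for that. I am using SQL Server 2008 R2.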

Table contents for one event:

[Image: table contents]

Column types: int, int, datetime, bit
Primary key: (EventId, MemberId, Timestamp)
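
For readers who want to try the queries locally, the table could be sketched roughly as follows. The table and column names come from the queries in this post; the constraint name, the `NOT NULL` on `Answer`, and all sample rows (including the timestamps) are assumptions made up for illustration, not the real data from the fiddle.

```sql
-- Hypothetical reconstruction of the table described above.
-- Column order matches the listed types: int, int, datetime, bit.
CREATE TABLE Attendees
(
    EventId     int      NOT NULL,
    MemberId    int      NOT NULL,
    [Timestamp] datetime NOT NULL,
    Answer      bit      NOT NULL,  -- assumption: answers are never NULL
    CONSTRAINT PK_Attendees PRIMARY KEY (EventId, MemberId, [Timestamp])
);

-- Invented sample rows that mimic the pattern described in the question:
-- member 18 answers No then Yes, member 20 Yes then No, member 11 No then No.
INSERT INTO Attendees (EventId, MemberId, [Timestamp], Answer)
VALUES (68, 18, '2013-01-20T10:00:00', 0),
       (68, 18, '2013-01-21T09:30:00', 1),
       (68, 20, '2013-01-20T11:00:00', 1),
       (68, 20, '2013-01-22T08:15:00', 0),
       (68, 11, '2013-01-19T14:00:00', 0),
       (68, 11, '2013-01-23T16:45:00', 0);
```

With these rows, a correct "latest answer" query for event 68 should return exactly one row per member: the later of each member's two rows.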

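Note that member 18 first answered No and later Yes, member 20 first answered Yes and later No, and member 11 answered No and later No again. I want to filter out the first answer from each of these members. There may also be more than one answer to filter out; for example, a user might answer Yes, Yes, No, Yes, No, No, No.

I have tried a few different ideas and evaluated them in SQL Server Management Studio by entering all the queries, choosing Display Estimated Execution Plan, and comparing each query's share of the total cost (in percent). Is this a good way to evaluate performance?

The different queries tested so far: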
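
As a complement to comparing estimated costs, which are optimizer estimates rather than measurements, one possible check is to run each candidate and look at the logical reads and CPU/elapsed time that SQL Server reports. The sketch below uses the standard `SET STATISTICS IO` and `SET STATISTICS TIME` session options; the query in the middle is only a placeholder for whichever candidate is being tested.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: put the candidate query under test here.
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from   Attendees a
where  a.EventId = 68;

-- Switch the reporting off again.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear on the Messages tab in Management Studio. On a table as small as one event's answers, the differences may be too small to be meaningful.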

-----------------------------------------------------------------
-- Subquery to select Answer (does not include Timestamp)
-- Cost: 63 %
-----------------------------------------------------------------
select distinct a.EventId, a.MemberId,
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  order by Timestamp desc
) as Answer
from    Attendees a
where a.EventId = 68

-----------------------------------------------------------------
-- Where with subquery to find max(Timestamp)
-- Cost: 13 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from     Attendees a
where  a.EventId = 68
and    a.Timestamp =
(
  select max(Timestamp)
  from     Attendees
  where  EventId  = a.EventId
  and    MemberId = a.MemberId
)
order by a.TimeStamp;

-----------------------------------------------------------------
-- Group by to find max(Timestamp)
-- Subquery to select Answer matching max(Timestamp)
-- Cost: 23 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, max(a.Timestamp),
(
  select top 1 Answer
  from    Attendees
  where EventId   = a.EventId
  and   MemberId  = a.MemberId
  and   Timestamp = max(a.Timestamp)
) as Answer
from    Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

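It would be nice to avoid running a subquery for each member. In the last query I tried using group by, but I still had to use a subquery for the Answer column. I would really like something like the following, but that is of course not valid SQL: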

select a.EventId, a.MemberId, max(a.Timestamp), a.Answer <-- Picked from the line selected by max(a.Timestamp)
from  Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);

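Any other ideas for an efficient query?


Edit:

I am very impressed with SQL Fiddle, and I have now entered my actual data: http://sqlfiddle.com/#!3/9b459/6

3 answers:

Answer 0: (score: 7)

SQL Server 2008 supports common table expressions and window functions.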

WITH recordsList
AS
(
    SELECT  EventID, MemberID, TimeStamp, Answer,
            ROW_NUMBER() OVER (PARTITION BY EventID, MemberID
                                ORDER BY Timestamp DESC) rn
    FROM    tableName
)
SELECT  EventID, MemberID, TimeStamp, Answer
FROM    recordsList
WHERE   rn = 1
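
Applied to the question's table, the same idea might look like the sketch below. It assumes the table is named `Attendees` and that only event 68 is wanted, as in the question's own queries; the CTE name `LatestAnswers` is made up for illustration.

```sql
-- Number each member's answers from newest (rn = 1) to oldest,
-- then keep only the newest one.
WITH LatestAnswers AS
(
    SELECT  EventId, MemberId, [Timestamp], Answer,
            ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                               ORDER BY [Timestamp] DESC) AS rn
    FROM    Attendees
    WHERE   EventId = 68   -- filter before numbering so only one event is read
)
SELECT  EventId, MemberId, [Timestamp], Answer
FROM    LatestAnswers
WHERE   rn = 1
ORDER BY [Timestamp];
```

Because the primary key is (EventId, MemberId, Timestamp), no two rows in a partition can share a timestamp, so `rn = 1` identifies exactly one row per member.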

Answer 1: (score: 3)

I also prefer the CTE approach, but here is another option using a subquery that should work:

SELECT T.EventId, T.MemberId, T.TimeStamp, T.Answer
FROM TableName T
 JOIN (
   SELECT EventId, MemberId, Max(Timestamp) MaxTimeStamp
   FROM TableName
   GROUP BY EventId, MemberId ) T2 ON T.EventId = T2.EventId 
    AND T.MemberId = T2.MemberId 
    AND T.TimeStamp = T2.MaxTimeStamp

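That said, I would guess that the CTE performs better.

Edit: I am no longer sure about the performance. Both versions are on SQL Fiddle, where you can view the execution plan for each.

Good luck.

Answer 2: (score: 3)

One more option: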
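
If only one event is of interest, as in the question, the join query above could be restricted inside the derived table so that the grouping touches only that event's rows. This is a sketch under the assumption that the table is `Attendees` and the event is 68; whether it changes the plan in practice would need to be checked.

```sql
SELECT T.EventId, T.MemberId, T.[Timestamp], T.Answer
FROM Attendees T
 JOIN (
   -- Latest timestamp per member, for event 68 only.
   SELECT EventId, MemberId, MAX([Timestamp]) AS MaxTimeStamp
   FROM Attendees
   WHERE EventId = 68
   GROUP BY EventId, MemberId ) T2 ON T.EventId = T2.EventId
    AND T.MemberId = T2.MemberId
    AND T.[Timestamp] = T2.MaxTimeStamp
ORDER BY T.[Timestamp];
```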

SELECT a.EventId, a.MemberId, a.Timestamp, a.Answer
FROM Attendees a
WHERE a.EventId = 68 AND EXISTS (
              SELECT 1
              FROM Attendees
              WHERE EventId = a.EventId             
              GROUP BY MemberId
              HAVING  MAX(Timestamp) = a.Timestamp                      
                      AND MemberId  = a.MemberId
              )

Demo on SQLFiddle