COUNT(DISTINCT column_name) discrepancy vs. COUNT(column_name) in SQL Server 2008?

Asked: 2011-09-30 17:01:54

Tags: sql sql-server tsql sql-server-2008

I'm running into a problem that is driving me nuts. When I run the query below, I get a count of 233,769:

 SELECT COUNT(distinct  Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)  
     ON Member_List_Link.UserID = MasterMembers.UserID   
  WHERE MasterMembers.Active = 1 And
        Member_List_Link.GroupID = 5 AND 
        MasterMembers.ValidUsers = 1 AND 
        Member_List_Link.Status = 1

But if I run the same query without the distinct keyword, I get a count of 233,748:

 SELECT COUNT(Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)
   ON Member_List_Link.UserID = MasterMembers.UserID   
 WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5 
  AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1

To test, I recreated all the tables, placed them into temp tables, and ran the queries again:

  SELECT COUNT(distinct  #Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

And without the distinct keyword:

  SELECT COUNT(#Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

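As a side note, I recreated the temp tables by simply running (select * from Member_List_Link into #temp...

Now when I check the difference between COUNT(column) and COUNT(distinct column) on these temp tables, I don't see any!

So why is there a discrepancy with the original tables?

I'm running SQL Server 2008 (Developer Edition).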

UPDATE - including the statistics profile

PhysicalOp column only, for the query without distinct

NULL
Compute Scalar
Stream Aggregate
Clustered Index Seek

PhysicalOp column only, for the query with distinct

NULL
Compute Scalar
Stream Aggregate
Parallelism
Stream Aggregate
Hash Match
Hash Match
Bitmap
Parallelism
Index Seek
Parallelism
Clustered Index Scan

Rows and Executes for the first query (without distinct)

1   1
0   0
1   1
1   1

Rows and Executes for the second query (with distinct)

Rows    Executes
1   1
0   0
1   1
16  1
16  16
233767  16
233767  16
281901  16
281901  16
281901  16
234787  16
234787  16

Adding OPTION(MAXDOP 1) to the second query (with distinct)

Rows Executes

1           1
0           0
1           1
233767          1
233767          1
281901          1
548396          1

And the resulting PhysicalOp

NULL
Compute Scalar
Stream Aggregate
Hash Match
Hash Match
Index Seek
Clustered Index Scan
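
For reference, a sketch of where the MAXDOP hint goes and how per-operator output like the above can be captured. The statement that produced the figures above is not shown in the post, so the use of SET STATISTICS PROFILE here is an assumption.

```sql
-- Assumption: the Rows / Executes / PhysicalOp columns come from SET STATISTICS PROFILE.
SET STATISTICS PROFILE ON;

SELECT COUNT(DISTINCT Member_List_Link.UserID)
FROM Member_List_Link WITH (NOLOCK)
INNER JOIN MasterMembers WITH (NOLOCK)
    ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1
  AND Member_List_Link.GroupID = 5
  AND MasterMembers.ValidUsers = 1
  AND Member_List_Link.Status = 1
OPTION (MAXDOP 1);  -- query hint: force a serial (non-parallel) plan

SET STATISTICS PROFILE OFF;
```

The OPTION clause must come last in the statement; with MAXDOP 1 the Parallelism and Bitmap operators drop out of the plan, which matches the shorter operator list above.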

6 Answers:

Answer 0 (score: 4)

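From http://msdn.microsoft.com/en-us/library/ms187373.aspx: NOLOCK is equivalent to READUNCOMMITTED. For more information, see READUNCOMMITTED later in this topic.

READUNCOMMITTED will read rows twice if they are the subject of a transaction, since both the roll-forward and roll-back rows exist in the database while the transaction is in process.

By default, all queries run as read committed, which excludes uncommitted rows.

When you insert into a temp table, the select gives you only committed rows. I believe this covers all the symptoms you are trying to explain.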
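
One way to test this explanation is to compute both counts in a single statement with the NOLOCK hints removed, so that neither count can include dirty reads. This is only a sketch: the table and column names are taken from the question, the aliases `mll` and `mm` are introduced here for brevity, and a single statement under READ COMMITTED is still not a true point-in-time snapshot.

```sql
-- READ COMMITTED is the default level; it is set explicitly here for clarity.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT COUNT(mll.UserID)          AS TotalRows,
       COUNT(DISTINCT mll.UserID) AS DistinctUsers
FROM Member_List_Link AS mll
INNER JOIN MasterMembers AS mm
    ON mll.UserID = mm.UserID
WHERE mm.Active = 1
  AND mll.GroupID = 5
  AND mm.ValidUsers = 1
  AND mll.Status = 1;
```

On committed data DistinctUsers can never exceed TotalRows, so if that still happens without NOLOCK, dirty reads are not the whole story.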

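Answer 1 (score: 1)

I think I have the answer to your question, but first tell me: is userid a primary key in the original table?

If so, the CTAS query that creates the temp table does not copy any primary key from the original table. It only copies NOT NULL constraints that are not part of the primary key.. fine?

Now, what happened is that your original table had a primary key, so count(distinct column_name) does not include tuples with null records. When you created the temp tables, the primary key was not copied, and hence the NOT NULL constraint did not reach the temp table!!

Is that clear to you?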
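
The premise above can be checked directly. The sketch below assumes the table and temp-table names from the question; it counts NULL UserID values in the original table and lists the nullability that the temp table actually ended up with.

```sql
-- COUNT(*) counts every row, COUNT(UserID) skips NULLs,
-- so the difference is the number of rows with a NULL UserID.
SELECT COUNT(*) - COUNT(UserID) AS NullUserIDs
FROM Member_List_Link;

-- Temp tables live in tempdb; is_nullable = 1 means the column accepts NULL.
SELECT name, is_nullable
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID('tempdb..#Temp_Member_List_Link');
```

If NullUserIDs is 0, NULL handling cannot account for the difference between the two counts.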

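Answer 2 (score: 1)

Reproducing this behavior is difficult, so I'm taking a stab in the dark:

The WITH(NOLOCK) hint allows reading uncommitted data. I guess you added it so as not to lock anything for your users? If you remove the hints and issue
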

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

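before executing the query, you should get more reliable results. But then the tables may receive locks while the query executes.

If that doesn't work, my guess is that DISTINCT uses indexes to optimize. Check the query plan and rebuild the indexes as necessary. That could be the source of your problem.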
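
A minimal sketch of the index check suggested above, using the table names from the question. The DBCC CHECKTABLE step is an addition that goes slightly beyond what the answer proposes: it reports inconsistencies between a table and its indexes before anything is rebuilt.

```sql
-- Report integrity problems in each table and its indexes (read-only check).
DBCC CHECKTABLE ('Member_List_Link');
DBCC CHECKTABLE ('MasterMembers');

-- Rebuild every index on both tables.
-- This takes locks and can be slow on large tables, so schedule it accordingly.
ALTER INDEX ALL ON Member_List_Link REBUILD;
ALTER INDEX ALL ON MasterMembers REBUILD;
```

The two plans in the question read different indexes (a Clustered Index Seek versus an Index Seek plus Clustered Index Scan), so an index that disagrees with its base table is one possible reason for differing counts.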

Answer 3 (score: 0)

What results do you get with

SELECT count(*) FROM (
    SELECT distinct  Member_List_Link.UserID
    FROM Member_List_Link  with (nolock)
    INNER JOIN MasterMembers with (nolock)
      ON Member_List_Link.UserID = MasterMembers.UserID
    WHERE MasterMembers.Active = 1 And
         Member_List_Link.GroupID = 5 AND 
         MasterMembers.ValidUsers = 1 AND
         Member_List_Link.Status = 1
) as m

And with:

SELECT count(*) FROM (
    SELECT distinct  Member_List_Link.UserID
    FROM Member_List_Link  
    INNER JOIN MasterMembers
      ON Member_List_Link.UserID = MasterMembers.UserID
    WHERE MasterMembers.Active = 1 And
         Member_List_Link.GroupID = 5 AND 
         MasterMembers.ValidUsers = 1 AND
         Member_List_Link.Status = 1
) as m

Answer 4 (score: 0)

Ray, please try the following

SELECT COUNT(*)
FROM 
(
    SELECT Member_List_Link.UserID, ROW_NUMBER() OVER (PARTITION BY Member_List_Link.UserID ORDER BY (SELECT NULL)) N
    FROM Member_List_Link  with (nolock)   
    INNER JOIN MasterMembers with (nolock)  
        ON Member_List_Link.UserID = MasterMembers.UserID   
     WHERE MasterMembers.Active = 1 And
           Member_List_Link.GroupID = 5 AND 
           MasterMembers.ValidUsers = 1 AND 
           Member_List_Link.Status = 1
) A
WHERE N = 1
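
A complementary check, sketched under the same table and column names: instead of counting one row per UserID, list the UserID values that the join returns more than once. The column alias `RowsPerUser` is introduced here for illustration.

```sql
SELECT Member_List_Link.UserID, COUNT(*) AS RowsPerUser
FROM Member_List_Link WITH (NOLOCK)
INNER JOIN MasterMembers WITH (NOLOCK)
    ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1
  AND Member_List_Link.GroupID = 5
  AND MasterMembers.ValidUsers = 1
  AND Member_List_Link.Status = 1
GROUP BY Member_List_Link.UserID
HAVING COUNT(*) > 1          -- keep only IDs that appear more than once
ORDER BY RowsPerUser DESC;
```

An empty result means every qualifying UserID appears exactly once, in which case COUNT and COUNT(DISTINCT) should agree on stable data.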

Answer 5 (score: -1)

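When you use count with a distinct column, it does not count rows where the column value is null.

create table #tmp(name char(4) null)

insert into #tmp values(null)

insert into #tmp values(null)

insert into #tmp values("AAA")

Query:-

1> select count(*) from #tmp
2> go

       3

1> select count(distinct name) from #tmp
2> go

       1

1> select distinct name from #tmp
2> go
 name

 NULL

 AAA

But it works with a derived table

1> select count(*) from (select distinct name from #tmp) a
2> go

       2

Note:- I tested this in Sybase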
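
The same demonstration can be written for SQL Server. This is a sketch that mirrors the Sybase session above; the temp table name `#null_demo` and the column aliases are made up for the example.

```sql
CREATE TABLE #null_demo (name char(4) NULL);

INSERT INTO #null_demo VALUES (NULL);
INSERT INTO #null_demo VALUES (NULL);
INSERT INTO #null_demo VALUES ('AAA');

SELECT COUNT(*)             AS all_rows,          -- 3: counts every row
       COUNT(name)          AS non_null_rows,     -- 1: NULLs are skipped
       COUNT(DISTINCT name) AS distinct_non_null  -- 1: NULLs are skipped
FROM #null_demo;

-- DISTINCT keeps NULL as one value, and COUNT(*) then counts that row too.
SELECT COUNT(*) AS distinct_including_null        -- 2
FROM (SELECT DISTINCT name FROM #null_demo) AS d;

DROP TABLE #null_demo;
```

Note that NULL handling can only make COUNT(column) and COUNT(DISTINCT column) smaller than COUNT(*); it cannot make the distinct count larger than the plain count, which is the situation described in the question.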