Question

Is there a better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

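I need to count the number of distinct items in this table, but the distinct is over two columns.

My query works fine, but I was wondering whether I can get the final result using just one query (without a subquery).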

Answer 1

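If you are trying to improve performance, you could try creating a persisted computed column on either a hash or a concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and/or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.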

Answer 2

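Edit: Altered from the less-than-reliable checksum-only query. I've discovered a way to do this (in SQL Server 2005) that works pretty well for me, and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable.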

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems

Answer 3

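What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations, why not try it?

It certainly works as you might expect in Oracle.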

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

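Edit

I went down a blind alley with analytics, but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

Edit 2

Given the following data, the concatenating solution provided above will miscount:

col1 col2
---- ----
A    AA
AA   A

So we need to include a separator...

select col1 + '*' + col2 from t23
/

Obviously, the chosen separator must be a character, or set of characters, which can never appear in either column.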
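The miscount described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database (an assumption; the question targets SQL Server and this answer uses Oracle), so concatenation is written with SQLite's || operator. The table t23 and its columns come from the example above.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t23 (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [("A", "AA"), ("AA", "A")])

# Reference result: distinct pairs counted through a subquery.
pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM t23)"
).fetchone()[0]

# Plain concatenation: both rows collapse to the same string 'AAA'.
no_sep = conn.execute(
    "SELECT COUNT(DISTINCT col1 || col2) FROM t23"
).fetchone()[0]

# Concatenation with a separator that never occurs in the data.
with_sep = conn.execute(
    "SELECT COUNT(DISTINCT col1 || '*' || col2) FROM t23"
).fetchone()[0]

print(pairs, no_sep, with_sep)
```

With this data the unseparated version undercounts, while the separated version agrees with the subquery.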

Answer 4

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

It probably does just the same as what you already have, but it avoids the DISTINCT.
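A quick way to check that the GROUP BY form and the DISTINCT form agree is to run both on the same data. The sketch below assumes SQLite (through Python's sqlite3 module) instead of SQL Server, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical sample rows: 5 rows, 3 distinct pairs.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# The query from the question.
distinct_count = conn.execute(
    """SELECT COUNT(*)
       FROM (SELECT DISTINCT DocumentId, DocumentSessionId
             FROM DocumentOutputItems) AS internalQuery"""
).fetchone()[0]

# The GROUP BY variant from this answer.
group_by_count = conn.execute(
    """SELECT COUNT(*)
       FROM (SELECT COUNT(*) AS cnt
             FROM DocumentOutputItems
             GROUP BY DocumentId, DocumentSessionId) AS t1"""
).fetchone()[0]

print(distinct_count, group_by_count)
```

Both queries still use a derived table, so this checks that the results match and says nothing about performance.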

Answer 5

To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

In MySQL you can do the same thing without the concatenation step as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

Answer 6

Here's a shorter version without the subselect:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

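It works fine in MySQL, and I think the optimizer has an easier time understanding this one.

Edit: Apparently I misread MSSQL as MySQL. Sorry about that, but maybe it helps anyway.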

Answer 7

I found this when I Googled my own issue, and found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL):

SELECT COUNT(DISTINCT DocumentID) AS Count1, 
  COUNT(DISTINCT DocumentSessionId) AS Count2
  FROM DocumentOutputItems

Answer 8

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
    SELECT (0)
      FROM DocumentOutputItems
  GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
  FROM internalQuery

Answer 9

I hope this works; I am writing this prima vista.

SELECT COUNT(*) 
FROM DocumentOutputItems 
GROUP BY DocumentId, DocumentSessionId
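When trying the query above, it may help to look at the shape of its result. The sketch below assumes SQLite (through Python's sqlite3 module) and hypothetical sample rows; an ORDER BY is added only to make the output order predictable. Without an outer COUNT(*), the GROUP BY form returns one row per distinct pair, so the number of distinct pairs is the number of rows returned, not a single value.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical sample rows: 5 rows, 3 distinct pairs.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# One result row per (DocumentId, DocumentSessionId) group.
per_group = conn.execute(
    """SELECT COUNT(*)
       FROM DocumentOutputItems
       GROUP BY DocumentId, DocumentSessionId
       ORDER BY DocumentId, DocumentSessionId"""
).fetchall()

group_sizes = [count for (count,) in per_group]
number_of_pairs = len(per_group)

print(group_sizes, number_of_pairs)
```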

Answer 10

If you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId) 
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However, you are using two fields, so you could try something crazy like:

    SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.
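The NULL issue can be illustrated with a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in, so the concatenation uses || and CAST(... AS TEXT) instead of + and convert(); the sample rows are hypothetical. In SQLite, concatenating with NULL yields NULL, and COUNT(DISTINCT ...) skips NULLs, so pairs that contain a NULL drop out of the concatenated count.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical sample rows; two of them have a NULL DocumentSessionId.
rows = [(1, 10), (1, None), (1, None), (2, 10)]
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# Original subquery form: (1, NULL) counts as one distinct row.
subquery_count = conn.execute(
    """SELECT COUNT(*)
       FROM (SELECT DISTINCT DocumentId, DocumentSessionId
             FROM DocumentOutputItems) AS internalQuery"""
).fetchone()[0]

# Concatenated form: the NULL rows produce a NULL key and are not counted.
concat_count = conn.execute(
    """SELECT COUNT(DISTINCT CAST(DocumentId AS TEXT) || '|~|' ||
                             CAST(DocumentSessionId AS TEXT))
       FROM DocumentOutputItems"""
).fetchone()[0]

print(subquery_count, concat_count)
```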

Answer 11

Many (most?) SQL databases can work with tuples like values, so you can just do:

SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or another scalar function that provides good uniqueness, e.g.:

COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))

A related use of tuples is performing IN queries such as:

SELECT * FROM DocumentOutputItems WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));

Answer 12

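I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.

At first JayTee's answer seemed like a solution to me, but after some tests CHECKSUM() failed to create unique values. A quick example: both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) give the same answer, which is 55.

Then I did some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums, some suggested using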

SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))

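but this is not reassuring either.

You may use the HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However, this also has a small chance of not returning unique results.

I would suggest using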

SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems

Answer 13

I have used this approach and it has worked for me.

SELECT COUNT(DISTINCT DocumentID || DocumentSessionId) 
FROM  DocumentOutputItems

In my case, it gives the correct result.

Answer 14

It works for me. In Oracle:

SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;

In JPQL:

SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;

Answer 15

How about this:

Select DocumentId, DocumentSessionId, count(*) as c 
from DocumentOutputItems 
group by DocumentId, DocumentSessionId;

This will give us the row count for every combination of DocumentId and DocumentSessionId that occurs.

Answer 16

You can just use the COUNT function twice.

In this case, it would be:

SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
FROM DocumentOutputItems
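For comparison with the query in the question, the following sketch runs both forms on the same data. It assumes SQLite (through Python's sqlite3 module) as a stand-in, and the sample rows are hypothetical. The two per-column counts are separate numbers and need not match the number of distinct (DocumentId, DocumentSessionId) pairs.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
# Hypothetical sample rows: 2 document ids x 2 session ids = 4 distinct pairs.
rows = [(1, 10), (1, 11), (2, 10), (2, 11)]
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# Counting each column on its own.
doc_count, session_count = conn.execute(
    """SELECT COUNT(DISTINCT DocumentId), COUNT(DISTINCT DocumentSessionId)
       FROM DocumentOutputItems"""
).fetchone()

# Counting distinct pairs, as in the question.
pair_count = conn.execute(
    """SELECT COUNT(*)
       FROM (SELECT DISTINCT DocumentId, DocumentSessionId
             FROM DocumentOutputItems) AS internalQuery"""
).fetchone()[0]

print(doc_count, session_count, pair_count)
```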

Answer 17

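If you're working with fixed-length data types, you can cast to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...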

SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems

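My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a subquery dramatically increased the query time. Due to the complexity, statistics simply weren't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly because of the various data types, and I couldn't risk its unreliability.

However, using the above solution caused virtually no increase in query time (compared with using only the SUM), and it should be completely reliable! It should be able to help others in a similar situation, so I'm posting it here.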
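The reason fixed-width binary concatenation avoids the ambiguity of string concatenation can be sketched outside the database. The example below is plain Python, not T-SQL; it uses struct.pack to mimic two 4-byte fields laid end to end, and the sample pairs and helper names (text_key, binary_key) are hypothetical. It assumes both values fit in a signed 32-bit integer.

```python
import struct

# Hypothetical (DocumentId, DocumentSessionId) pairs: 3 distinct pairs.
rows = [(1, 23), (12, 3), (1, 23), (12, 3), (123, 0)]


def text_key(doc_id, session_id):
    # Variable-length concatenation without a separator: (1, 23) and (12, 3)
    # both become '123'.
    return f"{doc_id}{session_id}"


def binary_key(doc_id, session_id):
    # Two fixed-width 4-byte big-endian fields, 8 bytes in total; the field
    # boundary is always at byte 4, so no separator is needed.
    return struct.pack(">i", doc_id) + struct.pack(">i", session_id)


distinct_pairs = len(set(rows))
distinct_text = len({text_key(*row) for row in rows})
distinct_binary = len({binary_key(*row) for row in rows})

print(distinct_pairs, distinct_text, distinct_binary)
```

The fixed-width keys keep every pair distinct, while the unseparated text keys merge two of them.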

Answer 18

This code uses distinct on 2 parameters and provides a count of the number of rows specific to those distinct values. It worked like a charm for me in MySQL.

<?php $field = get_field_object('custom_field_name'); if( $field['choices'] ): ?>

<ul>
    <?php foreach( $field['choices'] as $value => $label ): ?>
        <li><?php echo $label; ?></li>
    <?php endforeach; ?>
</ul>
<?php endif; ?>

Counting DISTINCT over multiple columns