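I am using SQL and want to know how to get all rows whose values in two columns match those of another row. For example, given this table: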
+----+---------+
| ID | Version |
+----+---------+
| AB |       1 |
| AB |       1 |
| BA |       2 |
| BA |       2 |
| CB |       1 |
+----+---------+
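I want to select every row whose ID and Version match the ID and Version of at least one other row. In other words, I want to find the duplicated values. The desired output would therefore be: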
+----+---------+
| ID | Version |
+----+---------+
| AB |       1 |
| AB |       1 |
| BA |       2 |
| BA |       2 |
+----+---------+
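How can I do this as efficiently as possible on a table with more than a million rows?
Answer 0 (score: 1)
The simplest approach is probably a window function: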
select t.*
from (select t.*,
             count(*) over (partition by id, version) as cnt
      from t
     ) t
where cnt >= 2;
If you have an index on (id, version) (or on (version, id)), the database engine should be able to take advantage of it.
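As a rough sketch of the index mentioned above: the index name idx_t_id_version is made up for illustration, and whether the optimizer actually uses the index depends on the database and on the table statistics.

```sql
-- Hypothetical index name; t, id and version are the table and columns
-- used in the query above.
create index idx_t_id_version on t (id, version);
```

Checking the execution plan of the query is the most reliable way to confirm that the index is being used.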
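Answer 1 (score: 1)
If you need the duplicate count for each group, use GROUP BY ... HAVING. If you need the total number of duplicated rows, apply another aggregation on top of the grouped result.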
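A minimal sketch of the per-group form, assuming the table from the question is named t and has the columns id and version:

```sql
-- One row per duplicated (id, version) pair, with the number of times it occurs.
select id, version, count(*) as cnt
from t
group by id, version
having count(*) > 1;
```

With the sample data from the question this lists the pairs (AB, 1) and (BA, 2), each with a count of 2. Unlike the window-function query, it returns one row per group rather than every duplicated row.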
For Oracle (fiddle), the totals can be computed in one step:
with a as (
  select 1 as id, 'A' as v from dual union all
  select 1 as id, 'A' as v from dual union all
  select 1 as id, 'B' as v from dual union all
  select 1 as id, 'B' as v from dual union all
  select 2 as id, 'C' as v from dual
)
select sum(count(1)) as total_duplicates
     , count(count(1)) as duplicate_groups
from a
group by id, v
having count(1) > 1
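In SQL Server (fiddle), for example, this does not work because aggregate functions cannot be nested there, so I wrapped the grouped query in another select: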
with a as (
  select 1 as id, 'A' as v union all
  select 1 as id, 'A' as v union all
  select 1 as id, 'B' as v union all
  select 1 as id, 'B' as v union all
  select 2 as id, 'C' as v
)
select sum(cnt) as total_duplicates
     , count(cnt) as duplicate_groups
from (
  select count(1) as cnt
  from a
  group by id, v
  having count(1) > 1
) as q
+------------------+------------------+
| total_duplicates | duplicate_groups |
+------------------+------------------+
|                4 |                2 |
+------------------+------------------+