Question

I have the following table:

id | col1 | col2 | col3   | col4
---+------+------+--------+-----------
 1 | abc  | 23   | data1  | otherdata1
 2 | def  | 41   | data2  | otherdata2
 3 | ghi  | 41   | data3  | otherdata3
 4 | jkl  | 58   | data4  | otherdata4
 5 | mno  | 23   | data1  | otherdata5
 6 | pqr  | 41   | data3  | otherdata6
 7 | stu  | 76   | data2  | otherdata7

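How can I quickly select the rows whose col2 + col3 combination is not duplicated? The table has more than 15 million rows, so a join may not be suitable.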

The final result should look like this:

id | col1 | col2 | col3   | col4
---+------+------+--------+-----------
 2 | def  | 41   | data2  | otherdata2
 4 | jkl  | 58   | data4  | otherdata4
 7 | stu  | 76   | data2  | otherdata7
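
To reproduce the example, the sample data can be loaded with something like the following. This is only a sketch: the table name the_table and the column types are assumptions, since the question states neither.

```sql
-- Assumed table name and column types; adjust to the real schema.
create table the_table (
  id   integer primary key,
  col1 varchar(20),
  col2 integer,
  col3 varchar(20),
  col4 varchar(20)
);

-- The seven sample rows from the question.
insert into the_table (id, col1, col2, col3, col4) values
  (1, 'abc', 23, 'data1', 'otherdata1'),
  (2, 'def', 41, 'data2', 'otherdata2'),
  (3, 'ghi', 41, 'data3', 'otherdata3'),
  (4, 'jkl', 58, 'data4', 'otherdata4'),
  (5, 'mno', 23, 'data1', 'otherdata5'),
  (6, 'pqr', 41, 'data3', 'otherdata6'),
  (7, 'stu', 76, 'data2', 'otherdata7');
```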

Answer 1

Not sure how fast this will be, but it should work:

select id, col1, col2, col3, col4
from (
  select id, col1, col2, col3, col4, 
         count(*) over (partition by col2, col3) as cnt
  from the_table
) t
where cnt = 1
order by id;
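
For comparison, the same "cnt = 1" condition can probably also be expressed with GROUP BY ... HAVING instead of a window function. This is a sketch rather than a tested recommendation; it relies on the fact that a group with exactly one row has only one value per column, so min() simply returns that row's value.

```sql
-- Keep only (col2, col3) groups that contain exactly one row.
-- Because each surviving group has a single row, min() returns
-- that row's id, col1 and col4 unchanged.
select min(id)   as id,
       min(col1) as col1,
       col2,
       col3,
       min(col4) as col4
from the_table
group by col2, col3
having count(*) = 1
order by id;
```

Which of the two forms is faster on 15 million rows would have to be measured on the actual database.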

Answer 2

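Window functions are definitely one possibility. However, if you care about performance, it is also worth trying another approach and comparing the speed.

NOT EXISTS comes to mind: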

select t.*
from the_table t
where not exists (select 1
                  from the_table t2
                  where t2.col2 = t.col2
                    and t2.col3 = t.col3
                    and t2.id <> t.id
                 );

This can take advantage of an index on (col2, col3).
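
As a sketch, such an index could be created as follows. The index name is hypothetical and the table name the_table is borrowed from Answer 1; whether the planner actually uses the index depends on the database and its statistics.

```sql
-- Hypothetical index name; the_table is the table name used in Answer 1.
-- Lets the NOT EXISTS subquery look up matching (col2, col3) pairs directly.
create index the_table_col2_col3_idx on the_table (col2, col3);
```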

Answer 3

Try this:

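select *
from (
  select id, col1, col2, col3, col4,
         -- number the rows within each (col2, col3) group, lowest id first
         row_number() over (partition by col2, col3 order by id) as rnm
  from the_table
) x
-- rnm = 1 keeps one row per (col2, col3) group, including groups that contain duplicates
where rnm = 1;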

Exclude rows that have the same values in certain columns
