Selecting duplicate values from an Oracle table

Time: 2013-05-29 13:20:11

Tags: sql oracle

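I have an Oracle table that, for unrelated reasons, has no primary key. It has 5 columns, and I want to be able to delete duplicate records (rows count as duplicates if all 5 column values are identical). I came up with this SQL, but it doesn't seem to find the duplicate values: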

SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
FROM table_name
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1

Sample records:

DATE_TIME                   SITE                                                                        RESPONSE_TIME AVAIL_PERCENT AGENT
20-Apr-13 04.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    8.2610  100.00  45693
20-Apr-13 10.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.2900  100.00  45693
24-Apr-13 07.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.7300  100.00  45693
24-Apr-13 03.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.7180  100.00  45693
08-May-13 06.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.5970  100.00  45693
20-May-13 01.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.7910  100.00  45693
25-Apr-13 01.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.3400  100.00  45693
08-May-13 05.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    2.4410  100.00  45693
09-May-13 01.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]            45693
21-May-13 06.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.5480  100.00  45693
23-Apr-13 02.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    10.7070 100.00  45693
26-Apr-13 09.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    4.0070  100.00  45693
26-Apr-13 03.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.9350  100.00  45693
22-May-13 12.52.00.00 PM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    4.1760  100.00  45693
23-Apr-13 02.53.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.9500  100.00  45693
23-Apr-13 03.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.0480  100.00  45693
23-Apr-13 04.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.7600  100.00  45693

Any ideas?

3 answers:

Answer 0: (score: 1)

You can use rowid as a pseudo primary key and run a query that deletes the extra rows, for example:

delete from
  my_table
where
  rowid not in (
    select   min(rowid)
    from     my_table
    group by column_1,
             column_2,
             column_3,
             etc)

column_1 and so on are the set of columns that define a row's uniqueness.

For very large data sets with many duplicates there may be better-performing options, but this is a quick approach that is usually good enough.
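Applied to the five columns from the question, the statement might look like the sketch below. It assumes the table is called table_name, the placeholder used in the question. GROUP BY puts NULLs in the same group, so rows whose RESPONSE_TIME and AVAIL_PERCENT are both empty are compared with each other as well.

```sql
-- Keep one row (the lowest ROWID) for each distinct combination of the
-- five columns and delete every other copy.
-- table_name is the placeholder name from the question.
DELETE FROM table_name
WHERE  ROWID NOT IN (
         SELECT   MIN(ROWID)
         FROM     table_name
         GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
       );

-- Check the reported row count, then COMMIT to keep the change
-- or ROLLBACK to undo it.
```

If the statement reports 0 rows deleted, the table probably contains no rows that match on all five columns.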

Answer 1: (score: 0)

Since you are using Oracle, you can try the following to remove the duplicates:

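DELETE my_table WHERE ROWID IN
(
  -- ROWID is aliased as RID so the outer query reads it as an ordinary column
  SELECT RID FROM
  (
    SELECT
    DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT, ROWID RID,
    ROW_NUMBER() OVER (PARTITION BY
      DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT ORDER BY DATE_TIME) ITM_IDX
    FROM my_table
  )
  -- ITM_IDX = 1 is the copy that is kept; every later copy is deleted
  WHERE ITM_IDX > 1
);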
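Before running a delete like this, it can help to preview the rows it would remove. The query below is a minimal sketch of that check, reusing the my_table name and the ITM_IDX alias from the statement above. The alias t and the ORDER BY ROWID tie-breaker are choices made for this illustration.

```sql
-- List every row that is the 2nd, 3rd, ... copy within its group of
-- identical column values. These are the rows a delete would remove.
SELECT *
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
           ORDER BY ROWID
         ) AS itm_idx
  FROM my_table t
)
WHERE itm_idx > 1;
```

An empty result means no two rows agree on all five columns.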

Answer 2: (score: 0)

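Do you plan to create a primary key? You can create an exceptions table, and Oracle will record the rows that violate the primary key in it. If there are violations, the primary key itself is not created, but you can analyze the bad data afterwards. =)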

create table tb1 
(field1 number, field2 varchar2(100));

--good data
insert into tb1 values (1, 'a');
insert into tb1 values (1, 'b');
insert into tb1 values (1, 'c');
insert into tb1 values (2, 'a');
insert into tb1 values (2, 'b');
insert into tb1 values (2, 'c');
-- bad data
insert into tb1 values (3, 'a');
insert into tb1 values (3, 'a');
commit;

-- a table for exceptions
create table tbl_exceptions (row_id rowid,
                             owner varchar2(30),
                             table_name varchar2(30),
                             constraint varchar2(30));

-- the primary key
-- if it fails, the table contains duplicate rows
alter table tb1 add constraint pk1 primary key (field1, field2)
exceptions into tbl_exceptions;

-- the bad data will be listed here
-- note the join on ROW_ID, the rowid column of the exceptions table
select tb1.*
from  tb1,
      tbl_exceptions 
where tb1.rowid = tbl_exceptions.row_id;
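
Once the offending rows are identified, one possible follow-up is to delete the extra copies and retry the constraint. This is only a sketch under the assumptions of the example above (tables tb1 and tbl_exceptions, key columns field1 and field2). The exceptions table lists every copy of a duplicated key, not just the extras, so one row per key is deliberately kept.

```sql
-- Delete the flagged rows except the copy with the lowest ROWID per key.
DELETE FROM tb1
WHERE  ROWID IN (SELECT row_id FROM tbl_exceptions)
AND    ROWID NOT IN (
         SELECT   MIN(ROWID)
         FROM     tb1
         GROUP BY field1, field2
       );

-- Clear the old exception rows so a retry starts from an empty table.
DELETE FROM tbl_exceptions;

-- Retry the primary key. ALTER TABLE is DDL, so it implicitly commits
-- the two deletes above.
ALTER TABLE tb1 ADD CONSTRAINT pk1 PRIMARY KEY (field1, field2)
EXCEPTIONS INTO tbl_exceptions;
```

With the sample data above, one of the two (3, 'a') rows is removed and the primary key can then be created.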