Question

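I have an Oracle table that, for other reasons, has no primary key. It has 5 columns, and I want to delete duplicate records (two records are duplicates if all 5 column values are the same). I came up with the SQL below, but it does not seem to find the duplicate values: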

SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
FROM table_name
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1

Sample records:

DATE_TIME                   SITE                                                                        RESPONSE_TIME AVAIL_PERCENT AGENT
20-Apr-13 04.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    8.2610  100.00  45693
20-Apr-13 10.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.2900  100.00  45693
24-Apr-13 07.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.7300  100.00  45693
24-Apr-13 03.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.7180  100.00  45693
08-May-13 06.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.5970  100.00  45693
20-May-13 01.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.7910  100.00  45693
25-Apr-13 01.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.3400  100.00  45693
08-May-13 05.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    2.4410  100.00  45693
09-May-13 01.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]            45693
21-May-13 06.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.5480  100.00  45693
23-Apr-13 02.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    10.7070 100.00  45693
26-Apr-13 09.22.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    4.0070  100.00  45693
26-Apr-13 03.52.00.00 AM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    3.9350  100.00  45693
22-May-13 12.52.00.00 PM    Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean]    4.1760  100.00  45693
23-Apr-13 02.53.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.9500  100.00  45693
23-Apr-13 03.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.0480  100.00  45693
23-Apr-13 04.23.00.00 AM    Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean]    6.7600  100.00  45693

Any ideas?
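None of the sample rows above are duplicates under the five-column definition: every DATE_TIME value is different. An empty result from the GROUP BY query is therefore expected for this sample. To check the approach, and to see each duplicated row with its ROWID instead of one line per group, here is a small sketch. The table name dup_check_demo, its rows (SITE values shortened), and the column types are assumptions inferred from how the sample is displayed. COUNT(*) OVER (PARTITION BY ...) counts the rows in each group of identical values. Like GROUP BY, it treats NULLs as equal, so rows with empty RESPONSE_TIME and AVAIL_PERCENT still pair up.

```sql
-- Hypothetical demo table with the question's five columns (types are guesses).
CREATE TABLE dup_check_demo (
  date_time     TIMESTAMP,
  site          VARCHAR2(200),
  response_time NUMBER(10,4),
  avail_percent NUMBER(5,2),
  agent         NUMBER
);

-- One exact duplicate pair, one duplicate pair containing NULLs, one unique row.
INSERT INTO dup_check_demo VALUES (TIMESTAMP '2013-04-20 04:23:00', 'Logon To My Accounts', 8.261, 100, 45693);
INSERT INTO dup_check_demo VALUES (TIMESTAMP '2013-04-20 04:23:00', 'Logon To My Accounts', 8.261, 100, 45693);
INSERT INTO dup_check_demo VALUES (TIMESTAMP '2013-05-09 01:22:00', 'Online Home Page', NULL, NULL, 45693);
INSERT INTO dup_check_demo VALUES (TIMESTAMP '2013-05-09 01:22:00', 'Online Home Page', NULL, NULL, 45693);
INSERT INTO dup_check_demo VALUES (TIMESTAMP '2013-04-20 10:23:00', 'Logon To My Accounts', 6.29, 100, 45693);

-- Every row that has at least one identical twin, with its ROWID
-- and the number of rows in its group.
SELECT rid, date_time, site, response_time, avail_percent, agent, dup_count
FROM (
  SELECT t.ROWID AS rid,
         t.*,
         COUNT(*) OVER (PARTITION BY t.date_time, t.site, t.response_time,
                                     t.avail_percent, t.agent) AS dup_count
  FROM dup_check_demo t
)
WHERE dup_count > 1
ORDER BY date_time, rid;
```

With these demo rows, the four rows of the two duplicate pairs come back, each with dup_count = 2. The unique 10:23 row is not listed.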

Answer 1

You can use ROWID as a pseudo primary key and run a query that deletes the extra rows, for example:

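-- keep the row with the smallest ROWID in each group; delete the others
delete from
  my_table
where
  rowid not in (
    select   min(rowid)
    from     my_table
    group by column_1,
             column_2,
             column_3  -- list every column that defines a duplicate
  );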

column_1, column_2, and so on are the set of columns that define whether a row is unique.

For very large data sets with many duplicates there may be better-performing options, but this is a quick approach that is usually good enough.
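Here is a minimal, self-contained sketch of this MIN(ROWID) approach applied to the question's five columns. The table name dup_demo, the column types, and the rows are hypothetical. Because GROUP BY puts NULLs into the same group, the pair of rows with NULL response_time and avail_percent is also reduced to one row. The subquery never returns NULL (MIN(ROWID) always has a value), so NOT IN is safe here.

```sql
-- Hypothetical demo table with the question's five columns (types are guesses).
CREATE TABLE dup_demo (
  date_time     TIMESTAMP,
  site          VARCHAR2(200),
  response_time NUMBER(10,4),
  avail_percent NUMBER(5,2),
  agent         NUMBER
);

-- Two exact copies of one row, two copies of a row with NULLs, one unique row.
INSERT INTO dup_demo VALUES (TIMESTAMP '2013-04-20 04:23:00', 'Logon To My Accounts', 8.261, 100, 45693);
INSERT INTO dup_demo VALUES (TIMESTAMP '2013-04-20 04:23:00', 'Logon To My Accounts', 8.261, 100, 45693);
INSERT INTO dup_demo VALUES (TIMESTAMP '2013-05-09 01:22:00', 'Online Home Page', NULL, NULL, 45693);
INSERT INTO dup_demo VALUES (TIMESTAMP '2013-05-09 01:22:00', 'Online Home Page', NULL, NULL, 45693);
INSERT INTO dup_demo VALUES (TIMESTAMP '2013-04-20 10:23:00', 'Logon To My Accounts', 6.29, 100, 45693);

-- Keep the row with the smallest ROWID in each group; delete the rest.
DELETE FROM dup_demo
WHERE ROWID NOT IN (
  SELECT MIN(ROWID)
  FROM dup_demo
  GROUP BY date_time, site, response_time, avail_percent, agent
);
-- With these rows, 2 rows are deleted and 3 remain.

COMMIT;
```

Running the question's GROUP BY ... HAVING COUNT(*) > 1 query against dup_demo afterwards should return no rows.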

Answer 2

Since you are using Oracle, you can try the following to delete the duplicates:

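DELETE FROM my_table WHERE ROWID IN
(
  SELECT rid FROM
  (
    SELECT
      ROWID AS rid,
      -- number the rows inside each group of identical rows; the rows
      -- are identical, so it does not matter which one is numbered 1
      ROW_NUMBER() OVER (PARTITION BY
        DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT ORDER BY ROWID) ITM_IDX
    FROM my_table
  )
  -- every row after the first in its group is a duplicate
  WHERE ITM_IDX > 1
);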
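Before running a DELETE like this, it can help to run the same ROW_NUMBER() logic as a SELECT and review what would be removed. The sketch below is self-contained: the table name dup_rn_demo, its column types, and its rows are hypothetical. It shows how a group of three identical rows is numbered 1, 2, 3, and only the copies numbered 2 and 3 match ITM_IDX > 1. Which physical copy gets number 1 is arbitrary, because the copies are identical.

```sql
-- Hypothetical demo table with the question's five columns (types are guesses).
CREATE TABLE dup_rn_demo (
  date_time     TIMESTAMP,
  site          VARCHAR2(200),
  response_time NUMBER(10,4),
  avail_percent NUMBER(5,2),
  agent         NUMBER
);

-- Three identical rows and one distinct row.
INSERT INTO dup_rn_demo VALUES (TIMESTAMP '2013-04-24 07:22:00', 'Online Home Page', 3.73, 100, 45693);
INSERT INTO dup_rn_demo VALUES (TIMESTAMP '2013-04-24 07:22:00', 'Online Home Page', 3.73, 100, 45693);
INSERT INTO dup_rn_demo VALUES (TIMESTAMP '2013-04-24 07:22:00', 'Online Home Page', 3.73, 100, 45693);
INSERT INTO dup_rn_demo VALUES (TIMESTAMP '2013-04-24 03:52:00', 'Online Home Page', 3.718, 100, 45693);

-- Dry run: list the rows that "WHERE ITM_IDX > 1" would select for deletion.
SELECT rid, date_time, site, response_time, avail_percent, agent, itm_idx
FROM (
  SELECT t.ROWID AS rid,
         t.*,
         ROW_NUMBER() OVER (PARTITION BY t.date_time, t.site, t.response_time,
                                         t.avail_percent, t.agent
                            ORDER BY t.ROWID) AS itm_idx
  FROM dup_rn_demo t
)
WHERE itm_idx > 1
ORDER BY itm_idx;
-- Two rows come back (itm_idx 2 and 3); the copy numbered 1 is the one kept.
```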

Answer 3

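Are you planning to create a primary key? You can create an exceptions table, and Oracle will put the records that violate the primary key into it. If there are any violations, the primary key itself is not created, but you can analyze the bad data afterwards. =)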

create table tb1 
(field1 number, field2 varchar2(100));

--good data
insert into tb1 values (1, 'a');
insert into tb1 values (1, 'b');
insert into tb1 values (1, 'c');
insert into tb1 values (2, 'a');
insert into tb1 values (2, 'b');
insert into tb1 values (2, 'c');
-- bad data
insert into tb1 values (3, 'a');
insert into tb1 values (3, 'a');
commit;

-- a table for exceptions
create table tbl_exceptions (row_id rowid,
                             owner varchar2(30),
                             table_name varchar2(30),
                             constraint varchar2(30));

-- the primary key
-- if it fails, the table contains duplicate rows
alter table tb1 add constraint pk1 primary key (field1, field2)
exceptions into tbl_exceptions;

-- bad data will be here
-- note that the join uses the ROW_ID column of the exceptions table
select tb1.*
from  tb1,
      tbl_exceptions 
where tb1.rowid = tbl_exceptions.row_id;
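One possible follow-up, not part of this answer, is to remove the extra copies once the offending rows have been found and then retry the constraint. This sketch continues the tb1 / tbl_exceptions script above and must be run after it. It reuses the MIN(ROWID) delete from the first answer. One caveat for the question's own table: primary key columns cannot contain NULLs. The sample row dated 09-May-13 01.22 appears to have empty RESPONSE_TIME and AVAIL_PERCENT values. If those are NULLs, a five-column primary key cannot be added even after the duplicates are gone, whereas a UNIQUE constraint permits NULLs.

```sql
-- Continues the tb1 / tbl_exceptions script above; run it after that script.

-- Keep one row per (field1, field2) pair: delete every row whose ROWID is
-- not the smallest in its group. Here this removes one of the (3, 'a') rows.
DELETE FROM tb1
WHERE ROWID NOT IN (
  SELECT MIN(ROWID)
  FROM tb1
  GROUP BY field1, field2
);

-- Empty the exceptions table so it only reflects the next attempt.
DELETE FROM tbl_exceptions;
COMMIT;

-- Retry the constraint; with the duplicates gone it should now be created.
ALTER TABLE tb1 ADD CONSTRAINT pk1 PRIMARY KEY (field1, field2)
EXCEPTIONS INTO tbl_exceptions;
```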

Select duplicate values from an Oracle table
