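I have an Oracle table that, for other reasons, has no primary key defined. It has 5 columns, and I want to delete duplicate records (two rows are duplicates if all 5 column values are identical). I came up with this SQL, but it does not seem to find any duplicate values: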
SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
FROM table_name
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1
Sample records:
DATE_TIME SITE RESPONSE_TIME AVAIL_PERCENT AGENT
20-Apr-13 04.23.00.00 AM Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] 8.2610 100.00 45693
20-Apr-13 10.23.00.00 AM Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] 6.2900 100.00 45693
24-Apr-13 07.22.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.7300 100.00 45693
24-Apr-13 03.52.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.7180 100.00 45693
08-May-13 06.52.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.5970 100.00 45693
20-May-13 01.52.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.7910 100.00 45693
25-Apr-13 01.52.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.3400 100.00 45693
08-May-13 05.22.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 2.4410 100.00 45693
09-May-13 01.22.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 45693
21-May-13 06.52.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.5480 100.00 45693
23-Apr-13 02.23.00.00 AM Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] 10.7070 100.00 45693
26-Apr-13 09.22.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 4.0070 100.00 45693
26-Apr-13 03.52.00.00 AM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 3.9350 100.00 45693
22-May-13 12.52.00.00 PM Live Site (TxP)[IE]-Online Home Page - User Time (seconds)[Geo Mean] 4.1760 100.00 45693
23-Apr-13 02.53.00.00 AM Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] 6.9500 100.00 45693
23-Apr-13 03.23.00.00 AM Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] 6.0480 100.00 45693
23-Apr-13 04.23.00.00 AM Live Site (TxP)[IE]-Logon To My Accounts - User Time (seconds)[Geo Mean] 6.7600 100.00 45693
Any ideas?
Answer 0: (score: 1)
You can use rowid as a pseudo primary key and run a query that deletes the surplus rows, for example:
delete from my_table
where rowid not in (
    select min(rowid)
    from my_table
    group by column_1,
             column_2,
             column_3  -- list every column that defines uniqueness
)
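Here column_1 and so on are the set of columns that define a row's uniqueness. The row with the lowest rowid in each group is kept and every other copy is deleted.
For very large data sets with many duplicates there may be better-performing options, but this is a quick approach that is usually good enough.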
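One option that is often suggested for large tables is to build a deduplicated copy and swap it in, rather than deleting rows in place. The following is only a sketch of that idea: the names my_table_dedup and my_table_old are made up for illustration, and column_1 to column_3 stand in for the full column list.

```sql
-- Build a copy that holds one row per distinct combination of values.
-- Works here because every column takes part in the uniqueness rule.
CREATE TABLE my_table_dedup AS
SELECT DISTINCT column_1, column_2, column_3
FROM my_table;

-- Compare row counts before swapping anything.
SELECT (SELECT COUNT(*) FROM my_table)       AS rows_before,
       (SELECT COUNT(*) FROM my_table_dedup) AS rows_after
FROM dual;

-- Swap the tables; keep the original under another name until verified.
ALTER TABLE my_table RENAME TO my_table_old;
ALTER TABLE my_table_dedup RENAME TO my_table;
```

Indexes, grants, and triggers defined on the original table stay with the renamed original, so they would have to be recreated on the new table.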
Answer 1: (score: 0)
Since you are using Oracle, you can try the following to remove the duplicates:
DELETE my_table WHERE ROWID IN
(
    SELECT RID FROM
    (
        SELECT
            DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT,
            ROWID AS RID,  -- aliased so the outer query can select it from the inline view
            ROW_NUMBER() OVER (PARTITION BY
                DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
                ORDER BY DATE_TIME) ITM_IDX
        FROM my_table
    )
    WHERE ITM_IDX > 1  -- keep the first row of each group, delete the rest
);
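Before running a delete like this, it may be worth listing the duplicate groups and how many copies each one has. The query below is a sketch that reuses the column names from the question; the alias copies is made up for illustration.

```sql
-- One output row per duplicated combination, with the number of copies.
SELECT DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT,
       COUNT(*) AS copies
FROM my_table
GROUP BY DATE_TIME, SITE, RESPONSE_TIME, AVAIL_PERCENT, AGENT
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

If this returns no rows, the table holds no rows that match on all five columns, and the delete above would not remove anything either, since ROW_NUMBER() partitions on the same columns.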
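Answer 2: (score: 0)
Do you plan to create a primary key? You can create a table for exceptions, and Oracle will record in it the rowids of the rows that violate the primary key. If there are violations, the primary key itself is not created, but you can analyze the bad data afterwards. =)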
create table tb1
    (field1 number, field2 varchar2(100));
-- good data
insert into tb1 values (1, 'a');
insert into tb1 values (1, 'b');
insert into tb1 values (1, 'c');
insert into tb1 values (2, 'a');
insert into tb1 values (2, 'b');
insert into tb1 values (2, 'c');
-- bad data (the same row twice)
insert into tb1 values (3, 'a');
insert into tb1 values (3, 'a');
commit;
-- a table for exceptions
create table tbl_exceptions (row_id     rowid,
                             owner      varchar2(30),
                             table_name varchar2(30),
                             constraint varchar2(30));
-- the primary key
-- if this fails, the table contains duplicate rows
alter table tb1 add constraint pk1 primary key (field1, field2)
    exceptions into tbl_exceptions;
-- the bad data will be here
-- note that the join uses the ROW_ID column of the exceptions table
select tb1.*
from tb1,
     tbl_exceptions
where tb1.rowid = tbl_exceptions.row_id;
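Once the offending rows are identified, one possible follow-up is to delete the surplus copies and retry the constraint. As far as I know, the exceptions table lists every row of each duplicate group, not only the extra copies, so deleting everything it lists would remove the data entirely. The sketch below assumes that behavior and keeps the row with the lowest rowid in each group.

```sql
-- Delete surplus copies only: rows flagged in tbl_exceptions,
-- except the one with the lowest rowid per (field1, field2) group.
DELETE FROM tb1
WHERE rowid IN (SELECT row_id FROM tbl_exceptions)
  AND rowid NOT IN (
        SELECT MIN(rowid)
        FROM tb1
        GROUP BY field1, field2
      );
COMMIT;

-- Clear the old findings so a retry starts from an empty table.
DELETE FROM tbl_exceptions;
COMMIT;

-- Retry the primary key; with the sample data above this should now succeed.
ALTER TABLE tb1 ADD CONSTRAINT pk1 PRIMARY KEY (field1, field2)
    EXCEPTIONS INTO tbl_exceptions;
```

Note that a primary key also requires every key column to be non-null, so a table with nullable columns may need a different kind of constraint.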