Deleting redundant records

Time: 2012-12-27 07:42:24

Tags: mysql

I have a table:

+------------+------------------------------------------------------+------+-----+---------+-------+
| Field      | Type                                                 | Null | Key | Default | Extra |
+------------+------------------------------------------------------+------+-----+---------+-------+
| person_id1 | int(10)                                              | NO   | MUL | 0       |       |
| person_id2 | int(10)                                              | NO   | MUL | 0       |       |
| priority   | smallint(5)                                          | NO   |     | 0       |       |
| link_type  | enum('member_of_band','legal_name','performs_as','') | NO   |     |         |       |
+------------+------------------------------------------------------+------+-----+---------+-------+

There is no primary key on this table, but person_id1 and person_id2 are each indexed.

The problem is that we have inconsistent data. For example, this query:

SELECT
    COUNT(*) as c, person_id1, person_id2
FROM person_person
WHERE link_type = "member_of_band"
GROUP BY person_id1, person_id2
HAVING c > 1
LIMIT 10;

returns:

+---+------------+------------+
| c | person_id1 | person_id2 |
+---+------------+------------+
| 2 |   50674235 |   51048792 |
| 3 |   50674245 |   50715733 |
| 2 |   50674283 |   50712621 |
| 2 |   50674322 |   50714244 |
| 2 |   50674378 |   51048804 |
| 2 |   50674438 |   51048812 |
| 4 |   50674442 |   50715733 |
| 2 |   50674449 |   50716913 |
| 2 |   50674455 |   51048803 |
| 3 |   50674469 |   50715733 |
+---+------------+------------+

Is there a way to delete all the redundant records and keep one copy of each link?

All I have so far is:

DELETE person_person FROM person_person
WHERE (person_id1, person_id2) IN (

    SELECT
        person_id1, person_id2
    FROM person_person
    WHERE link_type = "member_of_band"
    GROUP BY person_id1, person_id2
    HAVING COUNT(*) > 1
    LIMIT 100

) AND link_type = "member_of_band";

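But this deletes every record that has duplicates, including the original. I need to delete only the extra copies, as in this example: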

mysql> select * from person_person where person_id1 = 50674245 and person_id2 = 50715733;
+------------+------------+----------+----------------+
| person_id1 | person_id2 | priority | link_type      |
+------------+------------+----------+----------------+
|   50674245 |   50715733 |        0 | member_of_band |
|   50674245 |   50715733 |        0 | member_of_band |
|   50674245 |   50715733 |        0 | member_of_band |
+------------+------------+----------+----------------+

1 answer:

Answer 0: (score: 4)

ALTER IGNORE TABLE person_person ADD UNIQUE INDEX (person_id1, person_id2, link_type);
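
The statement above relies on the IGNORE modifier: while the unique index is being built, rows that would violate it are silently dropped, so one row per (person_id1, person_id2, link_type) survives. Note that ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in MySQL 5.7.4, so it only works on older servers. On newer versions, a similar effect can probably be reached by copying the rows into a fresh table with INSERT IGNORE. The following is a rough sketch, not a tested migration; the names person_person_dedup, person_person_old, and uniq_link are made up for illustration.

```sql
-- Sketch for servers where ALTER IGNORE TABLE is not available.
-- person_person_dedup, person_person_old and uniq_link are hypothetical names.

-- 1. Create an empty copy with the same columns and indexes.
CREATE TABLE person_person_dedup LIKE person_person;

-- 2. Add the unique key while the copy is still empty.
ALTER TABLE person_person_dedup
    ADD UNIQUE INDEX uniq_link (person_id1, person_id2, link_type);

-- 3. Copy the rows. INSERT IGNORE skips any row that would
--    violate the unique key, so only one row per key is kept.
INSERT IGNORE INTO person_person_dedup
SELECT * FROM person_person;

-- 4. Swap the tables in a single atomic rename; the original
--    data stays available as person_person_old.
RENAME TABLE person_person TO person_person_old,
             person_person_dedup TO person_person;

-- 5. Check the result: this should return no rows.
SELECT person_id1, person_id2, link_type, COUNT(*) AS c
FROM person_person
GROUP BY person_id1, person_id2, link_type
HAVING c > 1;
```

With either approach, which of the duplicate rows survives is not something you control. That is harmless when the duplicates are identical, as in the example from the question, but if they differ in priority you would need to decide which value to keep before deduplicating. Writes that reach person_person between steps 3 and 4 would be lost, so the copy is best done while the application is not modifying the table.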