MySQL - How to normalize a column containing delimiter-separated IDs

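Asked: 2016-05-18 12:44:09

Tags: php mysql sql

I'm trying to normalize a table that a previous developer designed with a column of pipe-delimited IDs, which link to other rows in the same table.

Customers table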

id    |    aliases (VARCHAR)
----------------------------
1     |    |4|58|76
2     |    
3     |
4     |    |1|58|76
...   |    
58    |    |1|4|76
...   |
76    |    |1|4|58

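So customers 1, 4, 58 and 76 are all "aliases" of each other. Customers 2 and 3 have no aliases, so the field contains an empty string.

I want to do away with the whole "alias" system and normalize the data so that I can map all of those other customers to a single record. That is, I want the related table data for customers 1, 4, 58 and 76 to all be mapped to customer 1.

I figure I'll populate a new table, which I can then join against to perform updates on the other tables.

Join table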

id  |  customer_id  |  alias_id
-------------------------------
1   |  1            |  4
2   |  1            |  58
3   |  1            |  76

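How can I convert the data from the first table into the format above? If this would be an absolute nightmare in pure SQL, I'll just write a PHP script that does the work and inserts the data.

3 answers:

Answer 0: (score: 3)

When I started answering this question I thought it would be quick and easy, since I had once done something very similar in SQL Server, but translating the concept to MySQL grew into this full solution.

One caveat that isn't clear from your question is whether you have a rule for declaring a primary ID versus an alias ID. For example, this solution allows 1 to have an alias of 4 as well as 4 to have an alias of 1, which is consistent with the data provided in your simplified example.

To set up the data for this example, I used the following structure: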

CREATE TABLE notnormal_customers (
  id INT NOT NULL PRIMARY KEY,
  aliases VARCHAR(10)
);

INSERT INTO notnormal_customers (id,aliases)
VALUES
(1,'|4|58|76'),
(2,''),
(3,''),
(4,'|1|58|76'),
(58,'|1|4|76'),
(76,'|1|4|58');

First, to represent the one-to-many relationship of one customer to many aliases, I created this table:

CREATE TABLE customer_aliases (
    primary_id INT NOT NULL,
    alias_id INT NOT NULL,
    FOREIGN KEY (primary_id) REFERENCES notnormal_customers(id),
    FOREIGN KEY (alias_id)   REFERENCES notnormal_customers(id),
    /* clustered primary key prevents duplicates */
    PRIMARY KEY (primary_id,alias_id)
);

Most importantly, we'll use a custom SPLIT_STR function:

CREATE FUNCTION SPLIT_STR(
  x VARCHAR(255),
  delim VARCHAR(12),
  pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');
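
As a quick sanity check of SPLIT_STR, the calls below split a sample alias list whose leading pipe has already been removed. The literal '4|58|76' is just the sample data from the question; positions are 1-based, and a position past the last element yields an empty string.

```sql
-- Assumes the SPLIT_STR function defined above already exists.
SELECT SPLIT_STR('4|58|76', '|', 1) AS first_alias,   -- '4'
       SPLIT_STR('4|58|76', '|', 2) AS second_alias,  -- '58'
       SPLIT_STR('4|58|76', '|', 3) AS third_alias,   -- '76'
       SPLIT_STR('4|58|76', '|', 4) AS past_the_end;  -- ''
```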

Then we'll create a stored procedure to do all the work. The code is commented, with references to its sources.

DELIMITER $$
CREATE PROCEDURE normalize_customers()
BEGIN

  DECLARE cust_id INT DEFAULT 0;
  DECLARE al_id INT UNSIGNED DEFAULT 0;
  /* wide enough for alias lists longer than the 10-character sample column */
  DECLARE alias_str VARCHAR(255) DEFAULT '';
  /* set the value of the string delimiter */
  DECLARE string_delim CHAR(1) DEFAULT '|';
  DECLARE count_aliases INT DEFAULT 0;
  DECLARE i INT DEFAULT 1;

  /*
    use cursor to iterate through all customer records
    http://burnignorance.com/mysql-tips/how-to-loop-through-a-result-set-in-mysql-strored-procedure/
  */
  DECLARE done INT DEFAULT 0;
  DECLARE cur CURSOR FOR
      SELECT `id`, `aliases`
      FROM `notnormal_customers`;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur;
  read_loop: LOOP

    /*
      Fetch one record from CURSOR and set to customer id and alias string.
      If not found then `done` will be set to 1 by continue handler.
    */
    FETCH cur INTO cust_id, alias_str;
    IF done THEN
        /* If done set to 1 then exit the loop, else continue. */
        LEAVE read_loop;
    END IF;

    /* skip to next record if no aliases */
    IF alias_str = '' THEN
      ITERATE read_loop;
    END IF;

    /*
      get number of aliases
      https://pisceansheart.wordpress.com/2008/04/15/count-occurrence-of-character-in-a-string-using-mysql/
    */
    SET count_aliases = LENGTH(alias_str) - LENGTH(REPLACE(alias_str, string_delim, ''));

    /* strip off the first pipe to make it compatible with our SPLIT_STR function */
    SET alias_str = SUBSTR(alias_str, 2);

    /*
      iterate and get each alias from custom split string function
      https://stackoverflow.com/questions/18304857/split-delimited-string-value-into-rows
    */
    WHILE i <= count_aliases DO

      /* get the next alias id */
      SET al_id = CAST(SPLIT_STR(alias_str, string_delim, i) AS UNSIGNED);
      /* REPLACE existing values instead of insert to prevent errors on primary key */
      REPLACE INTO customer_aliases (primary_id,alias_id) VALUES (cust_id,al_id);
      SET i = i+1;

    END WHILE;
    SET i = 1;

  END LOOP;
  CLOSE cur;

END$$
DELIMITER ;
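
The procedure relies on two small string tricks: counting the delimiters to learn how many aliases a row has, and stripping the leading pipe before splitting. A minimal sketch for checking them in isolation, where the user variable @aliases is only an illustration and holds the sample value from the question:

```sql
-- @aliases is a throwaway session variable used only for this check.
SET @aliases = '|4|58|76';

-- Number of pipes = number of aliases, because every alias is preceded by one pipe.
SELECT LENGTH(@aliases) - LENGTH(REPLACE(@aliases, '|', '')) AS count_aliases;  -- 3

-- Drop the leading pipe so that element 1 is the first alias, not an empty string.
SELECT SUBSTR(@aliases, 2) AS stripped;  -- '4|58|76'
```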

Finally, you run it by simply calling:

CALL normalize_customers();

Then you can check the data in the console:

mysql> select * from customer_aliases;
+------------+----------+
| primary_id | alias_id |
+------------+----------+
|          4 |        1 |
|         58 |        1 |
|         76 |        1 |
|          1 |        4 |
|         58 |        4 |
|         76 |        4 |
|          1 |       58 |
|          4 |       58 |
|         76 |       58 |
|          1 |       76 |
|          4 |       76 |
|         58 |       76 |
+------------+----------+
12 rows in set (0.00 sec)
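
Because every customer in a group appears both as a primary_id and as an alias_id, the result still has to be collapsed to one canonical customer per group. One possible convention, which is an assumption here and not something the question states, is to treat the lowest ID in each group as canonical. The sketch also assumes each group is complete, meaning every member lists all the others, as in the sample data.

```sql
-- Map each aliased customer to the lowest ID in its group.
-- "Lowest ID wins" is an assumed convention, not part of the original data model.
SELECT alias_id AS customer_id,
       LEAST(alias_id, MIN(primary_id)) AS canonical_id
FROM customer_aliases
GROUP BY alias_id
ORDER BY alias_id;
```

For the sample data this maps customers 1, 4, 58 and 76 all to canonical_id 1.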

Answer 1: (score: 2)

Update 2 (One-Query Solution)

Assuming the alias lists are always sorted, you can get the result with a single query:

CREATE TABLE aliases (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  alias_id INT UNSIGNED NOT NULL
) AS
  SELECT NULL AS id, c1.id AS customer_id, c2.id AS alias_id
  FROM customers c1
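  JOIN customers c2
    ON c2.aliases LIKE CONCAT('|', c1.id , '|%') -- c1.id is the first of several aliases of c2.id
    OR c2.aliases = CONCAT('|', c1.id)           -- c1.id is the only alias of c2.id
  WHERE c1.id < (SUBSTRING(c1.aliases,2)+0);     -- c1.id is smaller than its own first alias, so it is the lowest ID in its group

It will also be much faster if the aliases column is indexed, so that the JOIN can be supported by a range search.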

sqlfiddle
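
A minimal sketch of the index mentioned above. The index name idx_customers_aliases is hypothetical, and whether the optimizer actually uses the index for the LIKE prefix match depends on the MySQL version and the data, so it is worth checking the plan with EXPLAIN.

```sql
-- idx_customers_aliases is a hypothetical index name.
CREATE INDEX idx_customers_aliases ON customers (aliases);

-- Inspect the plan of the prefix-match join to see whether the index is picked up.
EXPLAIN
SELECT c1.id AS customer_id, c2.id AS alias_id
FROM customers c1
JOIN customers c2
  ON c2.aliases LIKE CONCAT('|', c1.id, '|%');
```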

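Original answer

If you replace the pipes with commas, you can use the FIND_IN_SET function.

I would first create a temporary table (it doesn't technically need to be temporary) to store the comma-separated alias lists: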
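
For reference, FIND_IN_SET returns the 1-based position of a value in a comma-separated list, or 0 when the value is absent, and it only matches whole elements. The literals below are the sample data from the question.

```sql
SELECT FIND_IN_SET('58', '4,58,76') AS found,      -- 2 (position in the list)
       FIND_IN_SET('5',  '4,58,76') AS not_found;  -- 0 (no partial matches)

-- Replacing the pipes keeps the leading delimiter, which becomes an empty first element.
SELECT REPLACE('|4|58|76', '|', ',') AS csv;       -- ',4,58,76'
```

The empty first element shifts the reported positions but does not stop FIND_IN_SET from finding the IDs, and only a non-zero result matters when the function is used as a join condition.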

CREATE TABLE tmp (`id` int, `aliases` varchar(50));
INSERT INTO tmp(`id`, `aliases`)
  SELECT id, REPLACE(aliases, '|', ',')  AS aliases
  FROM customers;

Then populate the normalized table using FIND_IN_SET in the JOIN's ON clause:

CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
  SELECT t.id as customer_id, c.id AS alias_id
  FROM tmp t
  JOIN customers c ON find_in_set(c.id, t.aliases);

If needed, delete the duplicates that have a higher customer_id (keeping only the lowest):

DELETE FROM aliases 
WHERE customer_id IN (SELECT * FROM(
  SELECT DISTINCT a1.customer_id
  FROM aliases a1
  JOIN aliases a2
    ON  a2.customer_id = a1.alias_id
    AND a1.customer_id = a2.alias_id
    AND a1.customer_id > a1.alias_id
)derived);

If needed, create an AUTO_INCREMENT id:

ALTER TABLE aliases ADD column id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

The aliases table now looks like this:

| id | customer_id | alias_id |
|----|-------------|----------|
|  1 |           1 |        4 |
|  2 |           1 |       58 |
|  3 |           1 |       76 |

sqlfiddle

Don't forget to define proper indexes.

Update 1

You can skip creating the temporary table and populate the aliases table using LIKE instead of FIND_IN_SET:
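
As a rough sketch of what those indexes and the follow-up migration might look like: the index names and the orders table with its customer_id column are hypothetical stand-ins for whatever dependent tables exist, and the UPDATE assumes aliases holds only canonical-to-alias rows, as in the table shown above.

```sql
-- Index names are hypothetical.
ALTER TABLE aliases
  ADD UNIQUE KEY uq_customer_alias (customer_id, alias_id),
  ADD KEY idx_alias_id (alias_id);

-- Re-point rows in a dependent table from each alias to its canonical customer.
-- `orders` is a hypothetical table; repeat for every table that references customers.
UPDATE orders o
JOIN aliases a ON a.alias_id = o.customer_id
SET o.customer_id = a.customer_id;
```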

CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
  SELECT c2.id as customer_id, c1.id AS alias_id
  FROM customers c1
  JOIN customers c2 
    ON CONCAT(c1.aliases, '|') LIKE CONCAT('%|', c2.id , '|%');

sqlfiddle
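
The trailing pipe appended with CONCAT is what lets the last ID in a list match, and the surrounding pipes in the pattern are what prevent partial matches such as 5 matching 58. This can be checked directly on the sample string from the question:

```sql
SELECT CONCAT('|4|58|76', '|') LIKE '%|58|%' AS middle_id,        -- 1
       CONCAT('|4|58|76', '|') LIKE '%|76|%' AS last_id,          -- 1
       '|4|58|76'              LIKE '%|76|%' AS last_id_no_pipe,  -- 0
       CONCAT('|4|58|76', '|') LIKE '%|5|%'  AS partial_id;       -- 0
```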

Answer 2: (score: 2)

Using a table of integers (0-9), although you could achieve the same thing with (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3...etc.)...

 SELECT DISTINCT id old_id /* the technique below inevitably creates duplicates. */
                          /* DISTINCT discards them. */
              , SUBSTRING_INDEX(
                  SUBSTRING_INDEX(SUBSTR(aliases,2),'|',i+1) /* isolate text between */
                        ,'|',-1) x                           /* each pipe and the next */
           FROM customers
              , ints      /* do this for the first 10 pipes in each string */
          ORDER
             BY id,x+0;   /* implicit CASTING */
+--------+------+
| old_id | x    |
+--------+------+
|      1 | 4    |
|      1 | 58   |
|      1 | 76   |
|      2 | NULL |
|      3 | NULL |
|      4 | 1    |
|      4 | 58   |
|      4 | 76   |
|     58 | 1    |
|     58 | 4    |
|     58 | 76   |
|     76 | 1    |
|     76 | 4    |
|     76 | 58   |
+--------+------+

(Edit: added inline comments)
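
The query above assumes a helper table named ints with a single column i holding 0 through 9. A minimal sketch of that table follows; the column type and primary key are assumptions, and only the names ints and i come from the query itself. The last statement shows how the nested SUBSTRING_INDEX isolates one element.

```sql
-- Helper table assumed by the query above.
CREATE TABLE ints (i TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Inner call keeps everything up to the (i+1)-th element; outer call keeps the last piece.
-- Here i = 1, so the second element is returned.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('4|58|76', '|', 1 + 1), '|', -1) AS x;  -- '58'
```

When i runs past the end of the list, the inner call returns the whole string and the outer call returns the last element again, which is why the query needs DISTINCT.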
