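I'm trying to normalize a table that a previous developer designed with a column of pipe-delimited IDs, which link to other rows in the same table.
Customers table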
id | aliases (VARCHAR)
----------------------------
1 | |4|58|76
2 |
3 |
4 | |1|58|76
... |
58 | |1|4|76
... |
76 | |1|4|58
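So customers 1, 4, 58 and 76 are all "aliases" of each other. Customers 2 and 3 have no aliases, so the field contains an empty string.
I want to do away with the whole "alias" system and normalize the data so that I can map all the other customers to a single record. So I want the related table data for customers 1, 4, 58 and 76 all mapped to customer 1.
I figured I'd populate a new table, which I can then join against to run updates on the other tables.
Join table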
id | customer_id | alias_id
-------------------------------
1 | 1 | 4
2 | 1 | 58
3 | 1 | 76
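How can I get the data from the first table into the format above? If this would be an absolute nightmare in pure SQL, I'll just write a PHP script that does the work and inserts the data.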
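Answer 0 (score: 3)
When I started answering this question I thought it would be quick and easy, since I had done something very similar in SQL Server once, but translating the concept to MySQL and proving it out grew into this full solution.
One caveat that isn't clear from your question is whether you have a rule for declaring the primary ID versus the alias ID. For example, this solution allows 1 to have an alias of 4 as well as 4 to have an alias of 1, which is consistent with the data in your simplified example.
To set up the data for this example, I used the following structure: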
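As a minimal sketch of one possible rule (an assumption, not something the question states), the primary ID could simply be the smallest ID in each alias group, which matches the asker's wish to map 1, 4, 58 and 76 onto customer 1. The names `rows` and `primary_for` are hypothetical and the data mirrors the sample table:

```python
# Hypothetical sample mirroring the question's table: id -> pipe-delimited aliases.
rows = {1: "|4|58|76", 2: "", 3: "", 4: "|1|58|76", 58: "|1|4|76", 76: "|1|4|58"}


def primary_for(cust_id, aliases):
    """Return the smallest ID in the customer's alias group (assumed rule)."""
    alias_ids = [int(part) for part in aliases.split("|") if part]
    return min([cust_id] + alias_ids)


# Map every customer to the primary record of its group.
primary_map = {cust_id: primary_for(cust_id, aliases) for cust_id, aliases in rows.items()}
print(primary_map)
```

Customers without aliases map to themselves, so the same lookup can be applied to every row.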
CREATE TABLE notnormal_customers (
id INT NOT NULL PRIMARY KEY,
aliases VARCHAR(10)
);
INSERT INTO notnormal_customers (id,aliases)
VALUES
(1,'|4|58|76'),
(2,''),
(3,''),
(4,'|1|58|76'),
(58,'|1|4|76'),
(76,'|1|4|58');
First, to represent the one-to-many relationship of one customer to many aliases, I created this table:
CREATE TABLE customer_aliases (
primary_id INT NOT NULL,
alias_id INT NOT NULL,
FOREIGN KEY (primary_id) REFERENCES notnormal_customers(id),
FOREIGN KEY (alias_id) REFERENCES notnormal_customers(id),
/* clustered primary key prevents duplicates */
PRIMARY KEY (primary_id,alias_id)
);
Most importantly, we'll use a custom SPLIT_STR function:
CREATE FUNCTION SPLIT_STR(
x VARCHAR(255),
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, '');
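To see what SPLIT_STR returns for each position, here is a rough Python trace of the same three steps (take everything up to the pos-th delimiter, cut off what position pos-1 already covered, strip the leftover delimiter). The helper `substring_index` is a hypothetical stand-in that only imitates MySQL's SUBSTRING_INDEX for non-negative counts, and it assumes single-byte characters so that LENGTH and `len` agree:

```python
def substring_index(text, delim, count):
    """Stand-in for MySQL SUBSTRING_INDEX, non-negative counts only."""
    if count <= 0:
        return ""
    return delim.join(text.split(delim)[:count])


def split_str(text, delim, pos):
    """Return the pos-th (1-based) field of text, or '' past the end."""
    head = substring_index(text, delim, pos)           # everything up to field pos
    skip = len(substring_index(text, delim, pos - 1))  # length already consumed
    return head[skip:].replace(delim, "")              # drop the leading delimiter


fields = [split_str("4|58|76", "|", pos) for pos in (1, 2, 3, 4)]
print(fields)
```

Asking for a position past the last field yields an empty string rather than an error, which is why the procedure counts the delimiters before looping.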
Then we'll create a stored procedure to do all the work. The code is commented, with references to the sources.
DELIMITER $$
CREATE PROCEDURE normalize_customers()
BEGIN
DECLARE cust_id INT DEFAULT 0;
DECLARE al_id INT UNSIGNED DEFAULT 0;
DECLARE alias_str VARCHAR(10) DEFAULT '';
/* set the value of the string delimiter */
DECLARE string_delim CHAR(1) DEFAULT '|';
DECLARE count_aliases INT DEFAULT 0;
DECLARE i INT DEFAULT 1;
/*
use cursor to iterate through all customer records
http://burnignorance.com/mysql-tips/how-to-loop-through-a-result-set-in-mysql-strored-procedure/
*/
DECLARE done INT DEFAULT 0;
DECLARE cur CURSOR FOR
SELECT `id`, `aliases`
FROM `notnormal_customers`;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
read_loop: LOOP
/*
Fetch one record from CURSOR and set to customer id and alias string.
If not found then `done` will be set to 1 by continue handler.
*/
FETCH cur INTO cust_id, alias_str;
IF done THEN
/* If done set to 1 then exit the loop, else continue. */
LEAVE read_loop;
END IF;
/* skip to next record if no aliases */
IF alias_str = '' THEN
ITERATE read_loop;
END IF;
/*
get number of aliases
https://pisceansheart.wordpress.com/2008/04/15/count-occurrence-of-character-in-a-string-using-mysql/
*/
SET count_aliases = LENGTH(alias_str) - LENGTH(REPLACE(alias_str, string_delim, ''));
/* strip off the first pipe to make it compatible with our SPLIT_STR function */
SET alias_str = SUBSTR(alias_str, 2);
/*
iterate and get each alias from custom split string function
https://stackoverflow.com/questions/18304857/split-delimited-string-value-into-rows
*/
WHILE i <= count_aliases DO
/* get the next alias id */
SET al_id = CAST(SPLIT_STR(alias_str, string_delim, i) AS UNSIGNED);
/* REPLACE existing values instead of insert to prevent errors on primary key */
REPLACE INTO customer_aliases (primary_id,alias_id) VALUES (cust_id,al_id);
SET i = i+1;
END WHILE;
SET i = 1;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
Finally, you can run it simply by calling:
CALL normalize_customers();
Then you can check the data in the console:
mysql> select * from customer_aliases;
+------------+----------+
| primary_id | alias_id |
+------------+----------+
| 4 | 1 |
| 58 | 1 |
| 76 | 1 |
| 1 | 4 |
| 58 | 4 |
| 76 | 4 |
| 1 | 58 |
| 4 | 58 |
| 76 | 58 |
| 1 | 76 |
| 4 | 76 |
| 58 | 76 |
+------------+----------+
12 rows in set (0.00 sec)
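For readers who would rather follow the procedure's control flow outside MySQL, the sketch below retraces it in Python under the same sample data: skip empty alias strings, count the delimiters, strip the leading pipe, then read one alias per position. A set stands in for the REPLACE INTO de-duplication; `rows` and `normalize` are hypothetical names.

```python
rows = [(1, "|4|58|76"), (2, ""), (3, ""), (4, "|1|58|76"), (58, "|1|4|76"), (76, "|1|4|58")]


def normalize(rows, delim="|"):
    """Expand each pipe-delimited alias string into (primary_id, alias_id) pairs."""
    pairs = set()  # a set plays the role of REPLACE INTO on the composite key
    for cust_id, alias_str in rows:
        if alias_str == "":
            continue  # no aliases, move on to the next customer
        count_aliases = alias_str.count(delim)  # one leading pipe per alias
        parts = alias_str[1:].split(delim)      # strip the first pipe, then split
        for i in range(count_aliases):
            pairs.add((cust_id, int(parts[i])))
    return sorted(pairs)


pairs = normalize(rows)
print(len(pairs), pairs[:3])
```

With the sample data this produces the same 12 pairs as the console output, although the ordering here is simply Python's tuple sort.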
Answer 1 (score: 2)
Update 2 (one-query solution)
Assuming the alias lists are always sorted, you can get the result with a single query:
CREATE TABLE aliases (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT UNSIGNED NOT NULL,
alias_id INT UNSIGNED NOT NULL
) AS
SELECT NULL AS id, c1.id AS customer_id, c2.id AS alias_id
FROM customers c1
JOIN customers c2
ON c2.aliases LIKE CONCAT('|', c1.id , '|%') -- c1.id is the first alias of c2.id
WHERE c1.id < (SUBSTRING(c1.aliases,2)+0); -- c1.id is smaller than its own first alias, i.e. the lowest ID in its group
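The join condition and the WHERE filter can be traced in Python as a sanity check. This is only a sketch under the same assumption of sorted alias lists; `customers` and `first_alias` are hypothetical names, and `first_alias` imitates the numeric cast of `SUBSTRING(aliases, 2) + 0` by reading the first field.

```python
customers = {1: "|4|58|76", 2: "", 3: "", 4: "|1|58|76", 58: "|1|4|76", 76: "|1|4|58"}


def first_alias(aliases):
    """First ID in a pipe-delimited list, or None when the list is empty."""
    parts = [p for p in aliases.split("|") if p]
    return int(parts[0]) if parts else None


pairs = []
for c1_id, c1_aliases in customers.items():
    c1_first = first_alias(c1_aliases)
    # WHERE: keep c1 only if it is lower than its own first (smallest) alias.
    if c1_first is None or not c1_id < c1_first:
        continue
    for c2_id, c2_aliases in customers.items():
        # JOIN: c2's alias list must start with "|<c1.id>|".
        if c2_aliases.startswith(f"|{c1_id}|"):
            pairs.append((c1_id, c2_id))

print(pairs)
```

One thing the trace makes visible: an alias list with a single entry, such as `|1`, does not start with `|1|`, so a group of only two customers would not match the pattern as written.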
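It will also be much faster if the aliases column is indexed, so that a range search can support the JOIN.
Original answer
If you replace the pipes with commas, you can use the FIND_IN_SET function.
I'd first create a temporary table (it doesn't technically need to be temporary) to store the comma-separated alias lists: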
CREATE TABLE tmp (`id` int, `aliases` varchar(50));
INSERT INTO tmp(`id`, `aliases`)
SELECT id, REPLACE(aliases, '|', ',') AS aliases
FROM customers;
Then populate the normalized table using FIND_IN_SET in the JOIN's ON clause:
CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
SELECT t.id as customer_id, c.id AS alias_id
FROM tmp t
JOIN customers c ON find_in_set(c.id, t.aliases);
If needed, delete the duplicates with the higher customer_id (keeping only the lowest):
DELETE FROM aliases
WHERE customer_id IN (SELECT * FROM(
SELECT DISTINCT a1.customer_id
FROM aliases a1
JOIN aliases a2
ON a2.customer_id = a1.alias_id
AND a1.customer_id = a2.alias_id
AND a1.customer_id > a1.alias_id
)derived);
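The FIND_IN_SET join and the clean-up DELETE can be followed step by step in Python. This is a sketch on the sample data only; `find_in_set` is a hypothetical stand-in for the MySQL function, and `customers`, `tmp` and `aliases` mirror the tables of the same names.

```python
customers = {1: "|4|58|76", 2: "", 3: "", 4: "|1|58|76", 58: "|1|4|76", 76: "|1|4|58"}


def find_in_set(needle, csv):
    """Rough stand-in for MySQL FIND_IN_SET: 1-based position, 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


# Step 1: the tmp table, with pipes replaced by commas.
tmp = {cust_id: aliases.replace("|", ",") for cust_id, aliases in customers.items()}

# Step 2: join tmp against customers wherever the ID appears in the list.
aliases = [(t_id, c_id) for t_id, csv in tmp.items()
           for c_id in customers if find_in_set(str(c_id), csv)]

# Step 3: drop every customer_id that has a mirrored row with a lower alias_id.
pair_set = set(aliases)
to_delete = {cust for cust, alias in aliases if (alias, cust) in pair_set and cust > alias}
aliases = [(cust, alias) for cust, alias in aliases if cust not in to_delete]

print(aliases)
```

The leading comma left over from the first pipe only adds an empty first element to each list, so it does not affect the lookups.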
If needed, create an AUTO_INCREMENT id:
ALTER TABLE aliases ADD column id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
The aliases table now looks like this:
| id | customer_id | alias_id |
|----|-------------|----------|
| 1 | 1 | 4 |
| 2 | 1 | 58 |
| 3 | 1 | 76 |
Don't forget to define proper indexes.
Update 1
You can skip creating the temporary table and populate the aliases table using LIKE instead of FIND_IN_SET:
CREATE TABLE aliases (`customer_id` int, `alias_id` int) AS
SELECT c2.id as customer_id, c1.id AS alias_id
FROM customers c1
JOIN customers c2
ON CONCAT(c1.aliases, '|') LIKE CONCAT('%|', c2.id , '|%');
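A short Python sketch shows why the pattern wraps the ID in pipes and why a pipe is appended to the alias string first. The function name `is_alias` is hypothetical; the substring test plays the role of `LIKE '%|<id>|%'`.

```python
def is_alias(aliases, candidate_id):
    """True if candidate_id appears as a whole field in the pipe-delimited list."""
    # Appending "|" lets the last field match; the surrounding pipes stop
    # partial matches such as 5 inside 58.
    return f"|{candidate_id}|" in aliases + "|"


checks = {
    "last field": is_alias("|4|58|76", 76),
    "partial id": is_alias("|4|58|76", 5),
    "empty list": is_alias("", 1),
}
print(checks)
```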
Answer 2 (score: 2)
Using a table of integers (0-9), although you could achieve the same thing with (SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3...etc.)...
SELECT DISTINCT id old_id /* the technique below inevitably creates duplicates. */
/* DISTINCT discards them. */
, SUBSTRING_INDEX(
SUBSTRING_INDEX(SUBSTR(aliases,2),'|',i+1) /* isolate text between */
,'|',-1) x /* each pipe and the next */
FROM customers
, ints /* do this for the first 10 pipes in each string */
ORDER
BY id,x+0 /* implicit CASTING */
+--------+------+
| old_id | x |
+--------+------+
| 1 | 4 |
| 1 | 58 |
| 1 | 76 |
| 2 | NULL |
| 3 | NULL |
| 4 | 1 |
| 4 | 58 |
| 4 | 76 |
| 58 | 1 |
| 58 | 4 |
| 58 | 76 |
| 76 | 1 |
| 76 | 4 |
| 76 | 58 |
+--------+------+
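The nested SUBSTRING_INDEX trick can be traced in Python to show where the duplicates come from. This is a sketch with a hypothetical `substring_index` helper imitating the MySQL function for positive and negative counts; `range(10)` stands in for the ints table. Note that the sample here stores empty strings for customers 2 and 3, so they come out as '' rather than the NULL shown above.

```python
customers = {1: "|4|58|76", 2: "", 3: "", 4: "|1|58|76", 58: "|1|4|76", 76: "|1|4|58"}


def substring_index(text, delim, count):
    """Stand-in for MySQL SUBSTRING_INDEX with positive or negative count."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


# Cross join every customer with i = 0..9; once i runs past the last field the
# inner call returns the whole string, so the last alias repeats. The set plays
# the role of DISTINCT.
distinct_rows = {
    (cust_id, substring_index(substring_index(aliases[1:], "|", i + 1), "|", -1))
    for cust_id, aliases in customers.items()
    for i in range(10)
}
result = sorted(distinct_rows, key=lambda row: (row[0], int(row[1] or 0)))
print(result)
```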
(Edit: added inline comments)