I have a table that lists invalid characters, for example:
InVCh
-----
!
"
$
%
&
'
(
)
*
+
,
.
/
Then I have many tables with different numbers of columns (all of them string columns), for example:
Product Store
------- ------
Prod1 Store1
Pr$od!2 Sto$re!2
P:;()ro!!!"d3 S:;()to!!!"re3
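I want to create a procedure that finds all of those invalid characters and replaces them with a blank. If that leaves several blanks next to each other, they must be replaced by a single space. So my expected result should be: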
Product Store
------- ------
Prod1 Store1
Pr od 2 Sto re 2
P ro d3 S to re3
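To pin down the requirement, here is a minimal Python sketch of the intended transformation. It uses Python rather than SQL only so the behaviour can be checked in isolation. Every invalid character becomes a space, and then any run of spaces collapses into one. The sketch assumes ':' and ';' are also invalid, because the expected output removes them even though the InVCh list above does not contain them. `clean` and `INVALID_CHARS` are hypothetical names.

```python
import re

# The InVCh list from the question, plus ':' and ';' (an assumption: the
# expected output treats them as invalid even though the list omits them).
INVALID_CHARS = "!\"$%&'()*+,./" + ":;"


def clean(value: str, invalid_chars: str = INVALID_CHARS) -> str:
    # Step 1: replace every invalid character with a space.
    spaced = value.translate(str.maketrans({ch: " " for ch in invalid_chars}))
    # Step 2: collapse any run of two or more spaces into a single space.
    return re.sub(r" {2,}", " ", spaced)


rows = [("Prod1", "Store1"), ("Pr$od!2", "Sto$re!2"), ('P:;()ro!!!"d3', 'S:;()to!!!"re3')]
for product, store in rows:
    print(clean(product), "|", clean(store))
```

Step 2 also merges spaces that were already in the data (`"a ! b"` becomes `"a b"`), which is one literal reading of "too many blanks together".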
Is this possible?
Thanks!
Answer 0 (score: 2)
Since this is SQL Server 2016, using R is an option. That isn't far-fetched: a 2017 MSSQLTips article, SQL Server 2016 Regular Expressions with the R Language, describes exactly this.
The article's code isn't hard either:
create table dbo.tblRegEx (id int identity, a varchar(300), b varchar(300) );
-- 3. Remove duplicate words
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"\\b(\\w+\\s*)(\\1\\s*)+";
inData$a <- gsub(pattern, "\\1", inData$a, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, a, b from dbo.tblRegEx'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object dbo.tblRegEx);
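If you don't have R Services at hand, you can see what the article's duplicate-word pattern matches by running it through Python's `re` module. Like the PCRE engine R uses with `perl = T`, `re` supports the `\1` backreference inside a pattern. This is only a sketch of the regex's behaviour, not the article's code, and `remove_duplicate_words` is a hypothetical name.

```python
import re

# The article's pattern, unchanged: a word (plus trailing whitespace) followed
# by one or more repeats of exactly that text.
DUPLICATE_WORDS = re.compile(r"\b(\w+\s*)(\1\s*)+")


def remove_duplicate_words(text: str) -> str:
    # Replace the whole repeated run with a single copy (the "\\1" in R's gsub).
    return DUPLICATE_WORDS.sub(r"\1", text)


print(remove_duplicate_words("hello hello world"))
```

Because group 1 captures the trailing whitespace, a repeat at the very end of a string (`"a test test"`) is not collapsed. The last copy has no whitespace after it to match `\1`.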
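This question asks for something much simpler: only a few characters need to be replaced. A character class followed by + matches each run of consecutive invalid characters, so gsub replaces the whole run with a single space: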
create table #products
(
id int primary key identity,
product varchar(300),
store varchar(300)
);
go
insert into #products (product,store)
values
('Prod1', 'Store1'),
('Pr$od!2', 'Sto$re!2'),
('P:;()ro!!!"d3', 'S:;()to!!!"re3')
-- Replace each run of invalid characters in both columns with a single space
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object #products);
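The R step above can be reproduced with Python's `re` module as a quick way to check the pattern against the sample rows. In a Python raw triple-quoted string, neither the doubled `''` (T-SQL string escaping) nor the `\"` (R string escaping) is needed. `INVALID_RUN` and `replace_invalid` are hypothetical names for this sketch.

```python
import re

# Same character class as the R script: the listed characters plus ':' and ';'.
# The trailing + makes one match cover a whole run of consecutive invalid characters.
INVALID_RUN = re.compile(r"""[!"$%&'()*+,./:;]+""")


def replace_invalid(value: str) -> str:
    # Mirrors gsub(pattern, " ", x, perl = T): each run becomes exactly one space.
    return INVALID_RUN.sub(" ", value)


rows = [("Prod1", "Store1"), ("Pr$od!2", "Sto$re!2"), ('P:;()ro!!!"d3', 'S:;()to!!!"re3')]
for product, store in rows:
    print(replace_invalid(product), "|", replace_invalid(store))
```

This only collapses runs of invalid characters. A space already in the data next to an invalid character is kept, so `"a ! b"` becomes `"a   b"`. Adding a space to the character class would merge those as well.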
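As with any stored procedure, the results can only be returned to the client or used as the source of an INSERT INTO ... EXEC statement. The target can be a table, a temporary table, or a table variable, which can then be used to update the source table: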
-- Capture the R output in a table variable (INSERT ... EXEC), then update #products from it
declare @outData table (id int primary key, product varchar(300), store varchar(300) );
insert into @outData
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
-- Write the cleaned values back to the source table, matching rows on id
update #products
set product = r.product,
store = r.store
from #products inner join @outData r on r.id=#products.id
select * from #products
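The stage-then-update pattern is independent of R. Below is a sketch of it using Python's built-in `sqlite3` as a stand-in for SQL Server. SQLite has no `sp_execute_external_script` and no table variables, so a temporary table plays the role of `@outData` and Python does the regex work that R does above. The table and function names are hypothetical. Correlated subqueries replace `UPDATE ... FROM` because older SQLite versions lack it.

```python
import re
import sqlite3

# Same character class as the R script; Python stands in for the R step here.
INVALID_RUN = re.compile(r"""[!"$%&'()*+,./:;]+""")


def clean_products(conn: sqlite3.Connection) -> None:
    cur = conn.cursor()
    # Staging table: plays the role of the @outData table variable.
    cur.execute("CREATE TEMP TABLE out_data (id INTEGER PRIMARY KEY, product TEXT, store TEXT)")
    rows = cur.execute("SELECT id, product, store FROM products").fetchall()
    # Equivalent of INSERT INTO @outData EXEC ...: stage the cleaned rows.
    cur.executemany(
        "INSERT INTO out_data (id, product, store) VALUES (?, ?, ?)",
        [(i, INVALID_RUN.sub(" ", p), INVALID_RUN.sub(" ", s)) for i, p, s in rows],
    )
    # Equivalent of UPDATE ... FROM #products INNER JOIN @outData: match on id.
    cur.execute(
        """
        UPDATE products
        SET product = (SELECT o.product FROM out_data AS o WHERE o.id = products.id),
            store   = (SELECT o.store   FROM out_data AS o WHERE o.id = products.id)
        WHERE id IN (SELECT id FROM out_data)
        """
    )
    cur.execute("DROP TABLE out_data")
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT, store TEXT)")
    conn.executemany(
        "INSERT INTO products (product, store) VALUES (?, ?)",
        [("Prod1", "Store1"), ("Pr$od!2", "Sto$re!2"), ('P:;()ro!!!"d3', 'S:;()to!!!"re3')],
    )
    clean_products(conn)
    return conn.execute("SELECT id, product, store FROM products ORDER BY id").fetchall()


print(demo())
```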
This returns:
id product store
-- ------- --------
1 Prod1 Store1
2 Pr od 2 Sto re 2
3 P ro d3 S to re3
Answer 1 (score: -1)
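No version was mentioned, so I'm assuming you have access to the latest tools. In that case you can use FOR XML PATH to build a single string from the characters that need replacing, and then use TRANSLATE (available from SQL Server 2017) to remove all of them. TRANSLATE maps every listed character to the first one in the list, and REPLACE then removes that character: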
WITH C AS(
SELECT *
FROM (VALUES('!'),
('"'),
('$'),
('%'),
('&'),
(''''),
('('),
(')'),
('*'),
('+'),
(','),
('.'),
('/'))V(InVCh)),
PS AS (
SELECT *
FROM (VALUES('Prod1','Store1'),
('Pr$od!2','Sto$re!2'),
('P:;()ro!!!"d3','S:;()to!!!"re3')) V(Product,Store))
SELECT REPLACE(TRANSLATE(PS.Product,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Product,
REPLACE(TRANSLATE(PS.Store,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Store
FROM PS
CROSS APPLY (VALUES((SELECT '' + InVCh
FROM C
FOR XML PATH(''),TYPE).value('.','varchar(MAX)')))V(C);
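Note that the third row returns 'P:;rod3' and 'S:;tore3', because neither the semicolon (;) nor the colon (:) is in your list of characters to remove. You need to add every character that has to be replaced.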
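Here is the TRANSLATE-then-REPLACE trick restated in Python, as a sketch to show why it works. `str.maketrans(chars, marker * len(chars))` plays the role of `TRANSLATE(value, chars, REPLICATE(marker, LEN(chars)))`, and the join mirrors the FOR XML PATH('') concatenation. The marker is the first listed character, so removing it can never delete valid data. The names are hypothetical.

```python
# The InVCh rows, concatenated the way FOR XML PATH('') does in the CTE.
INVALID_CHARS = "".join(["!", '"', "$", "%", "&", "'", "(", ")", "*", "+", ",", ".", "/"])


def strip_invalid(value: str, chars: str = INVALID_CHARS) -> str:
    marker = chars[0]
    # TRANSLATE(value, chars, REPLICATE(marker, LEN(chars))):
    # every listed character is mapped to the marker character.
    translated = value.translate(str.maketrans(chars, marker * len(chars)))
    # REPLACE(..., marker, ''): removing the marker removes all listed characters.
    return translated.replace(marker, "")


print(strip_invalid('P:;()ro!!!"d3'), strip_invalid('S:;()to!!!"re3'))
```

In Python itself, `str.maketrans("", "", chars)` would delete the characters in one step. The two-step form is only there to mirror the T-SQL, and it reproduces the 'P:;rod3' / 'S:;tore3' result described above.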
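The OP seems to have mentioned in a comment that they are on 2016 (which is why stating your version matters!). On 2016 this can be done with NGrams8K, a user-defined function that is not built into SQL Server and has to be created first, although it looks rather messy: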
WITH C AS(
SELECT *
FROM (VALUES('!'),
('"'),
('$'),
('%'),
('&'),
(''''),
('('),
(')'),
('*'),
('+'),
(','),
('.'),
('/'))V(InVCh)),
PS AS (
SELECT *
FROM (VALUES(1,'Prod1','Store1'),
(2,'Pr$od!2','Sto$re!2'),
(3,'P:;()ro!!!"d3','S:;()to!!!"re3')) V(ID,Product,Store))
SELECT PS.Product,V.Product,
PS.Store,V.Store
FROM PS
CROSS APPLY (VALUES((SELECT '' + N.token
FROM dbo.NGrams8k(PS.Product,1) N
WHERE NOT EXISTS (SELECT 1
FROM C
WHERE C.InVCh = N.token)
ORDER BY position
FOR XML PATH(''),TYPE).value('.','varchar(8000)'),
(SELECT '' + N.token
FROM dbo.NGrams8k(PS.Store,1) N
WHERE NOT EXISTS (SELECT 1
FROM C
WHERE C.InVCh = N.token)
ORDER BY position
FOR XML PATH(''),TYPE).value('.','varchar(8000)')))V(Product,Store)
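The NGrams8K query splits each value into 1-character tokens, drops tokens that appear in the character list (the NOT EXISTS), keeps them in position order, and concatenates them. Here is a Python sketch of that pipeline. `ngrams` is a hypothetical stand-in that only mimics the (position, token) shape the query reads from dbo.NGrams8k. It is not that function.

```python
# The same character list as the CTE C.
INVALID = set("!\"$%&'()*+,./")


def ngrams(text: str, n: int) -> list[tuple[int, str]]:
    # Hypothetical stand-in for dbo.NGrams8k: 1-based (position, token) pairs.
    return [(i + 1, text[i:i + n]) for i in range(len(text) - n + 1)]


def remove_invalid(text: str) -> str:
    # Keep 1-character tokens that are NOT in the list (the NOT EXISTS),
    # in position order (ORDER BY position), then concatenate (FOR XML PATH).
    kept = [token for position, token in sorted(ngrams(text, 1)) if token not in INVALID]
    return "".join(kept)


print(remove_invalid('P:;()ro!!!"d3'), remove_invalid("Sto$re!2"))
```

As with the TRANSLATE version, ':' and ';' survive because they are not in the list.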