Search and replace a string in T-SQL

Time: 2015-02-25 18:58:28

Tags: sql sql-server tsql

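Hi everyone, I am trying to write a query that removes every occurrence of certain strings from the end of a value. I have a list of noise words (104, to be exact) that must be removed from a string whenever they appear at the end.

For example, two of the noise words are Company and LLC.

Here are some examples and the expected output: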

American Company, LLC -- Expected output: American (both noise words should be removed)
American LLC,LLC -- Expected output: American
American Company American Company -- Expected output: American Company American (one noise word occurs between other words, so it should not be removed)

Currently I have this query:

DECLARE @NEWSTRING VARCHAR(max) 
DECLARE @NEWSTRINGlength nvarchar(max)

SET @NEWSTRING = 'American Company American Company Company, LLC  LLC' ; 

SET @NEWSTRINGlength = len(@newstring)
SELECT @NEWSTRINGlength

CREATE TABLE #item (item Nvarchar(250) null)

INSERT INTO #item

SELECT 'Company' as item
UNION ALL 
SELECT 'LLC' as item

DECLARE @unwantedCharecters  VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!, ]%'

WHILE PATINDEX( @unwantedCharecters, @NEWSTRING ) > 0
SELECT @NEWSTRING = ltrim(rtrim(Replace(REPLACE( @NEWSTRING, SUBSTRING( @NEWSTRING, PATINDEX( @unwantedCharecters, @NEWSTRING ), 1 ),''),'-',' ')))

SELECT @NEWSTRING = substring(rtrim(@NEWSTRING), 1, len(@newstring) - len(ITEM)) FROM #item WHERE  rtrim(@NEWSTRING) LIKE '%' + ITEM

Every occurrence of a noise word should be removed, unless it appears between other words.

1 Answer:

Answer 0: (score: 1)

This should do the trick:

WITH 
DirtyValues AS(
    SELECT * FROM (VALUES
          (1, 'American Company, LLC') --Expected output --American (both noise words should be removed)
        , (2, 'American LLC,LLC') --Expected output -- American
        , (3, 'American Company American Company')-- American Company American (one noise word occurs in between other words, so it should not be removed)
    ) AS T(ID, Dirty)
),
NoisyWords AS(
    SELECT * FROM (VALUES
          (' ') -- Just append the chars to be filtered to your noise word list
        , (',')
        , ('LLC')
        , ('Company')
    ) AS T(Noisy)
),
DoSomeMagic AS(
    SELECT ID
         , Result = REVERSE(Dirty)
    FROM DirtyValues 
    UNION ALL 
    SELECT ID
         , Result = SUBSTRING(Result, DATALENGTH(Noisy)+1, DATALENGTH(Result))
    FROM DoSomeMagic
        CROSS APPLY(
            SELECT 
                  Noisy = REVERSE(Noisy)
            FROM NoisyWords
        ) AS T
    WHERE PATINDEX('%' + Noisy + '%', Result) = 1
),
PickBestResult AS(
    SELECT DoSomeMagic.ID
         , [clean as a whistle] = REVERSE(DoSomeMagic.Result)
         , [Rank]               = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATALENGTH(Result) ASC)
    FROM DoSomeMagic
)
SELECT *
FROM PickBestResult
WHERE [Rank] = 1

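What it does:

  • The first two CTEs are your data sets; you will of course want to replace them with your own tables.
  • DoSomeMagic is a recursive CTE. It first reverses the string so that it can search from the end, then cross-applies all the noise words and checks whether the reversed string now starts with a reversed noise word. If it does, that word is removed and the process repeats until no noise word is found at the start.
  • PickBestResult then ranks the rows for each ID, and the shortest result gets Rank 1.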
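
To make the DoSomeMagic step easier to follow, here is a minimal iterative sketch of the same reverse-and-strip idea for a single value. It is not part of the original answer: the @Noise table variable and the @Result and @Hit variables are hypothetical names introduced for illustration. It also assumes VARCHAR data. DATALENGTH counts bytes, so with NVARCHAR it would report two bytes per character and the lengths would need to be halved.

```sql
-- Iterative sketch of the reverse-and-strip step for one value.
-- @Noise is a hypothetical stand-in for the NoisyWords CTE (VARCHAR assumed).
DECLARE @Noise TABLE (Noisy VARCHAR(250) NOT NULL);
INSERT INTO @Noise (Noisy) VALUES (' '), (','), ('LLC'), ('Company');

-- Reverse the input so that "ends with" becomes "starts with".
DECLARE @Result VARCHAR(8000) = REVERSE('American Company, LLC');
DECLARE @Hit    VARCHAR(250);

-- The length guard stops the loop once everything has been stripped;
-- without it, '' = ' ' compares as equal and the loop would never end.
WHILE DATALENGTH(@Result) > 0
BEGIN
    SET @Hit = NULL;

    -- Look for a reversed noise word at the start of the reversed string.
    -- Preferring the longest match is a choice made for this sketch.
    SELECT TOP (1) @Hit = REVERSE(Noisy)
    FROM @Noise
    WHERE LEFT(@Result, DATALENGTH(Noisy)) = REVERSE(Noisy)
    ORDER BY DATALENGTH(Noisy) DESC;

    -- No noise word at the start: nothing more to strip.
    IF @Hit IS NULL BREAK;

    -- Drop the matched prefix, as the SUBSTRING in the recursive CTE does.
    SET @Result = SUBSTRING(@Result, DATALENGTH(@Hit) + 1, DATALENGTH(@Result));
END;

-- Reverse back to the original reading order.
SELECT REVERSE(@Result) AS Cleaned;
```

For the first sample value this should return American. Unlike this loop, which follows a single path and takes the longest match at each step, the recursive CTE branches on every matching noise word and leaves it to PickBestResult to keep the shortest outcome.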