Comparing strings character by character in SQL

Date: 2018-03-11 20:33:43

Tags: sql-server string tsql

How can I compare two strings in T-SQL to check whether they contain the same symbols?

For example:

  • 'ddbca' vs 'aaabbcd' (TRUE): both strings contain the same symbols
  • 'cda' vs 'abcddd' (FALSE): the strings do not contain the same symbols

3 answers:

Answer 0 (score: 3)

If performance matters, I would suggest a purely set-based solution using Ngrams8k.

This gives you the correct answer:

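-- Sample values (assumed for illustration)
DECLARE @string1 varchar(8000) = 'aaabbcd',
        @string2 varchar(8000) = 'ddbca';

SELECT AllSame = COALESCE(MAX(0),1)
FROM dbo.ngrams8k(@string1, 1) ng1
FULL JOIN dbo.ngrams8k(@string2, 1) ng2 ON ng1.token = ng2.token
WHERE ng1.token IS NULL OR ng2.token IS NULL;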
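
To see why the FULL JOIN works, here is a minimal sketch that does not need dbo.ngrams8k. It assumes that dbo.ngrams8k(@s, 1) returns one row per character in a column named token, which is what the query above implies. The Nums, chars1 and chars2 CTEs are hypothetical stand-ins written for this illustration, and the comparison follows the database collation, so on a case-insensitive collation 'a' and 'A' count as the same symbol.

```sql
-- Assumed sample values: 'b' occurs only in the first string
DECLARE @string1 varchar(100) = 'abcddd',
        @string2 varchar(100) = 'cda';

WITH Nums AS
(
  -- Ad hoc tally of 1..100, enough for the varchar(100) inputs
  SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
  FROM sys.all_columns
),
chars1 AS
(
  -- Distinct characters of the first string
  SELECT DISTINCT SUBSTRING(@string1, number, 1) AS token
  FROM Nums
  WHERE number <= LEN(@string1)
),
chars2 AS
(
  -- Distinct characters of the second string
  SELECT DISTINCT SUBSTRING(@string2, number, 1) AS token
  FROM Nums
  WHERE number <= LEN(@string2)
)
-- A NULL on either side marks a character that only one string contains
SELECT c1.token AS in_string1, c2.token AS in_string2
FROM chars1 c1
FULL JOIN chars2 c2 ON c1.token = c2.token
ORDER BY COALESCE(c1.token, c2.token);

-- in_string1  in_string2
-- a           a
-- b           NULL
-- c           c
-- d           d
```

The WHERE clause in the answer keeps only the rows with a NULL on one side. An aggregate without GROUP BY returns exactly one row even over an empty input, so MAX(0) is 0 when at least one unmatched character exists and NULL otherwise, and COALESCE turns that NULL into 1.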

To use this logic against a table, you can use CROSS APPLY like this:

-- Sample data
DECLARE @table TABLE (string1 varchar(100), string2 varchar(100));
INSERT @table VALUES ('aaabbcd','ddbca'),('abcddd','cda');

-- Solution using CROSS APPLY
SELECT * 
FROM @table t
CROSS APPLY
(
  SELECT AllSame = COALESCE(MAX(0),1)
  FROM dbo.ngrams8k(t.string1, 1) ng1
  FULL JOIN dbo.ngrams8k(t.string2, 1) ng2 ON ng1.token = ng2.token
  WHERE ng1.token IS NULL OR ng2.token IS NULL
) x;

Results:

string1   string2   AllSame
--------- --------- --------
aaabbcd   ddbca     1
abcddd    cda       0

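Not only is this the fastest solution posted so far; note that it also gets the job done with very little code.

Update: performance comparison including Martin Smith's solution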

-- sample data
IF OBJECT_ID('tempdb..#sample') IS NOT NULL DROP TABLE #sample;
SELECT TOP (10000)
  string1 = replicate('a',abs(checksum(newid())%5))+replicate('b',abs(checksum(newid())%4))+
            replicate('c',abs(checksum(newid())%5))+replicate('d',abs(checksum(newid())%4))+
            replicate('e',abs(checksum(newid())%5))+replicate('f',abs(checksum(newid())%4)),
  string2 = replicate('a',abs(checksum(newid())%5))+replicate('b',abs(checksum(newid())%4))+
            replicate('c',abs(checksum(newid())%5))+replicate('d',abs(checksum(newid())%4))+
            replicate('e',abs(checksum(newid())%5))+replicate('f',abs(checksum(newid())%4))
INTO #sample
FROM sys.all_columns a, sys.all_columns b;

SET NOCOUNT ON;
SET STATISTICS TIME ON;
PRINT 'ajb serial'+char(10)+replicate('-',50);
SELECT flag 
FROM #sample t
CROSS APPLY
(
  SELECT Flag = COALESCE(MAX(0),1)
  FROM dbo.ngrams8k(t.string1, 1) ng1
  FULL JOIN dbo.ngrams8k(t.string2, 1) ng2 ON ng1.token = ng2.token
  WHERE ng1.token IS NULL OR ng2.token IS NULL
) x
OPTION (MAXDOP 1);

PRINT 'ajb parallel'+char(10)+replicate('-',50);
SELECT flag 
FROM #sample t
CROSS APPLY
(
  SELECT Flag = COALESCE(MAX(0),1)
  FROM dbo.ngrams8k(t.string1, 1) ng1
  FULL JOIN dbo.ngrams8k(t.string2, 1) ng2 ON ng1.token = ng2.token
  WHERE ng1.token IS NULL OR ng2.token IS NULL
) x
OPTION (querytraceon 8649);

PRINT 'M Smith - serial'+char(10)+replicate('-',50);
WITH Nums AS 
(
  SELECT TOP (100) ROW_NUMBER() OVER ( ORDER BY (SELECT NULL)) number
  FROM sys.all_columns 
)
SELECT flag
FROM #sample T
CROSS APPLY (SELECT CASE WHEN Min(Cnt) = 2 THEN 1 ELSE 0 END AS Flag 
             FROM   (SELECT Count(*) AS Cnt 
                     FROM   (SELECT 1                           AS s, 
                                    Substring(t.string1, N1.number, 1) AS c 
                             FROM   Nums N1 
                             WHERE  N1.number <= Len(t.string1) 
                             UNION 
                             SELECT 2                           AS s, 
                                    Substring(t.string2, N2.number, 1) AS c 
                             FROM   Nums N2 
                             WHERE  N2.number <= Len(t.string2)) D1 
                     GROUP  BY c) D2 
             ) Ca 
OPTION (MAXDOP 1);
SET STATISTICS TIME OFF;

Results:

ajb serial
--------------------------------------------------
 SQL Server Execution Times:
   CPU time = 656 ms,  elapsed time = 660 ms.

ajb parallel
--------------------------------------------------
 SQL Server Execution Times:
   CPU time = 1281 ms,  elapsed time = 204 ms.

M Smith serial
--------------------------------------------------
 SQL Server Execution Times:
   CPU time = 1390 ms,  elapsed time = 1393 ms.

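Note that I did not test Martin's solution with a parallel plan, because that query cannot run in parallel.

Answer 1 (score: 2)

An inline approach.

This uses a numbers table: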

CREATE TABLE dbo.Numbers (number INT PRIMARY KEY);

INSERT INTO dbo.Numbers
SELECT TOP 8000 ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM sys.all_columns c1, 
     sys.all_columns c2

If you would rather not use a numbers table, the edit history contains a version without one, though it performs worse.

WITH T(S1, S2) 
     AS (SELECT 'aaabbcd', 
                'ddbca' 
         UNION ALL 
         SELECT 'abcddd', 
                'cda')
SELECT * 
FROM   T 
       CROSS APPLY (SELECT CASE WHEN Min(Cnt) = 2 THEN 1 ELSE 0 END AS Flag 
                    FROM   (SELECT Count(*) AS Cnt 
                            FROM   (SELECT 1                           AS s, 
                                           Substring(S1, N1.number, 1) AS c 
                                    FROM   dbo.Numbers N1 
                                    WHERE  N1.number <= Len(S1) 
                                    UNION 
                                    SELECT 2                           AS s, 
                                           Substring(S2, N2.number, 1) AS c 
                                    FROM   dbo.Numbers N2 
                                    WHERE  N2.number <= Len(S2)) D1 
                            GROUP  BY c) D2 
                    ) Ca 
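
The Min(Cnt) = 2 test is easier to follow when you look at the intermediate counts. The sketch below runs only the inner part of the query for one assumed pair of values. It swaps dbo.Numbers for an ad hoc Nums CTE so that it runs without the numbers table; that CTE and the variable names are choices made for this illustration, not part of the answer.

```sql
-- Assumed sample values: 'b' occurs only in the first string
DECLARE @S1 varchar(100) = 'abcddd',
        @S2 varchar(100) = 'cda';

WITH Nums AS
(
  -- Ad hoc tally of 1..100 standing in for dbo.Numbers
  SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
  FROM sys.all_columns
),
D1 AS
(
  -- UNION (not UNION ALL) removes duplicates, leaving at most one row
  -- per (source string, character) pair
  SELECT 1 AS s, SUBSTRING(@S1, number, 1) AS c
  FROM Nums
  WHERE number <= LEN(@S1)
  UNION
  SELECT 2 AS s, SUBSTRING(@S2, number, 1) AS c
  FROM Nums
  WHERE number <= LEN(@S2)
)
-- Cnt = number of source strings (1 or 2) that contain the character
SELECT c, COUNT(*) AS Cnt
FROM D1
GROUP BY c
ORDER BY c;

-- c  Cnt
-- a  2
-- b  1
-- c  2
-- d  2
```

Because 'b' appears in only one string, its count is 1, so Min(Cnt) is 1 and the flag is 0. When every character has a count of 2, both strings are built from the same set of symbols.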

Answer 2 (score: 1)

You can use LIKE '%your-search-string%' to find strings that contain a given substring.

SELECT * FROM TableName
WHERE Name LIKE '%searchText%'

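You can use a stored procedure to check, character by character, whether every character of one string occurs in the other.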

CREATE PROCEDURE IsStringMatching
(
    @originalString    NVARCHAR(32),
    @stringToBeChecked NVARCHAR(32),
    @IsMatching        BIT OUTPUT
)
AS
BEGIN
    DECLARE @inputStringCount INT = LEN(@originalString);
    DECLARE @loopCount INT = 0, @temp INT;
    DECLARE @char NCHAR(1);  -- matches the NVARCHAR parameters

    SET @IsMatching = 1;

    -- Walk @originalString one character at a time
    WHILE @loopCount < @inputStringCount
    BEGIN
        SET @char = SUBSTRING(@originalString, @loopCount + 1, 1);
        SET @temp = CHARINDEX(@char, @stringToBeChecked, 1);

        -- Stop at the first character missing from @stringToBeChecked
        IF (@temp = 0)
        BEGIN
            SET @IsMatching = 0;
            BREAK;
        END

        SET @loopCount = @loopCount + 1;
    END;
END

You can verify it like this:

DECLARE @IsMatching BIT;
EXECUTE IsStringMatching 'aaabbcd', 'ABC', @IsMatching OUTPUT;
SELECT @IsMatching;
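
The procedure only checks one direction: it reports whether every character of @originalString occurs in @stringToBeChecked. To answer the original question you would probably call it in both directions. The following is a minimal sketch that assumes the IsStringMatching procedure above has already been created; the variable names and sample values are made up for the illustration.

```sql
-- Assumed sample values: 'b' occurs only in @a
DECLARE @a NVARCHAR(32) = N'abcddd',
        @b NVARCHAR(32) = N'cda';
DECLARE @aInB BIT, @bInA BIT;

-- Is every character of @a present in @b?
EXECUTE IsStringMatching @a, @b, @aInB OUTPUT;

-- Is every character of @b present in @a?
EXECUTE IsStringMatching @b, @a, @bInA OUTPUT;

-- The strings share the same symbols only if both checks pass
SELECT AInB        = @aInB,
       BInA        = @bInA,
       SameSymbols = CASE WHEN @aInB = 1 AND @bInA = 1 THEN 1 ELSE 0 END;

-- AInB  BInA  SameSymbols
-- 0     1     0
```

Since this loops over the characters one call at a time, it is likely to be much slower than the set-based answers when applied to many rows.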