Is there a way to group similar strings using T-SQL?

Time: 2015-06-30 16:42:28

Tags: sql sql-server-2008 tsql

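I have a log table with varchar columns that hold exception and stack trace data.

I want to query this log table to get counts of similar exceptions.

How can I aggregate rows that are similar but not exact matches?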

MyApp.MyCustomException: UserId 1 not found
MyApp.MyCustomException: UserId 2 not found
MyApp.MyCustomException: UserId 3 not found
MyApp.MyCustomException: UserId 1 login failed
MyApp.MyCustomException: UserId 2 login failed
MyApp.MyCustomException: UserId 3 login failed

The 6 rows above should be counted as

"MyApp.MyCustomException: UserId not found" Count:3
"MyApp.MyCustomException: UserId login failed" Count:3

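The LEFT function would work for the simple example above, but not for exceptions such as NullReferenceException, where the error can occur in several different places in the code.

Edit: updated the example to represent the problem more clearly.

3 answers:

Answer 0 (score: 3)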

You could try using

patindex('%pattern%',column)

The whole select could look something like

SELECT * FROM tbl
WHERE patindex('%MyApp.MyCustomException: % not found%',err)>0

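Make sure you don't forget the % before and after the pattern. The function returns the position at which the pattern was found in the column, or 0 if it was not found.

See an example here: http://sqlfiddle.com/#!3/1a70e/1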
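To illustrate the point about the % wildcards, here is a minimal sketch that uses a literal string in place of the err column. The variable @err and the column aliases are made up for this illustration and are not part of the linked fiddle.

```sql
-- Literal message used only for illustration; in the query above this would be the err column.
DECLARE @err varchar(256) = 'MyApp.MyCustomException: UserId 1 not found';

SELECT
    PATINDEX('%not found%', @err)    AS with_wildcards,    -- greater than 0: pattern found
    PATINDEX('not found', @err)      AS without_wildcards, -- 0: without %, the pattern must match the whole string
    PATINDEX('%login failed%', @err) AS no_match;          -- 0: pattern not present
```

Without the surrounding % the pattern has to match the entire value, which is why a substring search only works with both wildcards in place.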

Edit

This can also be done with a CTE:

WITH msgs AS(
 SELECT err,CASE 
   WHEN patindex('%MyApp.MyCustomException: % not found%',err)>0 THEN 1
   WHEN patindex('%Wrong password for %, please try again%',err)>0 THEN 2
   ELSE 0 END msgno FROM tbl )
SELECT msgno, MIN(err) msg1, COUNT(*) cnt FROM msgs GROUP BY msgno

See here: http://sqlfiddle.com/#!3/9565c/2

Edit 2:

Or, in a more general way:

WITH pats as (SELECT 'UserId' pat -- define various patterns for
    UNION ALL SELECT 'IP'         -- words to be removed after ...  
), pos1 AS (                      -- find position of pattern
 SELECT pat,err msg,patindex('%'+pat+'%',err)+len(pat) p1  FROM tbl,pats 
), pos2 AS (                      -- remove word after pattern
 SELECT LEFT(msg,p1)
   +'<'+pat+'> '
   +SUBSTRING(msg,charindex(' ',SUBSTRING(msg,p1+1,256))+p1,256) msg
 FROM pos1 WHERE p1>len(pat) 
), nonames AS (                  -- find non-specific messages
 SELECT err FROM tbl WHERE NOT EXISTS 
  (SELECT 1 FROM pos1 WHERE msg=err AND p1>len(pat))
)
SELECT msg, count(*) cnt FROM    -- combine all, group and count
( SELECT msg FROM pos2 UNION ALL SELECT err FROM nonames ) m
GROUP BY msg

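In every message, this removes the first word (= a sequence of characters without spaces) that appears after one of several predefined patterns (pat). Messages of the same type then look exactly alike, so they can be grouped.

You can try it here (my final solution): http://sqlfiddle.com/#!3/a2fb9/4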
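To make the word-removal step easier to follow, the sketch below applies the same expressions from pos1 and pos2 to two literal messages. The variables @pat, @a and @b and the aliases are assumptions introduced for this illustration; the real query reads the messages from tbl and the patterns from pats.

```sql
-- Hypothetical sample values; the query above takes these from tbl and pats.
DECLARE @pat varchar(32)  = 'UserId';
DECLARE @a   varchar(256) = 'MyApp.MyCustomException: UserId 1 not found';
DECLARE @b   varchar(256) = 'MyApp.MyCustomException: UserId 27 not found';

-- Same formula as pos1: position just past the end of the pattern.
DECLARE @pa int = PATINDEX('%' + @pat + '%', @a) + LEN(@pat);
DECLARE @pb int = PATINDEX('%' + @pat + '%', @b) + LEN(@pat);

-- Same formula as pos2: keep the text up to the pattern, insert a placeholder,
-- then resume at the first space after the word that followed the pattern.
SELECT
    LEFT(@a, @pa) + '<' + @pat + '> '
      + SUBSTRING(@a, CHARINDEX(' ', SUBSTRING(@a, @pa + 1, 256)) + @pa, 256) AS normalized_a,
    LEFT(@b, @pb) + '<' + @pat + '> '
      + SUBSTRING(@b, CHARINDEX(' ', SUBSTRING(@b, @pb + 1, 256)) + @pb, 256) AS normalized_b;
```

Both columns should come out as the same string, with the user id replaced by the <UserId> placeholder, which is what allows GROUP BY to put the two messages in one group. Note that a word at the very end of a message has no space after it, so this formula leaves it in place.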

Answer 1 (score: 3)

I would just use like together with case:

select canonical_tracer, count(*)
from (select l.*,
             -- map each known message shape to one canonical text
             (case when trace like 'MyApp.MyCustomException: UserId % not found'
                   then 'MyApp.MyCustomException: UserId not found'
                   when trace like 'MyApp.MyCustomException: UserId % login failed'
                   then 'MyApp.MyCustomException: UserId login failed'
                   else trace
              end) as canonical_tracer
      from log l
     ) l
-- group on the canonical text, not on the raw trace
group by canonical_tracer;

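Answer 2 (score: 0)

This may look ugly, but it should be relatively efficient. I use replace to get rid of the digits and the extra spaces before grouping. Take a look: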

WITH yourTable
AS
(
    SELECT *
    FROM
    (
        VALUES  ('MyApp.MyCustomException: UserId 1 not found'),
                ('MyApp.MyCustomException: UserId 2 not found'),
                ('MyApp.MyCustomException: UserId 3 not found'),
                ('MyApp.MyCustomException: UserId 1 login failed'),
                ('MyApp.MyCustomException: UserId 2 login failed'),
                ('MyApp.MyCustomException: UserId 3 login failed')
    ) A(col)
)

SELECT  generic_col,
        COUNT(*) AS cnt,
        'Count: ' + CAST(COUNT(*) AS VARCHAR(25)) AS formatted_cnt
FROM yourTable
CROSS APPLY (SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(col,'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),'0',''),'  ',' ')) AS CA(generic_col)
GROUP BY generic_col