Question

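I have a log table with a varchar column that holds exception messages and stack traces.

I want to query this log table to get counts of similar exceptions.

How can I aggregate entries that are similar but not exact matches?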

MyApp.MyCustomException: UserId 1 not found
MyApp.MyCustomException: UserId 2 not found
MyApp.MyCustomException: UserId 3 not found
MyApp.MyCustomException: UserId 1 login failed
MyApp.MyCustomException: UserId 2 login failed
MyApp.MyCustomException: UserId 3 login failed

The 6 rows above should be counted as

"MyApp.MyCustomException: UserId not found" Count:3
"MyApp.MyCustomException: UserId login failed" Count:3

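The LEFT function works for the simple example above, but not for exceptions such as NullReferenceException, where the error can occur in several different places in the code.

Edit: updated the example to represent the problem more clearly.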

Answer 1

You could try using

patindex('%pattern%',column)

The whole select could look something like this:

SELECT * FROM tbl
WHERE patindex('%MyApp.MyCustomException: % not found%',err)>0

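Make sure you do not forget the % at the start and at the end of the pattern. The function returns the position at which the pattern was found in the column, or 0 if it was not found.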

See an example here: http://sqlfiddle.com/#!3/1a70e/1
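To make the return value concrete, here is a minimal sketch that calls patindex on string literals instead of a table column. The literals come from the question; the column aliases are only illustrative names.

```sql
-- PATINDEX returns the 1-based start position of the first match, or 0 if there is none.
SELECT
    PATINDEX('%not found%',
             'MyApp.MyCustomException: UserId 1 not found')    AS pos_match,     -- greater than 0: pattern found
    PATINDEX('%not found%',
             'MyApp.MyCustomException: UserId 1 login failed') AS pos_no_match,  -- 0: pattern absent
    -- Without the leading %, the pattern has to match at the very start of the string.
    PATINDEX('not found%',
             'MyApp.MyCustomException: UserId 1 not found')    AS pos_anchored;  -- 0: string does not start with it
```

The third column shows why the surrounding % signs matter: the text is present, but the pattern is anchored to the start of the string, so the result is still 0.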

Edit

This can be done with a CTE, like this:

WITH msgs AS (
    SELECT err,
           CASE WHEN patindex('%MyApp.MyCustomException: % not found%',err)>0 THEN 1
                WHEN patindex('%Wrong password for %, please try again%',err)>0 THEN 2
                ELSE 0
           END msgno
    FROM tbl
)
SELECT msgno, MIN(err) msg1, COUNT(*) cnt
FROM msgs
GROUP BY msgno

See here: http://sqlfiddle.com/#!3/9565c/2

Edit 2:

Or, in a more general way:

WITH pats as (
    SELECT 'UserId' pat           -- define various patterns for
    UNION ALL SELECT 'IP'         -- words to be removed after ...
), pos1 AS (                      -- find position of pattern
    SELECT pat,err msg,patindex('%'+pat+'%',err)+len(pat) p1
    FROM tbl,pats
), pos2 AS (                      -- remove word after pattern
    SELECT LEFT(msg,p1) +'<'+pat+'> '
          +SUBSTRING(msg,charindex(' ',SUBSTRING(msg,p1+1,256))+p1,256) msg
    FROM pos1 WHERE p1>len(pat)
), nonames AS (                   -- find non-specific messages
    SELECT err FROM tbl WHERE NOT EXISTS
    (SELECT 1 FROM pos1 WHERE msg=err AND p1>len(pat))
)
SELECT msg, count(*) cnt FROM     -- combine all, group and count
( SELECT msg FROM pos2 UNION ALL SELECT err FROM nonames ) m
GROUP BY msg

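In every message, this removes the first word (= a sequence of characters without spaces) that appears after one of several predefined patterns (pat) and puts a <pat> placeholder in its place. Messages of a given type then look exactly alike, so they can be grouped.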

You can try it here (my final solution): http://sqlfiddle.com/#!3/a2fb9/4
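The word-removal step can be traced on a single message. This is only a sketch: the variables @msg, @pat, @p1 and @word_end are hypothetical names introduced here, and the expressions mirror the ones used in the pos1 and pos2 CTEs.

```sql
DECLARE @msg varchar(256) = 'MyApp.MyCustomException: UserId 1 not found';
DECLARE @pat varchar(20)  = 'UserId';

-- p1: position of the character right after the pattern (the space after 'UserId').
DECLARE @p1 int = PATINDEX('%' + @pat + '%', @msg) + LEN(@pat);

-- word_end: absolute position of the first space after the variable word.
DECLARE @word_end int = CHARINDEX(' ', SUBSTRING(@msg, @p1 + 1, 256)) + @p1;

SELECT
    @p1                                   AS p1,
    @word_end                             AS word_end,
    LEFT(@msg, @p1)                       AS kept_prefix,  -- text up to and including the space after the pattern
    SUBSTRING(@msg, @word_end, 256)       AS kept_suffix,  -- text starting at the space after the removed word
    LEFT(@msg, @p1) + '<' + @pat + '> '
        + SUBSTRING(@msg, @word_end, 256) AS normalized;   -- same expression as in pos2
```

One limitation worth knowing: if the variable word is the last word of the message, charindex finds no following space and returns 0, so the word stays in the output and such messages will not collapse into one group.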

Answer 2

I would just use like with case:

select canonical_trace, count(*)
from (select l.*,
             (case when trace like 'MyApp.MyCustomException: UserId % not found'
                   then 'MyApp.MyCustomException: UserId not found'
                   when trace like 'MyApp.MyCustomException: UserId % login failed'
                   then 'MyApp.MyCustomException: UserId login failed'
                   else trace
              end) as canonical_trace
      from log l
     ) l
group by canonical_trace;

Answer 3

This may look ugly, but it should be relatively effective. I use replace to strip out the digits and the extra spaces before grouping. Take a look:

WITH yourTable
AS
(
    SELECT *
    FROM
    (
        VALUES  ('MyApp.MyCustomException: UserId 1 not found'),
                ('MyApp.MyCustomException: UserId 2 not found'),
                ('MyApp.MyCustomException: UserId 3 not found'),
                ('MyApp.MyCustomException: UserId 1 login failed'),
                ('MyApp.MyCustomException: UserId 2 login failed'),
                ('MyApp.MyCustomException: UserId 3 login failed')
    ) A(col)
)

SELECT  generic_col,
        COUNT(*) AS cnt,
        'Count: ' + CAST(COUNT(*) AS VARCHAR(25)) AS formatted_cnt
FROM yourTable
CROSS APPLY (SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(col,'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),'0',''),'  ',' ')) AS CA(generic_col)
GROUP BY generic_col
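As a variation on the same idea, the ten nested replace calls for the digits could be shortened with TRANSLATE. This is only a sketch under two assumptions that are not in the original answer: the server is SQL Server 2017 or later (where TRANSLATE is available), and the character '~' never occurs in the logged text.

```sql
-- Assumes SQL Server 2017+ (TRANSLATE) and that '~' never appears in the messages.
WITH yourTable AS
(
    SELECT col
    FROM (VALUES ('MyApp.MyCustomException: UserId 1 not found'),
                 ('MyApp.MyCustomException: UserId 22 not found'),
                 ('MyApp.MyCustomException: UserId 3 login failed')
         ) A(col)
)
SELECT  generic_col,
        COUNT(*) AS cnt
FROM yourTable
CROSS APPLY (
    SELECT REPLACE(                                             -- 3. collapse the doubled space left behind
               REPLACE(                                         -- 2. drop every marker character
                   TRANSLATE(col, '0123456789', '~~~~~~~~~~'),  -- 1. map each digit to the marker '~'
                   '~', ''),
               '  ', ' ')
) AS CA(generic_col)
GROUP BY generic_col;
```

Like the nested replace version, this strips every digit in the message, not only the ones in the user id, so two messages that differ only in some other number end up in the same group.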

Is there a way to group strings using T-SQL?
