Multiline regex matches either too much or too little

Asked: 2013-10-03 11:47:23

Tags: c# regex multiline

I am trying to write a regular expression to "parse" a kind of log file. The structure looks like this:

2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created.
2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim] 
---------------- [ Ping received ] -------------
CurrentPrincipel has Claims:
Claim Type: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name with Value: GROUP\User
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarysid with Value: S-1-5-21-36134387-561137642-176895030-23737
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarygroupsid with Value: S-1-5-21-36134387-561137642-174895030-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-1-0
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-32-545
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-2
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-11
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-15
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-326895030-16415
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-1732895030-31127
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176235030-12815
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-12145
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176895430-31228
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-16100
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-64-10

2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created.
2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.

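I would like my regex to produce 4 matches for these log entries (one match per entry). I tried the following regex, but it does not capture the lines that have no leading timestamp: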

(^\d{4}-\d{2}-\d{2}(.|^|$)*(?=>^\d{4})*)

The regex is invoked like this:

string input = File.ReadAllText(@"log.txt");
MatchCollection matches = Regex.Matches(input, @"(^\d{4}-\d{2}-\d{2}(.|^|$)*(?=>^\d{4})*)", RegexOptions.Multiline);
foreach (Match match in matches)
{
    Console.WriteLine("{0}", match.Value);
    Console.WriteLine("------");
}

The actual output is:

2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created. ------
2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim]  ------
2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created. ------
2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.
------

The expected output is:

2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created.
------
2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim] 
---------------- [ Ping received ] -------------
CurrentPrincipel has Claims:
Claim Type: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name with Value: GROUP\User
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarysid with Value: S-1-5-21-36134387-561137642-176895030-23737
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/primarygroupsid with Value: S-1-5-21-36134387-561137642-174895030-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-513
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-1-0
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-32-545
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-2
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-11
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-15
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-326895030-16415
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-1732895030-31127
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176235030-12815
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-12145
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176895430-31228
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-21-36134387-561137642-176892330-16100
Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-5-64-10
------
2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created.
------
2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.
------

What am I doing wrong? Any help is appreciated!

2 Answers:

Answer 0 (score: 1)

If I understand your question correctly, this should help.

(?<=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{4} )(.*)

Example:

 2013-09-05 00:01:14.6038 VSWEB04 Info [n/a: UPN Claim] 

Match:

VSWEB04 Info [n/a: UPN Claim] 

Answer 1 (score: 1)

You can try matching everything between two dates (or up to the end of the file), like this:

^\d{4}-\d{2}-\d{2}.*?(?=^\d{4}-\d{2}-\d{2}|(?!.))

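Use it with RegexOptions.Multiline | RegexOptions.Singleline. Multiline makes ^ match at the start of every line, and Singleline lets . match newline characters, so .*? can span the continuation lines of an entry.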
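
As a rough sketch of how that pattern could be applied, the snippet below runs it over a shortened sample of the log. The class name LogEntryMatcher, the method MatchEntries, and the Sample string are made up for illustration and are not part of the question; the TrimEnd call is an added assumption to drop the trailing line breaks that each match would otherwise carry.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LogEntryMatcher
{
    // Hypothetical, shortened stand-in for the contents of log.txt.
    public const string Sample =
        "2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created.\n" +
        "2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim] \n" +
        "---------------- [ Ping received ] -------------\n" +
        "CurrentPrincipel has Claims:\n" +
        "Claim Type: http://schemas.microsoft.com/ws/2008/06/identity/claims/groupsid with Value: S-1-1-0\n" +
        "\n" +
        "2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created.\n" +
        "2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.\n";

    // Lazily match from a date at the start of a line up to (but not including)
    // the next line that starts with a date, or the end of the input.
    static readonly Regex Entry = new Regex(
        @"^\d{4}-\d{2}-\d{2}.*?(?=^\d{4}-\d{2}-\d{2}|(?!.))",
        RegexOptions.Multiline | RegexOptions.Singleline);

    public static List<string> MatchEntries(string input)
    {
        var entries = new List<string>();
        foreach (Match match in Entry.Matches(input))
        {
            // Each match ends with the line break(s) before the next entry; trim them.
            entries.Add(match.Value.TrimEnd());
        }
        return entries;
    }

    static void Main()
    {
        foreach (string entry in MatchEntries(Sample))
        {
            Console.WriteLine(entry);
            Console.WriteLine("------");
        }
    }
}
```

On this sample the loop should yield four entries, with the second one containing the "Ping received" block and its claim line.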

Another option is to split the string with a regex:

var regex = new Regex(@"^\d{4}-\d{2}-\d{2}", RegexOptions.Multiline);
string[] result = regex.Split(input);
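
One caveat with splitting on the date pattern itself: Regex.Split drops the matched text, so each piece loses its leading date, and the array starts with an empty string because the input begins with a match. A possible variation, sketched here as an assumption rather than as part of the original answer, is to split on a zero-width lookahead so the date stays in each piece. The names LogSplitter, SplitEntries, and Sample are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LogSplitter
{
    // Hypothetical, shortened stand-in for the contents of log.txt.
    public const string Sample =
        "2013-09-05 00:01:14.5726 WEB Info [n/a: UPN Claim] New instance of service created.\n" +
        "2013-09-05 00:01:14.6038 WEB Info [n/a: UPN Claim] \n" +
        "---------------- [ Ping received ] -------------\n" +
        "CurrentPrincipel has Claims:\n" +
        "\n" +
        "2013-09-05 00:01:15.0406 WEB Info [n/a: UPN Claim] New instance of service created.\n" +
        "2013-09-05 00:01:15.0718 WEB Info [n/a: UPN Claim] GetLangugesChanges invoked for rowVersion 1 and 8 clientChanges.\n";

    // The lookahead consumes no characters, so the split happens just before
    // each date and the date remains part of the following piece.
    static readonly Regex EntryStart =
        new Regex(@"^(?=\d{4}-\d{2}-\d{2})", RegexOptions.Multiline);

    public static string[] SplitEntries(string input)
    {
        return EntryStart.Split(input)
            .Where(piece => piece.Length > 0)   // drop the empty piece before the first entry
            .Select(piece => piece.TrimEnd())   // drop trailing line breaks
            .ToArray();
    }

    static void Main()
    {
        foreach (string entry in SplitEntries(Sample))
        {
            Console.WriteLine(entry);
            Console.WriteLine("------");
        }
    }
}
```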