Why doesn't this regular expression match?

Time: 2018-12-17 07:47:22

Tags: regex perl

my $genlog_line_1= qr{
   \A
   (?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))? # Timestamp
   \s+
   (?:\s*(\d+))                     # Thread ID
   \s
   (\w+)                            # Command
   \s+
   (.*)                             # Argument
   \Z
}xs;

my $line = "2018-12-14T17:32:52.236100+08:00        477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv";

my ($ts, $thread_id, $cmd, $arg) = $line =~ m/$genlog_line_1/;

print $ts, $thread_id, $cmd, $arg;

Why doesn't the regular expression match? This is what I expect:

Timestamp 2018-12-14T17:32:52.236100
thread_id 477637459 
cmd Query 
arg  SELECT dv.mandatory,dv.optional FROM dbversion dv

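2 answers:

Answer 0 (score: 4)

Your input contains +08:00, but the -? in (?:Z|-?\d\d:\d\d)? only allows for a negative or unsigned offset.

So, on the first line of the regex, replace -? with [-+]? to match an optional - or +. Also, since the +08:00 part should not be part of Group 1, I suggest using a branch reset group, (?|...|...), so that the alternatives inside it capture into the same group, Group 1: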

(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)?
 ^^^                         ^ ^                                         ^     ^^^^         
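
To see both changes in isolation, here is a minimal sketch that applies only the timestamp part of each pattern to the timestamp from the question. The variable names ($stamp, $orig_ts, $fixed_ts) are made up for illustration, and the \A...\z anchors are added here only to force the whole string to be consumed.

```perl
use strict;
use warnings;

my $stamp = '2018-12-14T17:32:52.236100+08:00';

# ISO branch as written in the question: the offset may be "Z", "-hh:mm" or "hh:mm".
my $orig_ts = qr{(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?)};

# Timestamp part from this answer: branch reset group, and [-+]? for the sign.
my $fixed_ts = qr{(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)};

# The original cannot consume "+08:00", so an anchored match fails.
my $orig_ok = $stamp =~ /\A$orig_ts\z/ ? 1 : 0;

# The fixed version consumes the offset but keeps it out of Group 1.
my ($fixed_group1) = $stamp =~ /\A$fixed_ts\z/;

print "original matches whole stamp: $orig_ok\n";
print "fixed Group 1: $fixed_group1\n";
```

Branch reset groups need Perl 5.10 or later.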

The fixed pattern:

my $genlog_line_1= qr{
   \A
   (?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)? # Timestamp
   \s+
   (?:\s*(\d+))                     # Thread ID
   \s
   (\w+)                            # Command
   \s+
   (.*)                             # Argument
   \Z
}xs;

See the regex demo.
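
As a quick check, the fixed pattern can be run against the line from the question. This is only a sketch: the second sample line, in the older "YYMMDD hh:mm:ss" format, is invented here to exercise the first branch, and the names @lines and @parsed are illustrative.

```perl
use strict;
use warnings;

my $genlog_line_1 = qr{
   \A
   (?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)? # Timestamp
   \s+
   (?:\s*(\d+))                     # Thread ID
   \s
   (\w+)                            # Command
   \s+
   (.*)                             # Argument
   \Z
}xs;

my @lines = (
    # Line from the question (ISO timestamp with a positive offset).
    '2018-12-14T17:32:52.236100+08:00        477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv',
    # Hypothetical line in the older timestamp format.
    '181214 17:32:52     12 Connect root@localhost on test',
);

my @parsed;
for my $line (@lines) {
    my @fields = $line =~ $genlog_line_1;
    die "no match: $line\n" unless @fields;
    push @parsed, \@fields;
    # Both branches put the timestamp into Group 1 thanks to the branch reset.
    printf "ts=%s thread_id=%s cmd=%s arg=%s\n", @fields;
}
```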

Note that if the timestamp is always present in the input, the ? after the branch reset group may not be necessary.
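
The effect of that trailing ? can be sketched by building the pattern with and without it and trying a line that has no timestamp. The sample line and the names $ts_group, $rest, $optional_ts and $required_ts are assumptions made for this illustration; the pieces are the same as in the fixed pattern, only written without /x.

```perl
use strict;
use warnings;

# Timestamp group and the remainder of the fixed pattern, split for reuse.
my $ts_group = qr{(?|(\d{6}\s+\d{1,2}:\d\d:\d\d)|(\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+)(?:Z|[-+]?\d\d:\d\d)?)};
my $rest     = qr{\s+(?:\s*(\d+))\s(\w+)\s+(.*)\Z}s;

my $optional_ts = qr{\A$ts_group?$rest};   # with the trailing ?
my $required_ts = qr{\A$ts_group$rest};    # without it

# Hypothetical line that carries no timestamp.
my $line = '        477637459 Query SELECT 1';

my @opt = $line =~ $optional_ts;   # matches; the timestamp capture is undef
my @req = $line =~ $required_ts;   # does not match; empty list

printf "optional: ts=%s thread_id=%s cmd=%s arg=%s\n",
    $opt[0] // '(none)', @opt[1 .. 3];
print "required: ", (@req ? "match" : "no match"), "\n";
```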

Answer 1 (score: 0)

The main problem with your regex is that it does not account for the +08:00 present in $line.

Change it to:

\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))?(?:\+\d\d:\d\d)?\s+(?:\s*(\d+))\s+(\w+)\s+(.*)\Z

Demo:

https://regex101.com/r/fgRCv1/3
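
For comparison, here is a small sketch that runs this second pattern in Perl. The line with a -08:00 offset is a made-up variant of the question's input; it is included because the original -?\d\d:\d\d part still sits inside the capture group, so a negative offset ends up in the captured timestamp while a positive one does not.

```perl
use strict;
use warnings;

# Pattern from this answer, unchanged.
my $re = qr{\A(?:(\d{6}\s+\d{1,2}:\d\d:\d\d|\d{4}-\d{1,2}-\d{1,2}T\d\d:\d\d:\d\d\.\d+(?:Z|-?\d\d:\d\d)?))?(?:\+\d\d:\d\d)?\s+(?:\s*(\d+))\s+(\w+)\s+(.*)\Z};

# Line from the question, then a hypothetical variant with a negative offset.
my $plus  = '2018-12-14T17:32:52.236100+08:00        477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv';
my $minus = '2018-12-14T17:32:52.236100-08:00        477637459 Query SELECT dv.mandatory,dv.optional FROM dbversion dv';

my @plus_fields  = $plus  =~ $re;
my @minus_fields = $minus =~ $re;

print "with +08:00: ts=$plus_fields[0]\n";    # offset is outside the capture
print "with -08:00: ts=$minus_fields[0]\n";   # offset is inside the capture
```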