Optionally matching a literal string

Date: 2011-01-20 19:08:01

Tags: regex

I am using the following regular expression to match and capture the string "weather in foo bar":

weather in ([a-z]+|[0-9]{5})\s?([a-zA-Z]+)?

It matches and captures as intended: bar is optional, and foo can be either a city or a zip code.

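However, I would like to let users write "weather in foo for bar" as well, since I have accidentally typed it that way a few times myself. Is there a way to optionally match a literal string like "for" without resorting to \s?f?o?r?\s?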

2 answers:

Answer 0: (score: 6)

Put it in an optional non-capturing group: (?:\sfor\s)?
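As a rough sketch of how this might slot into the pattern from the question: the example below uses Python's re module, which is an assumption since the question names no language, and parse is a hypothetical helper name. Where the group is spliced in (between the two capture groups) is also an assumption; the answer gives only the group itself. Because the group is non-capturing, the city or zip stays in group 1 and the trailing word stays in group 2.

```python
import re

# The question's pattern with the optional non-capturing group placed
# between the two capture groups (placement is an assumption).
PATTERN = re.compile(r"weather in ([a-z]+|[0-9]{5})(?:\sfor\s)?\s?([a-zA-Z]+)?")

def parse(text):
    """Return (place, extra) for a matching string, or None otherwise."""
    m = PATTERN.search(text)
    return m.groups() if m else None

print(parse("weather in foo bar"))          # ('foo', 'bar')
print(parse("weather in foo for bar"))      # ('foo', 'bar')
print(parse("weather in 12345 for today"))  # ('12345', 'today')
```

One caveat with this sketch: since \sfor\s requires whitespace after "for", an input ending in "for" (such as "weather in foo for") skips the optional group and captures "for" as the second group instead.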

Answer 1: (score: 1)

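Keeping three intact capture groups takes a bit more work. This may be somewhat advanced, but it is a good example of where lookaround assertions are useful.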

/weather\s+in\s+([[:alpha:]]+|\d{5})\s*((?<=\s)for(?=\s|$)|)\s*((?<=\s)[[:alpha:]]+|)/

Test case in Perl:

use strict;
use warnings;

my @samples = (
 'this is  the weather in 12345 forever',
 'this is  the weather in 32156 for ',
 'this is  the weather in 32156 for today',
 'this is  the weather in abcdefghijk for',
 'this is  the weather in abcdefghijk ',
 'this is  the weather in abcdefghijk end',
);

my $regex = qr/
  weather \s+ in \s+    # literal words separated by whitespace
   (                    # Group 1
       [[:alpha:]]+        # city (letters only, no spaces)
     | \d{5}               # OR zip code (5 digits)
   )                    # end group 1
   \s*                  # optional whitespace
   (                    # Group 2
       (?<=\s)             # must be preceded by whitespace
       for                 # literal 'for'
       (?=\s|$)            # must be followed by whitespace or end of string
     |                     # OR match nothing
   )                    # end group 2
   \s*                  # optional whitespace
   (                    # Group 3
       (?<=\s)             # must be preceded by whitespace
       [[:alpha:]]+        # one or more letters
     |                     # OR match nothing
   )                    # end group 3
 /x;

for (@samples) {
    # $regex was already compiled with /x, so no extra modifier is needed here
    if (/$regex/) {
        print "'$1', '$2', '$3'\n";
    }
}

Output:

'12345', '', 'forever'
'32156', 'for', ''
'32156', 'for', 'today'
'abcdefghijk', 'for', ''
'abcdefghijk', '', ''
'abcdefghijk', '', 'end'
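
For readers not working in Perl, the same idea can be sketched with Python's re module. This is a port under two stated assumptions, not part of the original answer: [A-Za-z] stands in for [[:alpha:]] (Python's re has no POSIX character classes), and re.VERBOSE plays the role of Perl's /x. The name extract is a hypothetical helper.

```python
import re

# Port of the Perl pattern above; [A-Za-z] replaces [[:alpha:]] (assumption).
REGEX = re.compile(r"""
    weather \s+ in \s+
    ( [A-Za-z]+ | \d{5} )        # group 1: city or 5-digit zip code
    \s*
    ( (?<=\s) for (?=\s|$) | )   # group 2: standalone 'for', or empty
    \s*
    ( (?<=\s) [A-Za-z]+ | )      # group 3: trailing word, or empty
""", re.VERBOSE)

def extract(text):
    """Return the three captured groups, or None if there is no match."""
    m = REGEX.search(text)
    return m.groups() if m else None

samples = [
    "this is  the weather in 12345 forever",
    "this is  the weather in 32156 for ",
    "this is  the weather in 32156 for today",
    "this is  the weather in abcdefghijk for",
    "this is  the weather in abcdefghijk ",
    "this is  the weather in abcdefghijk end",
]

for s in samples:
    print(extract(s))
```

Because groups 2 and 3 each have an empty alternative, they always participate in the match and yield an empty string rather than None when nothing is found. The lookahead after "for" is what keeps "forever" from being split into "for" plus "ever".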