Lookahead problem with greedy quantifiers in a regular expression

Time: 2014-11-04 21:11:17

Tags: regex greedy lookahead quantifiers

I need to support the following format:

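3 digits, followed by an optional space, followed by up to three non-repeating characters from the character set ACERV (a space is only valid between two characters)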

Valid formats:

123
123 A
123 A v
123 CER

Invalid formats:

123A
123 AA
123 A  - when followed by a space
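For reference, here is a minimal sketch of the rules as they can be read from the examples, written as a plain JavaScript function with no regex. `isValidCode` is a hypothetical name. Some of the rules are assumptions inferred from the examples rather than stated outright:

* letters are case-insensitive (123 A v is valid),
* there are at most three letters,
* a single space must separate the digits from the letters (123A is invalid),
* at most one space may sit between two letters.

```javascript
// Hypothetical reference checker for the format in the question.
// Assumptions: case-insensitive letters, at most three of them, a required
// single space after the digits, at most one space between letters.
function isValidCode(str) {
  const digits = str.slice(0, 3);
  const isDigit = (c) => c >= '0' && c <= '9';
  if (digits.length !== 3 || ![...digits].every(isDigit)) return false;

  const rest = str.slice(3);
  if (rest === '') return true;          // "123" on its own
  if (rest[0] !== ' ') return false;     // "123A": the space is required here

  const seen = new Set();
  let prevWasSpace = true;               // the leading space was just consumed
  for (const ch of rest.slice(1)) {
    if (ch === ' ') {
      if (prevWasSpace) return false;    // no two spaces in a row
      prevWasSpace = true;
      continue;
    }
    const letter = ch.toUpperCase();
    if (!'ACERV'.includes(letter)) return false;
    if (seen.has(letter)) return false;  // no repeated letters
    seen.add(letter);
    prevWasSpace = false;
  }
  // Reject a trailing space ("123 A "), an empty block ("123 "), and >3 letters.
  return !prevWasSpace && seen.size <= 3;
}

['123', '123 A', '123 A v', '123 CER', '123A', '123 AA', '123 A ']
  .forEach((s) => console.log(JSON.stringify(s), isValidCode(s)));
```

A function like this is also handy as an oracle when testing any of the regexes below against extra inputs.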

What I have so far (I may be overcomplicating it with lookaheads that aren't really needed):

^([0-9]{3})                                         # - first 3 digits
 (\s(?=[ACERV]))([ACERV])                           # - allow space only when followed by ACERV
 (?!\3)(?=[ACERV ]{0,1})([ACERV ]{0,1})             # - do not allow 1st char to repeat
 (?!\3)                                             # - do not allow 1st char to repeat
 (?!\4)                                             # - do not allow 2nd to repeat
 (?!\s)                                             # - do not allow trailing space
 (?=[ACERV]{0,1})([ACERV]{0,1})|[0-9]{3}$

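When the lookahead (?!\4) is added, the pattern no longer matches the valid format 123 A. Changing the quantifier on (?!\4) to (?!\4)* or (?!\4)? lets 123 A match, but then the first or second character is allowed to repeat.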
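A minimal JavaScript sketch of why (?!\4) breaks 123 A. JavaScript has no x (free-spacing) flag, so the pattern below is the question's regex with its comments removed. `askerPattern` is a hypothetical helper that builds the pattern with or without the (?!\4) assertion.

On 123 A, group 4, `([ACERV ]{0,1})`, matches zero characters, so \4 holds the empty string. An empty string matches at every position, so a negative lookahead for it can never succeed.

```javascript
// The question's pattern with the x-mode comments stripped. Group 3 is the
// first letter; group 4 is the optional second character (a letter, a
// space, or nothing at all). askerPattern is a hypothetical helper.
function askerPattern(includeNot4) {
  return new RegExp(
    '^([0-9]{3})' +
    '(\\s(?=[ACERV]))([ACERV])' +
    '(?!\\3)(?=[ACERV ]{0,1})([ACERV ]{0,1})' +
    '(?!\\3)' +
    (includeNot4 ? '(?!\\4)' : '') +
    '(?!\\s)' +
    '(?=[ACERV]{0,1})([ACERV]{0,1})|[0-9]{3}$'
  );
}

// Without (?!\4), "123 A" matches and group 4 captured the empty string...
const m = askerPattern(false).exec('123 A');
console.log(m !== null, JSON.stringify(m[4])); // true ""

// ...and a negative lookahead for "" always fails, so (?!\4) rejects it.
console.log(askerPattern(true).test('123 A')); // false
```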

3 answers:

Answer 0 (score: 1)

Not entirely sure about the requirements, but this works for your samples.

 # ^(?i)\d{3}(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3}(?<![ ]))?$

 ^                      # BOL
 (?i)                   # Case insensitive modifier
 \d{3}                  # 3 digits
 (?:                    # Cluster grp, character block (optional)
      [ ]                    # Space, required
      (?:                    # Cluster grp
           ( [ACERV] )            # (1), Capture single character [ACERV]
           [ ]?                   # [ ], optional
           (?!                    # Negative lookahead
                [ACERV ]*              # As many [ACERV] or [ ] needed
                \1                     # to find what is captured in group 1
                                       # Found it, the assertion fails
           )                      # End Negative lookahead
      ){1,3}                 # End Cluster grp, gets 1-3 [ACERV] characters
      (?<! [ ] )             # No dangling [ ] at end
 )?                     # End Cluster grp, character block (optional)
 $                      # EOL  
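To try this pattern in JavaScript, two adjustments are needed:

* A bare inline (?i) at the start is not valid JavaScript regex syntax, so the i flag is used instead.
* (?<![ ]) is a lookbehind, which assumes an engine with ES2018 lookbehind support. Current Node.js and major browsers have it.

A quick check against the question's samples:

```javascript
// Answer 0's first pattern, with the inline (?i) replaced by the i flag.
const withLookbehind =
  /^\d{3}(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3}(?<![ ]))?$/i;

// [input, expected] pairs taken from the question's valid/invalid lists.
const samples = [
  ['123', true], ['123 A', true], ['123 A v', true], ['123 CER', true],
  ['123A', false], ['123 AA', false], ['123 A ', false],
];

for (const [input, expected] of samples) {
  const actual = withLookbehind.test(input);
  console.log(JSON.stringify(input), actual, actual === expected ? 'ok' : 'MISMATCH');
}
```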

Update: adjusted to replace the lookbehind.

 # ^(?i)\d{3}(?!.*[ ]$)(?:[ ](?:([ACERV])[ ]?(?![ACERV ]*\1)){1,3})?$

 ^                      # BOL
 (?i)                   # Case insensitive modifier
 \d{3}                  # 3 digits
 (?! .* [ ] $ )         # No dangling [ ] at end
 (?:                    # Cluster grp, character block (optional)
      [ ]                    # Space, required
      (?:                    # Cluster grp
           ( [ACERV] )            # (1), Capture single character [ACERV]
           [ ]?                   # [ ], optional
           (?!                    # Negative lookahead
                [ACERV ]*              # As many [ACERV] or [ ] needed
                \1                     # to find what is captured in group 1
                                       # Found it, the assertion fails
           )                      # End Negative lookahead
      ){1,3}                 # End Cluster grp, gets 1-3 [ACERV] characters
 )?                     # End Cluster grp, character block (optional)
 $                      # EOL
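JavaScript has no x (free-spacing) flag. One way to keep the commented, one-token-per-line layout of the updated pattern is to assemble it from string fragments. The sketch below assumes the i flag stands in for (?i), and its fragment comments paraphrase the ones above. This version needs no lookbehind support.

Because the negative lookahead scans everything that follows the captured letter, non-adjacent repeats such as 123 A C A are rejected as well.

```javascript
// Answer 0's updated pattern, assembled piece by piece so each part keeps
// its comment. Backslashes are doubled because these are string literals.
const updated = new RegExp([
  '^',
  '\\d{3}',             // 3 digits
  '(?!.*[ ]$)',         // no dangling space at the end
  '(?:',                // optional character block
  '[ ]',                //   space, required before the letters
  '(?:',
  '([ACERV])',          //   (1) one letter from the set
  '[ ]?',               //   optional space
  '(?![ACERV ]*\\1)',   //   fail if that letter appears again later on
  '){1,3}',             //   1 to 3 letters
  ')?',
  '$',
].join(''), 'i');       // 'i' stands in for the inline (?i)

for (const input of ['123', '123 A', '123 A v', '123 CER', '123A', '123 AA', '123 A ', '123 A C A']) {
  console.log(JSON.stringify(input), updated.test(input));
}
```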

Answer 1 (score: 0)

How about the regex

^\d{3}(?:$|\s)(?:([ACERV])(?!\1)|\s(?!$|\1))*$

It will match the strings

123
123 A
123 A V
123 CER

See how the regex is evaluated at http://regex101.com/r/mW5qZ9/9

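  • ^ anchors the regex at the start of the string

  • \d{3} matches exactly 3 digits

  • (?:$|\s) matches either the end of the string, $, or a space, \s

  • (?:\s?([ACERV])(?!\1)){0,3} matches non-repeating characters from [ACERV]

    • (?: ) is a non-capturing group

    • \s? is an optional space

    • ([ACERV]) matches a character from the class

    • (?:([ACERV])(?!\1)|\s(?!$|\1)) asserts that the match is not followed by \1, the most recently captured character. This prevents a character from being repeated immediately, with or without a space in between.

      • (?!\1) asserts that the character matched by the class is not followed by the same character

      • \s(?!$|\1) asserts that a space cannot be followed by the end of the string or by a repeat of the character captured in \1

    • The {0,3} quantifier specifies a minimum of zero and a maximum of three occurrences

  • $ anchors the regex at the end of the string.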
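A hedged note for JavaScript users: the regex101 link runs the PCRE flavor by default, and this pattern behaves differently in JavaScript. Per the ECMAScript specification, capture groups inside a repeated group are reset to undefined at the start of each iteration. A backreference to an undefined group matches the empty string.

In the \s(?!$|\1) branch, group 1 has therefore just been cleared. (?!$|\1) always fails there, and a space between two letters is never accepted. In PCRE, \1 keeps the letter captured in the previous iteration, which is what the answer relies on.

A quick check follows. The pattern has no i flag, so the sample uses an uppercase V.

```javascript
// Answer 1's pattern, copied as-is (no i flag).
const answer1 = /^\d{3}(?:$|\s)(?:([ACERV])(?!\1)|\s(?!$|\1))*$/;

for (const input of ['123', '123 A', '123 A V', '123 CER', '123A', '123 AA', '123 A ']) {
  console.log(JSON.stringify(input), answer1.test(input));
}
// In JavaScript, "123 A V" prints false: when the loop reaches the space,
// group 1 has been reset, so \1 matches the empty string and (?!$|\1) fails.
```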

Answer 2 (score: 0)

One option is a simple regex to pull the string apart, followed by a second step that checks that the characters don't repeat.

// check all characters in a string are unique,
// by ensuring that each character is its own first appearance
function unique_characters(str) {
    return str.split('').every(function(chr, i, chrs) {
        return chrs.indexOf(chr) === i;
    });
}

// check that the code is valid: no character repeats (compared
// case-insensitively, to match the /i regex below), and a space may
// only appear in the middle of a 3-character code such as "A v"
function valid_code(str) {
    var spacepos = str.indexOf(' ');
    return unique_characters(str.toUpperCase()) &&
        (spacepos === -1 || (spacepos === 1 && str.length === 3));
}

// check basic format and pull out code portion
function check_string(str) {
    var matches = str.match(/^\d{3} ?([ACERV ]{0,3})$/i);
    // false when the basic format doesn't match, otherwise valid_code's result
    var valid = matches !== null && valid_code(matches[1]);
    return valid;
}

>> inputs = ['123', '123 A', '123 A v', '123 CER', '123A', '123 AA', '123 A ']
>> inputs.map(check_string)
[true, true, true, true, true, false, false]

The fifth test case (123A) shows as valid because, if the space really is optional, then 123A would seem to be just as valid as 123 A.

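One possible advantage of this approach is that if additional validation rules are introduced, they can be implemented more easily than inside one huge regex.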
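To illustrate that claim, here is a sketch that keeps the second step as a list of small named checks. Adding a rule then means adding one entry rather than changing a large regex.

Several names here are hypothetical: `FORMAT`, `rules`, and `checkCode`. The third rule is an assumed extra rule, not part of the original answer. It makes 123A invalid, in line with the question's list of invalid formats.

```javascript
// Step 1: a loose regex that only pulls the string apart. This variant of
// check_string's regex also captures the optional separator space.
const FORMAT = /^\d{3}( ?)([ACERV ]{0,3})$/i;

// Step 2: each rule is a small, named predicate over the pieces.
const rules = [
  {
    name: 'no repeated characters',
    test: ({ code }) => new Set(code.toUpperCase()).size === code.length,
  },
  {
    name: 'space only between two letters',
    test: ({ code }) => {
      const pos = code.indexOf(' ');
      return pos === -1 || (pos === 1 && code.length === 3);
    },
  },
  {
    // Assumed extra rule: letters must be preceded by a space,
    // and a bare "123" must not end in one.
    name: 'space separates digits from letters',
    test: ({ sep, code }) => (code === '' ? sep === '' : sep === ' '),
  },
];

// Returns the names of the failed rules, or ['format'] if step 1 fails.
function checkCode(str) {
  const m = str.match(FORMAT);
  if (m === null) return ['format'];
  const parts = { sep: m[1], code: m[2] };
  return rules.filter((rule) => !rule.test(parts)).map((rule) => rule.name);
}

for (const input of ['123', '123 A', '123 A v', '123 CER', '123A', '123 AA', '123 A ']) {
  const failed = checkCode(input);
  console.log(JSON.stringify(input), failed.length === 0 ? 'valid' : failed.join(', '));
}
```

Returning the names of the failed rules, rather than a single boolean, also gives a caller something specific to report.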