What does this RegEx mean?

Posted: 2012-08-14 09:28:05

Tags: javascript jquery regex jquery-plugins

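I'm not very familiar with regular expressions, so I need some help. I'm using the jQuery dynacloud plugin, and it breaks at the point in its code where this regular expression match happens. Can someone help me figure out what this regular expression matches?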

/^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,}

Please help!

5 answers:

Answer 0 (score: 1)

I suggest you take a look at Expresso. You're missing the closing parenthesis; this is the result:

[Image: Expresso's analysis of the pattern]
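If you don't have Expresso at hand, any JavaScript engine reports the same problem. This minimal sketch (assuming Node.js or a browser console; `compiles` is a helper name made up for this example) passes the pattern from the question to the `RegExp` constructor, which throws a `SyntaxError` for the unterminated group and accepts the pattern once the closing `)` is added.

```javascript
// Pattern exactly as posted in the question (the closing ")" is missing).
const truncated = "^[a-z\\xE4\\xF6\\xFC]*[A-Z\\xC4\\xD6\\xDC]([A-Z\\xC4\\xD6\\xDC\\xDF]+|[a-z\\xE4\\xF6\\xFC\\xDF]{3,}";

// Hypothetical helper: returns true if the string compiles as a regular expression.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false; // e.g. "Unterminated group"
    throw e;
  }
}

console.log(compiles(truncated));       // false: the group is never closed
console.log(compiles(truncated + ")")); // true once the group is closed
```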

Answer 1 (score: 1)

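^ start of the string (of a line, if the m flag is set)

[...] a class of possible characters

a-z a range (abcde ... yz)

\xE4 a character given by its hexadecimal code (0xE4 = 228, which is ä; loosely called an "ASCII" code, although ASCII proper stops at 0x7F)

{n,m} between n and m occurrences

* equivalent to {0,}

+ equivalent to {1,}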
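Each item in that list can be checked directly in JavaScript. This sketch (plain JavaScript, no jQuery; `sameOn` is a made-up helper) confirms that `\xE4` denotes ä and that `*` and `+` behave like `{0,}` and `{1,}` on a handful of sample strings. The samples are arbitrary, so this is a spot check rather than a proof.

```javascript
// \xE4 in a JavaScript string or regex is the character with code 0xE4 (228 decimal): "ä".
console.log("\xE4" === "ä");          // true
console.log("\xE4".charCodeAt(0));    // 228
console.log(/^[a-z\xE4]$/.test("ä")); // true: ä is in the class via its hex escape

// Returns true if both regexes give the same test() result for every sample.
// (No g flag is used, so test() carries no state between calls.)
function sameOn(a, b, samples) {
  return samples.every((s) => a.test(s) === b.test(s));
}

const samples = ["", "a", "aa", "aaa", "aaaa", "b"];
console.log(sameOn(/^a*$/, /^a{0,}$/, samples)); // true: * is shorthand for {0,}
console.log(sameOn(/^a+$/, /^a{1,}$/, samples)); // true: + is shorthand for {1,}
console.log(/^a{2,3}$/.test("aaa"), /^a{2,3}$/.test("aaaa")); // true false
```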

Answer 2 (score: 1)

The \xNN parts are hex escapes for special characters; if you replace them with the characters they stand for, you basically get:

/^[a-zäöü]*[A-ZÄÖÜ]([A-ZÄÖÜß]+|[a-zäöüß]{3,})/
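That substitution can be double-checked in JavaScript. This sketch decodes each escape with `String.fromCharCode` and then compares the escaped and literal versions of the pattern on a few sample words; the word list is arbitrary, so it is only a spot check.

```javascript
// Each \x escape in the pattern, paired with the letter it stands for.
const escapes = [
  [0xE4, "ä"], [0xF6, "ö"], [0xFC, "ü"],
  [0xC4, "Ä"], [0xD6, "Ö"], [0xDC, "Ü"],
  [0xDF, "ß"],
];
for (const [code, letter] of escapes) {
  // Prints e.g. "\xE4 ä true"
  console.log("\\x" + code.toString(16).toUpperCase(), letter, String.fromCharCode(code) === letter);
}

// The original (escaped) pattern and the substituted (literal) one.
const escaped =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;
const literal = /^[a-zäöü]*[A-ZÄÖÜ]([A-ZÄÖÜß]+|[a-zäöüß]{3,})/;

const words = ["Haus", "Ärger", "Straße", "USA", "über", "Öl"];
const agree = words.every((w) => escaped.test(w) === literal.test(w));
console.log(agree); // true
```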

I'll break it down for you:

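^ start of the string

[a-zäöü] character set: any character from a to z, or ä, ö, ü; * zero or more times

[A-ZÄÖÜ] character set: any character from A to Z, or Ä, Ö, Ü, exactly once

( start of group

[A-ZÄÖÜß] another character set, you should get the idea by now :) ; + one or more times

| or

[a-zäöüß] character set; {3,} three or more times

) end of group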

Also, you're missing a )/ at the end; the / at the beginning and at the end mark that what's in between is a regular expression.
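To see what the pattern accepts in practice, this sketch runs it (closing `)` restored) against a few sample words chosen for illustration. `exec` returns the matched text, which also shows that only a prefix of the word has to match, since the pattern has no `$` at the end.

```javascript
// The regex from the question, with the missing ")" restored.
const word =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// Print the matched text for each sample word, or null if there is no match.
const samples = ["Haus", "Straße", "Über", "USA", "iPhone", "ABc", "Ab", "über"];
for (const s of samples) {
  const m = word.exec(s); // no g flag, so every exec() starts at index 0
  console.log(s, m ? m[0] : null);
}
// Haus Haus, Straße Straße, Über Über, USA USA, iPhone iPhone,
// ABc AB (only a prefix has to match), Ab null (too few letters after the capital),
// über null (no capital letter at all)
```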

Answer 3 (score: 0)

Assuming this is your regex:

/^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/

Here is an explanation of the regex:

"^" +                             // Assert position at the beginning of the string (or of a line, if the m flag is set)
"[a-z\xE4\xF6\xFC]" +             // Match a single character present in the list below
                                  // A character in the range between “a” and “z”
                                  // Character 0xE4 (228 decimal, ä)
                                  // Character 0xF6 (246 decimal, ö)
                                  // Character 0xFC (252 decimal, ü)
   "*" +                          // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"[A-Z\xC4\xD6\xDC]" +             // Match a single character present in the list below
                                  // A character in the range between “A” and “Z”
                                  // Character 0xC4 (196 decimal, Ä)
                                  // Character 0xD6 (214 decimal, Ö)
                                  // Character 0xDC (220 decimal, Ü)
"(" +                             // Match the regular expression below and capture its match into backreference number 1
                                  // Match either the regular expression below (attempting the next alternative only if this one fails)
      "[A-Z\xC4\xD6\xDC\xDF]" +   // Match a single character present in the list below
                                  // A character in the range between “A” and “Z”
                                  // Character 0xC4 (196 decimal, Ä)
                                  // Character 0xD6 (214 decimal, Ö)
                                  // Character 0xDC (220 decimal, Ü)
                                  // Character 0xDF (223 decimal, ß)
         "+" +                    // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "|" +                          // Or match regular expression number 2 below (the entire group fails if this one fails to match)
      "[a-z\xE4\xF6\xFC\xDF]" +   // Match a single character present in the list below
                                  // A character in the range between “a” and “z”
                                  // Character 0xE4 (228 decimal, ä)
                                  // Character 0xF6 (246 decimal, ö)
                                  // Character 0xFC (252 decimal, ü)
                                  // Character 0xDF (223 decimal, ß)
         "{3,}" +                 // Between 3 and unlimited times, as many times as possible, giving back as needed (greedy)
")"                               // End of capture group 1

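The "backreference number 1" in that breakdown is the parenthesized group, which JavaScript exposes as index 1 of the `exec` result. This sketch (closing `)` restored, sample words chosen for illustration) shows what the group captures for each alternative.

```javascript
const word =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// Group 1 holds the text matched right after the first capital letter.
console.log(word.exec("Straße")[1]); // "traße": second alternative, 3+ lowercase letters
console.log(word.exec("USA")[1]);    // "SA": first alternative, 1+ capitals
console.log(word.exec("ABcdef")[1]); // "B": the capitals-only alternative stops at the first lowercase letter
```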
Answer 4 (score: 0)

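I'll assume the missing )/ in the regex is just a cut-and-paste error on your part; both are present in the DynaCloud source code. What is missing is an end anchor ($), which I find surprising. Here is the relevant code: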

var elems = jQuery(this).text()
            .replace(/[^A-Z\xC4\xD6\xDCa-z\xE4\xF6\xFC\xDF0-9_]/g, ' ')
            .replace(jQuery.dynaCloud.stopwords, ' ')
            .split(' ');
var word = 
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;
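To see what this excerpt produces without loading jQuery, here is a plain-JavaScript sketch of the same steps. Assumptions: the `text` argument stands in for `jQuery(this).text()`; the `jQuery.dynaCloud.stopwords` replacement is left out because the plugin's stopword list isn't shown here; each token is assumed to be tested against `word`, although the excerpt doesn't show how the plugin actually uses it; and `candidateWords` is a hypothetical helper name, not part of DynaCloud.

```javascript
// Word pattern copied from the DynaCloud excerpt.
const word =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// Hypothetical helper: replace every character outside the allowed set with a space,
// split on single spaces, then keep the tokens the word regex accepts.
// (The stopword step is omitted.)
function candidateWords(text) {
  return text
    .replace(/[^A-Z\xC4\xD6\xDCa-z\xE4\xF6\xFC\xDF0-9_]/g, " ")
    .split(" ")
    .filter((token) => word.test(token));
}

console.log(candidateWords("Das Café in München, die Straßenbahn und WLAN!"));
// [ 'München', 'Straßenbahn', 'WLAN' ]
// "Café" becomes "Caf" (é is not in the allowed set) and, like "Das", is too short to match.
```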

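The first statement filters out unwanted characters, but leaves digits and underscores alone. The second statement tries to match words made up of ASCII letters plus some of the non-ASCII letters used in (for example) German. However, once it runs out of matching letters, it goes on to accept whatever follows, not just the characters listed in the first regex. Also, any digits or underscores in a word will cause it to be split into two or more words.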
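The effect of the missing end anchor is easy to observe: in this sketch the word pattern from the excerpt accepts tokens that merely begin with a valid word, and `exec` reveals that only that leading part was matched. The sample tokens are made up; tokens like these can survive the first `replace`, since it keeps digits and underscores.

```javascript
const word =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF]+|[a-z\xE4\xF6\xFC\xDF]{3,})/;

// Without a trailing $, only the start of a token has to match; whatever follows is ignored.
for (const token of ["Haus", "Haus123", "Haus_x", "HAUSbau"]) {
  const m = word.exec(token);
  console.log(token, m !== null, m && m[0]);
}
// Haus true Haus
// Haus123 true Haus
// Haus_x true Haus
// HAUSbau true HAUS
```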

I would try anchoring the regex at the end and adding support for digits and underscores, like this:

/^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF0-9_]+|[a-z\xE4\xF6\xFC\xDF0-9_]{3,})$/g
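Here is a sketch of that anchored pattern applied to a few made-up tokens. One deliberate deviation: it leaves off the trailing `g` flag. A global regex used with `test()` or `exec()` remembers `lastIndex` between calls, so a single instance reused for many tokens can report false negatives, as the last lines show; `g` only matters for finding several matches in one string, which a per-word check doesn't need.

```javascript
// The anchored pattern from the answer, but WITHOUT the trailing g flag.
const anchored =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF0-9_]+|[a-z\xE4\xF6\xFC\xDF0-9_]{3,})$/;

for (const token of ["Haus", "Straße", "Haus123", "HAUSbau"]) {
  console.log(token, anchored.test(token));
}
// Haus true, Straße true, Haus123 true, HAUSbau false (the $ now rejects the mixed-case tail)

// With g, test() stores lastIndex after a match, so the same regex object
// fails on the very next token even though that token is identical.
const withG = new RegExp(anchored.source, "g");
console.log(withG.test("Haus"), withG.test("Haus")); // true false
```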

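This regex is for illustration only; it is not a solution. For one thing, I took a wild guess at where digits and underscores should be allowed. For another, it can now match words that end in digits or underscores, which you may not want.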
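The second caveat is easy to reproduce with the anchored pattern (again written without `g`): every one of the following made-up tokens is accepted, down to a single capital followed by one digit.

```javascript
const anchored =
  /^[a-z\xE4\xF6\xFC]*[A-Z\xC4\xD6\xDC]([A-Z\xC4\xD6\xDC\xDF0-9_]+|[a-z\xE4\xF6\xFC\xDF0-9_]{3,})$/;

// Tokens the anchored pattern accepts even though they may not belong in a word cloud.
const unwanted = ["Haus_", "Version2", "ABC123", "A1"];
for (const token of unwanted) {
  console.log(token, anchored.test(token)); // true for every one of these
}
```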