What is the best way to escape the string "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b" in Java?

Asked: 2012-05-08 04:25:39

Tags: java regex string escaping

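I am using the string "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b" as a regular expression to detect email addresses.

I would like to know the best way to escape it.

I have tried many variations, for example: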

\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b
\\\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\\\.[A-Z]{2,4}\\\\b

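I am using the regex inside a @Match annotation, so I don't think I can use StringEscapeUtils. The code is written in Java with the Play framework, but I suppose this is really just a question about escaping Java strings.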

 public static void signup(
        @Match( value=("\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"), 
            message="Hey there, we need a real email address so we can send you an invite. Thanks :)") String email){

        if(validation.hasErrors()) {
            params.flash(); // add http parameters to the flash scope
            validation.keep(); // keep the errors for the next request
            index();
        }
        else{
                Email mail = new Email();
                String[] to = {"myemail@me.com", "myemail@gmail.com"};
                mail.sendMessage(to, "beta signup", email);
                thanks();
        }
    }

3 answers:

Answer 0: (score: 2)

Try this:

This regex implements the official RFC 2822 standard for email addresses. It may be useful for general purposes.

\b(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\b

Explanation:

<!--
\b(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\b

Options: case insensitive; ^ and $ match at line breaks

Assert position at a word boundary «\b»
Match the regular expression below «(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")»
   Match either the regular expression below (attempting the next alternative only if this one fails) «[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*»
      Match a single character present in the list below «[a-z0-9!#$%&'*+/=?^_`{|}~-]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
         A character in the range between “a” and “z” «a-z»
         A character in the range between “0” and “9” «0-9»
         One of the characters “!#$%&'*+/=?^_`{|}” «!#$%&'*+/=?^_`{|}»
         The character “~” «~»
         The character “-” «-»
      Match the regular expression below «(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
         Match the character “.” literally «\.»
         Match a single character present in the list below «[a-z0-9!#$%&'*+/=?^_`{|}~-]+»
            Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
            A character in the range between “a” and “z” «a-z»
            A character in the range between “0” and “9” «0-9»
            One of the characters “!#$%&'*+/=?^_`{|}” «!#$%&'*+/=?^_`{|}»
            The character “~” «~»
            The character “-” «-»
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"»
      Match the character “"” literally «"»
      Match the regular expression below «(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
         Match either the regular expression below (attempting the next alternative only if this one fails) «[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]»
            Match a single character present in the list below «[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]»
               A character in the range between ASCII character 0x01 (1 decimal) and ASCII character 0x08 (8 decimal) «\x01-\x08»
               ASCII character 0x0b (11 decimal) «\x0b»
               ASCII character 0x0c (12 decimal) «\x0c»
               A character in the range between ASCII character 0x0e (14 decimal) and ASCII character 0x1f (31 decimal) «\x0e-\x1f»
               ASCII character 0x21 (33 decimal) «\x21»
               A character in the range between ASCII character 0x23 (35 decimal) and ASCII character 0x5b (91 decimal) «\x23-\x5b»
               A character in the range between ASCII character 0x5d (93 decimal) and ASCII character 0x7f (127 decimal) «\x5d-\x7f»
         Or match regular expression number 2 below (the entire group fails if this one fails to match) «\\[\x01-\x09\x0b\x0c\x0e-\x7f]»
            Match the character “\” literally «\\»
            Match a single character present in the list below «[\x01-\x09\x0b\x0c\x0e-\x7f]»
               A character in the range between ASCII character 0x01 (1 decimal) and ASCII character 0x09 (9 decimal) «\x01-\x09»
               ASCII character 0x0b (11 decimal) «\x0b»
               ASCII character 0x0c (12 decimal) «\x0c»
               A character in the range between ASCII character 0x0e (14 decimal) and ASCII character 0x7f (127 decimal) «\x0e-\x7f»
      Match the character “"” literally «"»
Match the character “@” literally «@»
Match the regular expression below «(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])»
   Match either the regular expression below (attempting the next alternative only if this one fails) «(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?»
      Match the regular expression below «(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
         Match a single character present in the list below «[a-z0-9]»
            A character in the range between “a” and “z” «a-z»
            A character in the range between “0” and “9” «0-9»
         Match the regular expression below «(?:[a-z0-9-]*[a-z0-9])?»
            Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
            Match a single character present in the list below «[a-z0-9-]*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
               A character in the range between “a” and “z” «a-z»
               A character in the range between “0” and “9” «0-9»
               The character “-” «-»
            Match a single character present in the list below «[a-z0-9]»
               A character in the range between “a” and “z” «a-z»
               A character in the range between “0” and “9” «0-9»
         Match the character “.” literally «\.»
      Match a single character present in the list below «[a-z0-9]»
         A character in the range between “a” and “z” «a-z»
         A character in the range between “0” and “9” «0-9»
      Match the regular expression below «(?:[a-z0-9-]*[a-z0-9])?»
         Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
         Match a single character present in the list below «[a-z0-9-]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
            A character in the range between “a” and “z” «a-z»
            A character in the range between “0” and “9” «0-9»
            The character “-” «-»
         Match a single character present in the list below «[a-z0-9]»
            A character in the range between “a” and “z” «a-z»
            A character in the range between “0” and “9” «0-9»
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]»
      Match the character “[” literally «\[»
      Match the regular expression below «(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}»
         Exactly 3 times «{3}»
         Match the regular expression below «(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)»
            Match either the regular expression below (attempting the next alternative only if this one fails) «25[0-5]»
               Match the characters “25” literally «25»
               Match a single character in the range between “0” and “5” «[0-5]»
            Or match regular expression number 2 below (attempting the next alternative only if this one fails) «2[0-4][0-9]»
               Match the character “2” literally «2»
               Match a single character in the range between “0” and “4” «[0-4]»
               Match a single character in the range between “0” and “9” «[0-9]»
            Or match regular expression number 3 below (the entire group fails if this one fails to match) «[01]?[0-9][0-9]?»
               Match a single character present in the list “01” «[01]?»
                  Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
               Match a single character in the range between “0” and “9” «[0-9]»
               Match a single character in the range between “0” and “9” «[0-9]?»
                  Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
         Match the character “.” literally «\.»
      Match the regular expression below «(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)»
         Match either the regular expression below (attempting the next alternative only if this one fails) «25[0-5]»
            Match the characters “25” literally «25»
            Match a single character in the range between “0” and “5” «[0-5]»
         Or match regular expression number 2 below (attempting the next alternative only if this one fails) «2[0-4][0-9]»
            Match the character “2” literally «2»
            Match a single character in the range between “0” and “4” «[0-4]»
            Match a single character in the range between “0” and “9” «[0-9]»
         Or match regular expression number 3 below (attempting the next alternative only if this one fails) «[01]?[0-9][0-9]?»
            Match a single character present in the list “01” «[01]?»
               Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
            Match a single character in the range between “0” and “9” «[0-9]»
            Match a single character in the range between “0” and “9” «[0-9]?»
               Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
         Or match regular expression number 4 below (the entire group fails if this one fails to match) «[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+»
            Match a single character present in the list below «[a-z0-9-]*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
               A character in the range between “a” and “z” «a-z»
               A character in the range between “0” and “9” «0-9»
               The character “-” «-»
            Match a single character present in the list below «[a-z0-9]»
               A character in the range between “a” and “z” «a-z»
               A character in the range between “0” and “9” «0-9»
            Match the character “:” literally «:»
            Match the regular expression below «(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+»
               Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
               Match either the regular expression below (attempting the next alternative only if this one fails) «[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]»
                  Match a single character present in the list below «[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]»
                     A character in the range between ASCII character 0x01 (1 decimal) and ASCII character 0x08 (8 decimal) «\x01-\x08»
                     ASCII character 0x0b (11 decimal) «\x0b»
                     ASCII character 0x0c (12 decimal) «\x0c»
                     A character in the range between ASCII character 0x0e (14 decimal) and ASCII character 0x1f (31 decimal) «\x0e-\x1f»
                     A character in the range between ASCII character 0x21 (33 decimal) and ASCII character 0x5a (90 decimal) «\x21-\x5a»
                     A character in the range between ASCII character 0x53 (83 decimal) and ASCII character 0x7f (127 decimal) «\x53-\x7f»
               Or match regular expression number 2 below (the entire group fails if this one fails to match) «\\[\x01-\x09\x0b\x0c\x0e-\x7f]»
                  Match the character “\” literally «\\»
                  Match a single character present in the list below «[\x01-\x09\x0b\x0c\x0e-\x7f]»
                     A character in the range between ASCII character 0x01 (1 decimal) and ASCII character 0x09 (9 decimal) «\x01-\x09»
                     ASCII character 0x0b (11 decimal) «\x0b»
                     ASCII character 0x0c (12 decimal) «\x0c»
                     A character in the range between ASCII character 0x0e (14 decimal) and ASCII character 0x7f (127 decimal) «\x0e-\x7f»
      Match the character “]” literally «\]»
Assert position at a word boundary «\b»
-->

Answer 1: (score: 0)

You can find RFC 2822 here:

http://www.ietf.org/rfc/rfc2822.txt

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)\b

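Answer 2: (score: 0)

I won't get into the "is this the right email regex" debate, except for one remark: your regex will not accept all valid email addresses. See the link BalusC gave you in the comments.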
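To make that remark concrete, here is a small sketch that runs the question's pattern against a few addresses. The class name `EmailCoverageDemo` and the sample addresses are invented for illustration, and the case-insensitive flag is an assumption, since the pattern itself only lists A-Z.

```java
import java.util.regex.Pattern;

public class EmailCoverageDemo {
    // The pattern from the question, with backslashes doubled for Java source.
    // CASE_INSENSITIVE is an assumption so that lowercase input is considered at all.
    static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
            Pattern.CASE_INSENSITIVE);

    // True only if the whole candidate string matches the pattern.
    static boolean accepts(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        // An ordinary address is accepted.
        System.out.println(accepts("user@example.com"));
        // A top-level domain longer than four letters is rejected by {2,4}.
        System.out.println(accepts("user@example.museum"));
        // An apostrophe is allowed in the local part by RFC 2822 but is not in the character class.
        System.out.println(accepts("o'brien@example.com"));
    }
}
```

Whether those rejections matter depends on which addresses the signup form is expected to see.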

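About the escaping: Java needs the backslashes doubled because the regex is first written as a string literal, and all escape sequences in that literal are processed when the string is created, before the regex engine sees it. So just escape every backslash, so that a single backslash is still there after that processing.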

\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b

The dash at the end of a character class does not need to be escaped.
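A minimal sketch of that rule, assuming nothing about Play itself: in a Java string literal `\b` is the backspace character and `\.` does not compile, which is why the single-backslash version fails. The class name `EmailEscapeDemo`, the sample inputs, and the `(?i)` inline flag are illustrative assumptions. The inline flag is one way to get case-insensitive matching where only a string can be passed, such as an annotation value. Whether @Match applies any flags of its own is not confirmed here.

```java
import java.util.regex.Pattern;

public class EmailEscapeDemo {
    // What the regex engine should receive:
    //   \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
    // In Java source every backslash is written twice.
    static final String EMAIL_REGEX =
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b";

    // Assumption: lowercase input should match too, so the inline flag (?i) is prepended.
    static final String EMAIL_REGEX_CI = "(?i)" + EMAIL_REGEX;

    // True only if the whole candidate string matches the pattern.
    static boolean isEmail(String candidate) {
        return Pattern.matches(EMAIL_REGEX_CI, candidate);
    }

    public static void main(String[] args) {
        // At runtime the string holds single backslashes again.
        System.out.println(EMAIL_REGEX);
        System.out.println(isEmail("user@example.com"));
        System.out.println(isEmail("not-an-email"));
    }
}
```

Because `EMAIL_REGEX_CI` is built only from string constants, it is itself a compile-time constant, so the same expression could in principle be used as an annotation value.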