Question

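I'm trying to write a regular expression for the following rules:

Characters 1-3 must be digits
Character 4 must be 'P'
Character 5 must be a letter
Characters 6-12 must be digits
Character 13 must be a digit or 'X'

Together these make up an account's office reference, which is used for accounting purposes. So far I have the following: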

^\d{3}P[A-Z]{1}\d{7}$

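To finish the regex I just need to say "any single digit or the letter X", but I'm not sure how to express that. I tried \d{1}[X], but it expects a digit followed by the letter.

Any ideas?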
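To see why that attempt fails: \d{1}[X] is a sequence of two items, so it needs a digit followed by an X, whereas what's wanted is one position that accepts either. The following C# sketch contrasts the two; the class and method names are illustrative only, and it shows both a character class and an alternation as ways to express "one digit or X":

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastCharDemo
{
    // The attempt from the question: a digit followed by a literal X (two characters).
    public static bool SequenceMatches(string s) => Regex.IsMatch(s, @"^\d{1}[X]$");

    // One character that is either an ASCII digit or X, as a character class...
    public static bool ClassMatches(string s) => Regex.IsMatch(s, @"^[0-9X]$");

    // ...or the same single position written as an alternation.
    public static bool AlternationMatches(string s) => Regex.IsMatch(s, @"^(?:[0-9]|X)$");
}
```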

Answer 1

Try this:

^\d{3}P[A-Z]\d{7}[0-9X]$

The character class [0-9X] matches a single character that is either a digit or X (unless it is followed by an explicit quantifier other than {1}, e.g. {2}).
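For readability, the same rules can be laid out one per line. This is a sketch assuming .NET's RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment; the OfficeReference class name is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class OfficeReference
{
    // Whitespace and '#' comments are ignored under IgnorePatternWhitespace,
    // so each rule from the question gets its own line.
    private static readonly Regex Pattern = new Regex(@"
        ^
        [0-9]{3}   # characters 1-3: digits
        P          # character 4: literal 'P'
        [A-Z]      # character 5: an uppercase letter
        [0-9]{7}   # characters 6-12: digits
        [0-9X]     # character 13: a digit or 'X'
        $", RegexOptions.IgnorePatternWhitespace);

    public static bool IsValid(string s) => Pattern.IsMatch(s);
}
```

For example, "111PH1234567X" and "111PH12345678" are accepted, while "111PH1234567Y" (wrong final character) and "111PH1234567" (only 12 characters) are rejected.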

Addendum:

As @sln pointed out, for consistency it's better to settle on either 0-9 or \d (rather than mixing the two) within a given regex. In other words, use...

^\d{3}P[A-Z]\d{7}[\dX]$

...or...

^[0-9]{3}P[A-Z][0-9]{7}[0-9X]$

...in this case.
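In .NET the choice is not purely cosmetic: by default \d matches any Unicode decimal digit, while [0-9] matches only the ASCII digits (RegexOptions.ECMAScript narrows \d to [0-9]). A small sketch with illustrative names, using ARABIC-INDIC DIGIT ONE (U+0661) as the test character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // U+0661 ARABIC-INDIC DIGIT ONE is a Unicode decimal digit (category Nd).
    public const string ArabicIndicOne = "\u0661";

    // By default, \d matches any Unicode decimal digit...
    public static bool BackslashD(string s) => Regex.IsMatch(s, @"^\d$");

    // ...whereas [0-9] matches only the ASCII digits 0 through 9.
    public static bool AsciiRange(string s) => Regex.IsMatch(s, @"^[0-9]$");

    // With RegexOptions.ECMAScript, \d behaves like [0-9].
    public static bool BackslashDEcma(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
}
```

The same consideration applies to char.IsDigit, which also returns true for any Unicode decimal digit, so if only ASCII digits are acceptable, [0-9] is the more precise choice.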

Performance

In response to the comments about poor regex performance: those concerns are greatly overstated.

Here's a quick sanity check...

using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Quick sanity check.

        string str = "111PH1234567X";

        Stopwatch stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < 1000000; i++)
        {
            if (str.Substring(0, 3).All(char.IsDigit)               // first 3 are digits
                && str[3] == 'P'                                    // 4th is P
                && char.IsLetter(str[4])                            // 5th is a letter
                && str.Substring(5, 7).All(char.IsDigit)            // 6-12 are digits
                && (char.IsDigit(str[12]) || str[12] == 'X'))       // 13th is a digit or X
            {
                // Console.WriteLine("good");
            }
        }

        Console.WriteLine(stopwatch.Elapsed);

        stopwatch = Stopwatch.StartNew();

        // Note: constructing (and compiling) the regex happens inside the timed region.
        Regex regex = new Regex(@"^\d{3}P[A-Z]\d{7}[0-9X]$", RegexOptions.Compiled);
        for (int j = 0; j < 1000000; j++)
        {
            regex.IsMatch(str);
        }

        Console.WriteLine(stopwatch.Elapsed + " (regexp)");

        // A bit more rigorous sanity check.

        string[] strs = { "111PH1234567X", "grokfoobarbaz", "really, really, really, really long string that does not match", "345BA7654321Z" };

        Stopwatch stopwatch2 = Stopwatch.StartNew();

        for (int i = 0; i < strs.Length; i++)
        {
            for (int j = 0; j < 1000000; j++)
            {
                if (strs[i].Substring(0, 3).All(char.IsDigit)               // first 3 are digits
                    && strs[i][3] == 'P'                                    // 4th is P
                    && char.IsLetter(strs[i][4])                            // 5th is a letter
                    && strs[i].Substring(5, 7).All(char.IsDigit)            // 6-12 are digits
                    && (char.IsDigit(strs[i][12]) || strs[i][12] == 'X'))   // 13th is a digit or X
                {
                    // Console.WriteLine("good");
                }
            }
        }

        Console.WriteLine(stopwatch2.Elapsed);

        stopwatch2 = Stopwatch.StartNew();

        Regex regex2 = new Regex(@"^\d{3}P[A-Z]\d{7}[0-9X]$", RegexOptions.Compiled);
        for (int i = 0; i < strs.Length; i++)
        {
            for (int j = 0; j < 1000000; j++)
            {
                regex2.IsMatch(strs[i]);
            }
        }

        Console.WriteLine(stopwatch2.Elapsed + " (regexp)");
    }
}

...which produces the following on my modest machine:

00:00:00.2134404
00:00:00.4527271 (regexp)
00:00:00.4872452
00:00:00.9534147 (regexp)

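The regex approach appears to be roughly 2x slower. As with anything, you need to consider what makes sense for your use case, scale, and so on. Personally, I side with Donald Knuth's "premature optimization is the root of all evil" and only make performance-driven choices when they are needed.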

Answer 2

I'd go with a plain approach rather than a regex.

Here's a whitelist approach:

using System;
using System.Linq;

var str = "111PH1234567X";

if (str.Substring(0, 3).All(char.IsDigit)               // first 3 are digits
    && str[3] == 'P'                                    // 4th is P
    && char.IsLetter(str[4])                            // 5th is a letter
    && str.Substring(5, 7).All(char.IsDigit)            // 6-12 are digits
    && (char.IsDigit(str[12]) || str[12] == 'X'))       // 13th is a digit or X
{
    Console.WriteLine("good");
}
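One pitfall with a condition like this: in C#, && binds more tightly than ||, so unless the final "digit or X" test is parenthesised, the expression becomes (all the other checks && IsDigit(s[12])) || s[12] == 'X', and any string with X at index 12 passes regardless of its first 12 characters. A minimal sketch comparing the two forms (names are illustrative):

```csharp
using System;
using System.Linq;

public static class PrecedenceDemo
{
    // Unparenthesised: parsed as (checks 1-5 && IsDigit(s[12])) || s[12] == 'X'.
    public static bool Unparenthesised(string s) =>
        s.Substring(0, 3).All(char.IsDigit)
        && s[3] == 'P'
        && char.IsLetter(s[4])
        && s.Substring(5, 7).All(char.IsDigit)
        && char.IsDigit(s[12]) || s[12] == 'X';

    // Parenthesised: "digit or X" applies only to the 13th character.
    public static bool Parenthesised(string s) =>
        s.Substring(0, 3).All(char.IsDigit)
        && s[3] == 'P'
        && char.IsLetter(s[4])
        && s.Substring(5, 7).All(char.IsDigit)
        && (char.IsDigit(s[12]) || s[12] == 'X');
}
```

For "abcdefghijklX", the unparenthesised form returns true while the parenthesised form correctly returns false; both accept "111PH1234567X".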

Depending on your requirements, you may need to add a string length check.
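A sketch of the same whitelist idea with a length check included. It assumes the reference must be exactly 13 characters, and it compares against ASCII ranges so it accepts the same characters as [0-9] and [A-Z] (char.IsDigit and char.IsLetter are broader). It also avoids the Substring allocations. The class name is illustrative:

```csharp
using System;

public static class ManualOfficeReference
{
    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
    private static bool IsAsciiUpper(char c) => c >= 'A' && c <= 'Z';

    // Checks the same rules as ^[0-9]{3}P[A-Z][0-9]{7}[0-9X]$
    // (except that the regex's $ would also allow one trailing newline).
    public static bool IsValid(string s)
    {
        if (s == null || s.Length != 13) return false;   // guard before indexing

        for (int i = 0; i < 3; i++)                      // characters 1-3: digits
            if (!IsAsciiDigit(s[i])) return false;

        if (s[3] != 'P') return false;                   // character 4: 'P'
        if (!IsAsciiUpper(s[4])) return false;           // character 5: A-Z

        for (int i = 5; i < 12; i++)                     // characters 6-12: digits
            if (!IsAsciiDigit(s[i])) return false;

        return IsAsciiDigit(s[12]) || s[12] == 'X';      // character 13: digit or X
    }
}
```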

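Running this 1 million times showed it to be 4x faster than the regex approach in the worst case (str is valid, so every condition is checked). Just putting that out there.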
