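How can I speed up this LINQ query?

Asked: 2012-02-02 17:20:18

Tags: c# linq linq-to-xml

I was using long.TryParse, but switched to a regular expression. Currently, a 123+K message takes 7+ ms in total. The 7+ ms is measured from XElement.Parse to the end of the foreach loop.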

Stopwatch s1 = Stopwatch.StartNew();
XElement element = XElement.Parse(xml);

string pattern = @"\b\d+\b";
Regex r = new Regex(pattern);

IEnumerable<XElement> elementsWithPossibleCCNumbers = element
    .Descendants()
    .Where(d => d.Attributes()
        .Where(a => a.Value.Length >= 13 &&
               a.Value.Length <= 16 &&
               r.IsMatch(a.Value)).Count() == 1)
    .Select(x => x);

foreach(var x in elementsWithPossibleCCNumbers)
{
    foreach(var a in x.Attributes())
    {
        // Check if the value is a number
        if (r.IsMatch(a.Value))
        {
            // Check if the value could be a credit card number
            if (a.Value.Length >= 13 && a.Value.Length <= 16)
            {
                a.Value = Regex.Replace(a.Value, @"\b\d{13,16}\b", match =>
                    new String('*', match.Value.Length - 4) +
                    match.Value.Substring(match.Value.Length - 4)
                );
            }
            else // If the value is not a credit card number, replace it with ***
                a.Value = Regex.Replace(a.Value, @"\b\d+\b", "***");
        }
    }
}

xml = element.ToString();
s1.Stop();

XElement.Parse(xml); takes 2 to 3 ms.

The LINQ query takes 0.004 - 0.005 ms.

The foreach statement takes 4 to 5 ms.

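2 answers:

Answer 0: (score: 1)

You appear to be doing two search-and-replaces:

  1. Replace each CC number with asterisks followed by its last 4 digits
  2. Replace any other "CC-ish" number on the same element with asterisks

One approach is to make XLinq work a little harder for you: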

    // you're not using the elements, ignore them, just get the attributes
    foreach (var atr in xelt.Descendants()
                            .Where(e => e.Attributes()
                                         .Any(a => a.Value.Length >= 13
                                                && a.Value.Length <= 16))
                            .SelectMany(e => e.Attributes()))
    {
        // assumed to be declared once at class level:
        // static readonly Regex basicDigits = new Regex(@"\b\d+\b", RegexOptions.Compiled);
        // static readonly Regex ccDigits = new Regex(@"\b\d{13,16}\b", RegexOptions.Compiled);
        if (ccDigits.IsMatch(atr.Value))
        {
             atr.Value = ccDigits.Replace(
                 atr.Value,
                 mm => new String('*', mm.Value.Length - 4)
                       + mm.Value.Substring(mm.Value.Length - 4));
        }
        else
        {
            atr.Value = basicDigits.Replace(atr.Value, "***");
        }
    }
    
    // using 150k XML (1k nodes/5k attrs, 3 attr/node avg, avg depth 4 nodes)
    // with 10% match rate:
    // - 25.7 MB/s (average 100 trials)
    // - 61 attributes/ms
    

Sample input XML:

    <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc">
         <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc" real1="4444555566667777" />
         <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc" />
         ruBTMjSesurMsP6lK2jg
     </item>
    

Output:

    <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc">
         <item f1="abc123abc" f2="helloooo ***" f3="abc123abc" real1="************7777" />
         <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc" />
         ruBTMjSesurMsP6lK2jg
    </item>
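
To make the two replacements easier to try in isolation, here is a minimal sketch of the loop body pulled out into a helper that works on a plain string. The class name `CardMasker` and the method `Mask` are hypothetical names chosen for illustration; the two patterns are the ones quoted in the code comments of this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CardMasker
{
    // Built once and reused for every attribute value.
    private static readonly Regex BasicDigits =
        new Regex(@"\b\d+\b", RegexOptions.Compiled);
    private static readonly Regex CcDigits =
        new Regex(@"\b\d{13,16}\b", RegexOptions.Compiled);

    // Hypothetical helper: masks one attribute value the way the loop body does.
    public static string Mask(string value)
    {
        if (CcDigits.IsMatch(value))
        {
            // Keep the last four digits of each 13-16 digit run, star out the rest.
            return CcDigits.Replace(
                value,
                m => new string('*', m.Value.Length - 4)
                     + m.Value.Substring(m.Value.Length - 4));
        }

        // No card-like number: replace every standalone digit run with "***".
        return BasicDigits.Replace(value, "***");
    }
}
```

With this helper, `CardMasker.Mask("4444555566667777")` returns `************7777`, `CardMasker.Mask("helloooo 1234567")` returns `helloooo ***`, and `CardMasker.Mask("abc123abc")` is returned unchanged because `\b` does not match between a letter and a digit.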
    

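Answer 1: (score: 0)

You may want to consider precompiling the regular expression. This article: http://en.csharp-online.net/CSharp_Regular_Expression_Recipes%E2%80%94Compiling_Regular_Expressions explains the pros and cons of compiling regular expressions.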
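
Whether compiling pays off depends on how often the expression is reused, so it is worth measuring rather than assuming. The sketch below is one rough way to compare an interpreted and a compiled `Regex`; the class `RegexTiming` and its methods are hypothetical names for illustration, and the sample string and iteration count are arbitrary assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Hypothetical helper: time `iterations` IsMatch calls on `input`, in milliseconds.
    public static double TimeMatches(Regex regex, string input, int iterations)
    {
        regex.IsMatch(input); // warm-up call so one-time setup cost is not measured

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.IsMatch(input);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    // Prints both timings; the numbers vary by machine and runtime.
    public static void Report()
    {
        const string pattern = @"\b\d{13,16}\b";               // pattern from the question
        const string sample = "card 4444555566667777 on file"; // made-up sample value

        Regex interpreted = new Regex(pattern);
        Regex compiled = new Regex(pattern, RegexOptions.Compiled);

        Console.WriteLine("interpreted: {0:F3} ms", TimeMatches(interpreted, sample, 100000));
        Console.WriteLine("compiled:    {0:F3} ms", TimeMatches(compiled, sample, 100000));
    }
}
```

A compiled `Regex` costs more to construct, so the comparison is only meaningful when the instance is created once (for example as a static field) and reused across many attribute values.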