JIT optimization: why does it get slower, and how can I improve it?

Asked: 2018-05-01 14:44:13

Tags: c# .net performance jit

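I have been looking into method inlining by the JIT and landed on this article by Scott Hanselman. I took his code further, and it seems that although code run in Release mode reports only a few stack frames, it actually runs as if the extra frames were still present in the compiled code (even though they are not reported as such).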

First, if you want to jump in and run it, I have put the code here: https://github.com/Mike-EEE/StackOverflow.Performance

I have tried this on .NET 4.7.1, .NET Core 2.0, and even the recently announced .NET Core 2.1 Preview. All give the same results.

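What I did was create a simple command that emits a message, and then a multi-decorated command that wraps this simple command several times. In the posted code the decoration is applied 10 times, producing a nested command 10 levels deep (11 if you count the original simple command).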

Both commands used in the tests use an empty delegate to emit the message, because using Console.WriteLine during a performance test would get rather ugly.
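A minimal sketch of the setup described above may help. Only the names that appear in the stack traces (EmitMessage, DecoratedCommand, Execute, Program.Main) come from the question; the ICommand interface, the Wrap helper, and the constructor shapes are assumptions made for illustration. The MethodA/MethodB/MethodC chain that the traces show inside EmitMessage is left out for brevity.

```csharp
using System;

// Hypothetical interface; the question does not show the actual abstraction.
public interface ICommand
{
    void Execute(string message);
}

// Simple command: hands the message to whatever delegate it was given.
public sealed class EmitMessage : ICommand
{
    private readonly Action<string> _emit;

    public EmitMessage(Action<string> emit) => _emit = emit;

    public void Execute(string message) => _emit(message);
}

// Decorator: forwards to the wrapped command and does nothing else,
// so the forwarding call sits in tail position.
public sealed class DecoratedCommand : ICommand
{
    private readonly ICommand _inner;

    public DecoratedCommand(ICommand inner) => _inner = inner;

    public void Execute(string message) => _inner.Execute(message);
}

public static class Program
{
    // Hypothetical helper: wraps a command in the given number of decorators.
    public static ICommand Wrap(ICommand command, int levels)
    {
        for (var i = 0; i < levels; i++)
        {
            command = new DecoratedCommand(command);
        }
        return command;
    }

    public static void Main()
    {
        // Benchmark flavor: an empty delegate, so no console I/O is measured.
        ICommand direct = new EmitMessage(_ => { });
        ICommand decorated = Wrap(direct, 10);
        decorated.Execute("Hello World!");

        // Verification flavor: print the stack trace seen at the innermost call.
        ICommand traced = Wrap(
            new EmitMessage(_ => Console.WriteLine(Environment.StackTrace)), 10);
        traced.Execute("Hello World!");
    }
}
```

Passing the delegate through the constructor is what lets the same command graph serve both purposes: the empty delegate for timing, and the stack-trace-printing delegate for checking which frames the runtime reports.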

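Before running the tests, I do create a decorated command using the same code as the test code, but with Console.WriteLine in place of the empty delegate, to verify the stack trace in the current execution environment.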

In Debug, this stack trace looks like the following:

   at StackOverflow.Performance.EmitMessage.Emit(String message)
   at StackOverflow.Performance.EmitMessage.MethodC(String message)
   at StackOverflow.Performance.EmitMessage.MethodB(String message)
   at StackOverflow.Performance.EmitMessage.MethodA(String message)
   at StackOverflow.Performance.EmitMessage.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.DecoratedCommand.Execute(String message)
   at StackOverflow.Performance.Program.Main()

In Release, it looks like this:

   at StackOverflow.Performance.EmitMessage.Emit(String message)
   at StackOverflow.Performance.Program.Main()

So far everything looks great, and exactly what I expected. However, I then ran both commands through BenchmarkDotNet to see the results in a performance setting. These results seem to show that the decorated command's call chain is executed in full, even though the emitted stack trace shows no such call chain:

    // * Summary *

    BenchmarkDotNet=v0.10.14, OS=Windows 10.0.16299.371 (1709/FallCreatorsUpdate/Redstone3)
    Intel Core i7-4820K CPU 3.70GHz (Haswell), 1 CPU, 8 logical and 8 physical cores
    .NET Core SDK=2.1.300-preview2-008533
      [Host]     : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
      DefaultJob : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT


        Method |      Mean |     Error |    StdDev |
    ---------- |----------:|----------:|----------:|
        Direct |  3.581 ns | 0.0759 ns | 0.0710 ns |
     Decorated | 44.646 ns | 0.7701 ns | 0.7203 ns |

So it seems that more than 2 frames are being executed here, which led me to post this question on StackOverflow. I have several questions about this:

  1. Is there anything fundamentally wrong with my code? That would be incredibly embarrassing, but I want to rule out the obvious first. :)
  2. If my code and results are indeed accurate: is this a known issue, and/or is this behaving as designed?
  3. My assumption is that tail call optimization is being used here. Is method inlining also taking place? I guess my underlying question is: what optimization exactly is being applied to produce these unexpectedly unoptimized results?
  4. Most importantly: is there any way to ensure and achieve the optimized result I am after? Any magic for passing through to the root delegate would be valuable here. It seems the root delegate is being resolved correctly, just not invoked correctly.
  5. For completeness, all the code that runs this sample is in the repository linked above.

Thank you in advance for any insight or help you can provide!

1 Answer:

Answer 0 (score: 1)

Shamelessly copied from Stephen Toub's work here:
  

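I just took a look at the disassembly for the decorated command, using a checked build of coreclr and running with set COMPlus_JitDisasm=Execute (see the documentation). It is in fact using a tail call: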

     
    

    ; Assembly listing for method DecoratedCommand:Execute(ref):this
    ; Emitting BLENDED_CODE for X64 CPU with AVX
    ; optimized code
    ; rsp based frame
    ; fully interruptible
    ; Final local variable assignments
    ;
    ;  V00 this         [V00,T00] (  3,  3   )     ref  ->  rcx         this class-hnd
    ;  V01 arg1         [V01,T01] (  3,  3   )     ref  ->  rdx         class-hnd
    ;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
    ;
    ; Lcl frame size = 0

    G_M223_IG01:

    G_M223_IG02:
           488B4908             mov      rcx, gword ptr [rcx+8]
           49BB48007733FD7F0000 mov      r11, 0x7FFD33770048
           488B05934FE5FF       mov      rax, qword ptr [(reloc)]
           3909                 cmp      dword ptr [rcx], ecx

    G_M223_IG03:
           48FFE0               rex.jmp  rax