How to debug an increase in context switches

Date: 2018-06-18 16:08:01

Tags: performance linux-kernel performance-testing performancecounter perf

I am profiling two versions of a large application with Linux perf. One of the versions has reproducibly lower performance. The problem is that the affected run takes about 10 minutes to complete, and running

perf stat

shows a large difference in the number of context switches:

    2759681,344820      task-clock (msec)         #    4,089 CPUs utilized          
         1.976.068      context-switches          #    0,716 K/sec                  
           288.370      cpu-migrations            #    0,104 K/sec                  
         1.065.076      page-faults               #    0,386 K/sec                  
 9.600.316.147.196      cycles                    #    3,479 GHz                    
 9.608.308.311.681      instructions              #    1,00  insn per cycle         
 1.847.613.212.847      branches                  #  669,502 M/sec                  
    29.342.163.081      branch-misses             #    1,59% of all branches        

     674,891697479 seconds time elapsed

    3045676,296012      task-clock (msec)         #    4,093 CPUs utilized          
        22.156.426      context-switches          #    0,007 M/sec                  
           385.364      cpu-migrations            #    0,127 K/sec                  
         1.066.383      page-faults               #    0,350 K/sec                  
10.505.321.454.387      cycles                    #    3,449 GHz                    
 9.723.994.869.100      instructions              #    0,93  insn per cycle         
 1.869.145.049.594      branches                  #  613,704 M/sec                  
    30.241.815.060      branch-misses             #    1,62% of all branches        

     744,170941002 seconds time elapsed
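For context, the counters above are the default set that perf stat prints for a complete run. A minimal sketch of such an invocation (the exact command was not given above; "./myapp" is a placeholder for the real binary and its arguments):

    # Default counter set for the whole run of the application.
    # "./myapp" stands in for the actual binary under test.
    perf stat ./myapp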

Running

perf record -e context-switches -ag -T

and viewing the result in perf report shows the following for each version (a sketch of the full command sequence follows the two tables):

  Children      Self       Samples  Command          Shared Object                   Symbol 
+   44,06%    44,06%        170846  swapper          [kernel.kallsyms]               [k] schedule_idle 
+   33,07%    33,07%        127004  Thread (pooled)  [kernel.kallsyms]               [k] schedule

  Children      Self       Samples  Command          Shared Object                 Symbol 
+   49,02%    49,02%        958827  swapper          [kernel.kallsyms]             [k] schedule_idle 
+   43,96%    43,96%        855603  Thread (pooled)  [kernel.kallsyms]             [k] schedule
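For reference, a minimal sketch of the command sequence assumed above; the recording is system-wide, so the duration bound is only a placeholder for the length of the run:

    # Record context-switch events system-wide with call graphs and timestamps.
    # "sleep 660" is only a placeholder bound on the recording duration.
    perf record -e context-switches -ag -T -- sleep 660
    # Summarize non-interactively; perf report reads perf.data, and -n adds
    # the Samples column seen in the tables above.
    perf report --stdio -n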

As the two tables show, the number of samples differs by almost an order of magnitude. My question is: how can I investigate this further? I have access to the source code of both versions, but the codebase is large and I do not know it well.

Update

The problem was locking. I was able to find it by running gdb, interrupting the program during execution, catching system calls, and printing the backtrace:

(gdb) catch syscall
(gdb) continue
(gdb) bt
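The same steps can be scripted rather than typed interactively; a minimal sketch using gdb's batch mode, where <pid> is a placeholder for the application's process ID:

    # Attach, stop at the next system call entry, print a backtrace, then detach.
    # <pid> is a placeholder for the target process ID.
    gdb -p <pid> -batch -ex 'catch syscall' -ex 'continue' -ex 'bt'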

The information from perf report is:

  Children      Self  Command          Shared Object                 Symbol
-   49,02%    49,02%  swapper          [kernel.kallsyms]             [k] schedule_idle
   - secondary_startup_64
      - 42,82% start_secondary
           cpu_startup_entry
           do_idle
           schedule_idle
      + 6,20% x86_64_start_kernel

-   43,96%    43,96%  Thread (pooled)  [kernel.kallsyms]             [k] schedule
   - 43,32% syscall
      - 43,32% entry_SYSCALL_64_after_hwframe
           do_syscall_64
           sys_futex
           do_futex
           futex_wait
           futex_wait_queue_me
           schedule

This is not very useful, because it does not show who made the system call. The GDB approach works in my case but can be tedious. Do you know of any tracing tool, or any options, that would help in this situation? Brendan Gregg's blog at http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html lists a number of tracers, but I do not have much experience with them.
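Two directions that seem to fit here, offered as sketches rather than tested answers. First, perf can record DWARF-based call graphs (perf record --call-graph dwarf), which often recovers the user-space frames above entry_SYSCALL_64_after_hwframe and thus shows who made the syscall. Second, bpftrace, one of the tracers on Gregg's list, can aggregate user-space stacks directly at futex entry; in the sketch below, "myapp" is a placeholder for the actual process name:

    # Count distinct user-space stacks entering the futex syscall (run as root).
    # "myapp" is a placeholder for the process name of the application.
    sudo bpftrace -e '
    tracepoint:syscalls:sys_enter_futex
    /comm == "myapp"/
    {
        @[ustack] = count();
    }'

On Ctrl-C, bpftrace prints every unique stack together with its hit count, which points directly at the locking call sites.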

0 Answers