Question

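Every time I run this code, it reports a different run time for the parallel section. I tried setting a fixed number of threads to match my core count, but it made no difference. The program computes the value of pi. It is compiled with gcc -fopenmp.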

#include <stdio.h>
#include <omp.h>

static long num_steps = 100000;
double step;
//double omp_get_wtime(void);

int main() {
    int i;
    double x, pi, max_threads, start, time;
    double sum = 0.0;
    step = 1.0 / (double) num_steps;
    //omp_set_num_threads(4);
    omp_get_max_threads();
    start = omp_get_wtime();

    #pragma omp parallel
    {
        #pragma omp for reduction(+:sum) schedule(static) private(x) //reduction to get local copy
        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum += 4.0 / (1.0 + x * x);
        }
        //max_threads=omp_get_max_threads();
    }
    time = omp_get_wtime() - start;
    pi = step * sum;
    printf("pi=(%f)\t run_time(%f)\n", pi, time); //,max_threads);
    return 0;
}

Answer 1

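The code runs for only a few milliseconds (2-6 ms on my system), so the measured time is dominated by overhead, for example the cost of creating threads. The serial version runs in under 1 ms. Execution times this short are highly variable because they depend on the current state of the system, for example whether some "warm-up" is still needed.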
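One way to keep thread start-up cost out of the measurement is to run the loop once untimed before starting the clock. The following is only a sketch under assumptions: `compute_pi`, `now_seconds` and `timed_pi_after_warmup` are hypothetical helper names, not part of the original program, and whether the runtime actually reuses its threads after the first region depends on the OpenMP implementation. The `_OPENMP` guard lets the same file build serially when `-fopenmp` is omitted.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Seconds from omp_get_wtime(); falls back to CPU time from clock()
   when the file is built without -fopenmp. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Midpoint-rule estimate of pi, using the same reduction as the question. */
static double compute_pi(long num_steps) {
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Hypothetical helper: one untimed pass first (assumption: the runtime
   keeps its threads alive afterwards), then a timed pass. */
static double timed_pi_after_warmup(long num_steps, double *seconds) {
    (void)compute_pi(num_steps);        /* warm-up, not timed */
    double start = now_seconds();
    double pi = compute_pi(num_steps);  /* timed pass */
    *seconds = now_seconds() - start;
    return pi;
}
```

Calling `timed_pi_after_warmup(100000, &secs)` from `main` and printing `secs` should show less of the one-time start-up cost, although a run this short will likely remain noisy.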

In this case, simply increase num_steps to get meaningful, stable results. For example, with num_steps = 1000000000, 10 executions on my system took between 4.332 and 4.399 seconds.
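To see how much the timings spread for a given step count, the measurement can be repeated and the fastest and slowest runs recorded. This is a minimal sketch rather than the answerer's actual benchmark: `time_runs`, `compute_pi` and `now_seconds` are hypothetical names introduced here, and the fallback to `clock()` applies only when the file is built without `-fopenmp`.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
#include <time.h>
#endif

/* Seconds from omp_get_wtime(); CPU time from clock() without -fopenmp. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Midpoint-rule estimate of pi with an OpenMP sum reduction. */
static double compute_pi(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}

/* Hypothetical helper: time `runs` passes and report the fastest and
   slowest elapsed times through min_s and max_s. Returns the last estimate. */
static double time_runs(long num_steps, int runs, double *min_s, double *max_s) {
    assert(num_steps > 0 && runs > 0);
    double pi = 0.0;
    for (int r = 0; r < runs; r++) {
        double start = now_seconds();
        pi = compute_pi(num_steps);
        double elapsed = now_seconds() - start;
        if (r == 0 || elapsed < *min_s) *min_s = elapsed;
        if (r == 0 || elapsed > *max_s) *max_s = elapsed;
    }
    return pi;
}
```

Calling `time_runs(1000000000L, 10, &lo, &hi)` would mirror the ten runs described above. The gap between `lo` and `hi` relative to `lo` is the figure to watch as num_steps grows.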

In general, if you are measuring performance, you should compile with the -O3 flag.

Varying run time of an OpenMP parallel region
