Question

If I use nested parallel for loops like this:

#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        //parallelize this code here
    }
    //IMPORTANT: no code in here
}

is this equivalent to:

for (int x = 0; x < x_max; ++x) {
    #pragma omp parallel for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) {
        //parallelize this code here
    }
    //IMPORTANT: no code in here
}

Does the outer omp parallel for do anything other than create a new task?

Answer 1

If your compiler supports OpenMP 3.0, you can use the collapse clause:

#pragma omp parallel for schedule(dynamic,1) collapse(2)
for (int x = 0; x < x_max; ++x) {
    for (int y = 0; y < y_max; ++y) {
        //parallelize this code here
    }
    //IMPORTANT: no code in here
}

If it does not (for example, only OpenMP 2.5 is supported), there is a simple workaround:

#pragma omp parallel for schedule(dynamic,1)
for (int xy = 0; xy < x_max*y_max; ++xy) {
    int x = xy / y_max;
    int y = xy % y_max;
    //parallelize this code here
}
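As a minimal sketch of the flattened-loop workaround above: the function name fill_grid_flat, the output array out, and the value 1000 * x + y are hypothetical, chosen only so that the index arithmetic can be checked. It assumes x_max * y_max fits in an int.

```c
#include <assert.h>

/* Hypothetical helper: stores 1000 * x + y for every (x, y) pair,
   in row-major order, using a single flattened parallel loop. */
void fill_grid_flat(int x_max, int y_max, int *out) {
    /* Assumption: x_max * y_max does not overflow int. */
    assert(x_max > 0 && y_max > 0 && out != 0);
    int total = x_max * y_max;

#pragma omp parallel for schedule(dynamic,1)
    for (int xy = 0; xy < total; ++xy) {
        int x = xy / y_max;  /* row index recovered from the flat index */
        int y = xy % y_max;  /* column index recovered from the flat index */
        out[xy] = 1000 * x + y;
    }
}
```

Every flat index maps to exactly one (x, y) pair, so each thread writes a distinct element of out and no synchronisation is needed. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.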

You can enable nested parallelism with omp_set_nested(1); and your nested omp parallel for code will then work, but this is probably not the best choice.
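One way to see what nesting actually does on a given implementation is to measure the size of the inner team. The sketch below is an illustration only: the function name max_inner_team_size is hypothetical, both regions are capped at two threads to avoid oversubscription, and it uses omp_set_max_active_levels (available since OpenMP 3.0) rather than omp_set_nested, which was deprecated in OpenMP 5.0.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the largest inner-team size seen
   inside a nested parallel region (1 means the inner region ran
   with a single thread, i.e. nesting had no effect). */
int max_inner_team_size(int enable_nested) {
    assert(enable_nested == 0 || enable_nested == 1);
    int largest = 1;  /* shared by all threads at both levels */

#ifdef _OPENMP
    if (enable_nested)
        omp_set_max_active_levels(2);  /* allow two active levels */
#endif

#pragma omp parallel num_threads(2)
    {
#pragma omp parallel num_threads(2)
        {
            int n = 1;
#ifdef _OPENMP
            n = omp_get_num_threads();  /* size of the innermost team */
#endif
#pragma omp critical
            {
                if (n > largest)
                    largest = n;
            }
        }
    }
    return largest;
}
```

The result depends on the runtime and its settings, so no particular output is claimed here. A return value of 1 means the inner region was serialised.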

By the way, why dynamic scheduling? Does each loop iteration take a non-constant amount of time to evaluate?

Answer 2

No.

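The first #pragma omp parallel creates a team of parallel threads, and the second then tries to create another team for each of the original threads, i.e. a team of teams. However, in almost all existing implementations the second team has only one thread, so the second parallel region is essentially unused. Your code is therefore closer to being equivalent to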

#pragma omp parallel for schedule(dynamic,1)
for (int x = 0; x < x_max; ++x) {
    // only one x per thread
    for (int y = 0; y < y_max; ++y) { 
        // code here: each thread loops all y
    }
}

If that is not what you want and you only want to parallelise the inner loop, you can do this:

#pragma omp parallel
for (int x = 0; x < x_max; ++x) {
    // each thread loops over all x
#pragma omp for schedule(dynamic,1)
    for (int y = 0; y < y_max; ++y) { 
        // code here, only one y per thread
    }
}
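A self-contained sketch of this last pattern follows. The function name fill_products_inner_parallel, the array out, and the counter executed are hypothetical, added only to make it observable that each (x, y) body runs exactly once in total rather than once per thread.

```c
#include <assert.h>

/* Hypothetical helper: writes x * y into each cell (row-major) and
   returns how many (x, y) bodies were executed in total. */
long fill_products_inner_parallel(int x_max, int y_max, int *out) {
    assert(x_max >= 0 && y_max >= 0 && out != 0);
    long executed = 0;  /* shared: declared outside the parallel region */

#pragma omp parallel
    for (int x = 0; x < x_max; ++x) {
        /* x is private: every thread walks through all x values */
#pragma omp for schedule(dynamic,1)
        for (int y = 0; y < y_max; ++y) {
            /* the y iterations are divided among the team */
            out[x * y_max + y] = x * y;
#pragma omp atomic
            executed += 1;
        }
        /* implicit barrier: the team finishes row x before moving on */
    }
    return executed;
}
```

Because every thread runs the outer loop with the same bounds, all threads encounter the omp for construct the same number of times, which a worksharing construct requires. The returned count should equal x_max * y_max whatever the team size.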

OpenMP nested parallel for loops vs inner parallel for
