Question

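I want to run the code below. I want to spawn two independent threads, each of which runs a parallel for loop. Unfortunately, I get an error. Apparently, a parallel for cannot be spawned inside a section. How can I solve this?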

#include <omp.h>
#include "stdio.h"

int main()
{

omp_set_num_threads(10);

#pragma omp parallel    
#pragma omp sections
  {
#pragma omp section
#pragma omp for
    for(int i=0; i<5; i++) {
        printf("x %d\n", i);
    }

#pragma omp section
#pragma omp for
    for(int i=0; i<5; i++) {
        printf(". %d\n", i);
    }
  } // end parallel and end sections
}

Error:

main.cpp: In function ‘int main()’:
main.cpp:14:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]
main.cpp:20:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]

Answer 1

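Here you have to use nested parallelism. The problem with omp for inside sections is that every thread in the enclosing team has to take part in the omp for, and here they clearly do not: the threads are split up across the sections. So you have to introduce functions and do the nested parallelism inside those functions.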

#include <stdio.h>
#include <omp.h>

void doTask1(const int gtid) {
    omp_set_num_threads(5);
#pragma omp parallel 
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for(int i=0; i<5; i++) {
            printf("x %d %d %d\n", i, tid, gtid);
        }
    }
}

void doTask2(const int gtid) {
    omp_set_num_threads(5);
#pragma omp parallel 
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for(int i=0; i<5; i++) {
            printf(". %d %d %d\n", i, tid, gtid);
        }
    }
}


int main()
{
    omp_set_num_threads(2);
    omp_set_nested(1); // enable nested parallelism (deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels)

#pragma omp parallel    
    {
        int gtid = omp_get_thread_num();
#pragma omp sections
        {
#pragma omp section
            doTask1(gtid);

#pragma omp section
            doTask2(gtid);
        } // end sections
    } // end parallel
}
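
For newer OpenMP versions, the same idea can be sketched without omp_set_nested, which is deprecated since OpenMP 5.0. This is only a sketch under some assumptions: the names do_task and run_nested and the owner arrays are hypothetical, the arrays exist only so the result can be checked, and the outer thread ID is read with omp_get_ancestor_thread_num(1) instead of being passed in as gtid. The _OPENMP guards just keep the file compilable when OpenMP is switched off.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs one inner parallel loop and records, for each
   iteration, the ID of the outer (level 1) thread that owns this section. */
void do_task(char label, int *owner, int n)
{
    #pragma omp parallel num_threads(5)
    {
#ifdef _OPENMP
        int outer = omp_get_ancestor_thread_num(1); /* thread ID in the outer team */
        int inner = omp_get_thread_num();           /* thread ID in the inner team */
#else
        int outer = 0, inner = 0;                   /* serial fallback */
#endif
        #pragma omp for
        for (int i = 0; i < n; i++) {
            owner[i] = outer;
            printf("%c %d %d %d\n", label, i, inner, outer);
        }
    }
}

/* Hypothetical driver: two sections, each starting its own inner team. */
void run_nested(int *owner_x, int *owner_dot, int n)
{
#ifdef _OPENMP
    omp_set_max_active_levels(2); /* allow two nested active parallel levels */
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        do_task('x', owner_x, n);

        #pragma omp section
        do_task('.', owner_dot, n);
    }
}
```

Calling run_nested(ox, od, 5) with two int[5] arrays prints lines in the same "label i tid gtid" layout as above. Which outer thread picks up which section is up to the runtime, but all iterations of one section report the same outer thread ID.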

Answer 2

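OpenMP cannot create parallel regions inside parallel regions. This is because OpenMP creates num_threads parallel threads when the program starts; in non-parallel regions the other threads are not used and sleep. It is done this way because frequently spawning new threads is very slow compared to waking up sleeping threads.

Therefore, you should parallelize only the loops: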

#include <omp.h>
#include "stdio.h"

int main()
{

omp_set_num_threads(10);

#pragma omp parallel for
    for(int i=0; i<5; i++) {
        printf("x %d\n", i);
    }

#pragma omp parallel for
    for(int i=0; i<5; i++) {
        printf(". %d\n", i);
    }
}

Answer 3

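In practice, the optimal number of threads equals the number of available CPU cores. So every parallel for should be spread across all available cores, which is impossible inside omp sections. What you are trying to achieve is therefore not optimal. tune2fs' suggestion to execute the two loops without sections makes sense and gives the best possible performance. You can execute the parallel loops inside another function, but this "cheating" does not give a performance boost.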
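
As one possible variant of running the two loops without sections, here is a minimal sketch that puts both loops into a single parallel region, so the whole team of threads shares in each loop. The helper name fill_both and the arrays xs and dots are hypothetical; they stand in for the printf calls in the question so that the result is easy to check. The nowait clause is an addition of this sketch, and it is safe only because the two loops touch different data.

```c
/* Hypothetical helper: two independent work-sharing loops in one
   parallel region. All threads of the team take part in both loops. */
void fill_both(int *xs, int *dots, int n)
{
    #pragma omp parallel
    {
        /* nowait: a thread that finishes its share of this loop moves
           straight on to the second loop instead of waiting at a barrier */
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            xs[i] = i * i;
        }

        /* the implicit barrier at the end of this loop is kept */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            dots[i] = 2 * i;
        }
    }
}
```

Calling fill_both(xs, dots, 5) with two int[5] arrays leaves xs holding 0, 1, 4, 9, 16 and dots holding 0, 2, 4, 6, 8, whatever the thread count. With GCC or Clang, compile with -fopenmp; without it the pragmas are ignored and the loops simply run serially.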

OpenMP: for loop inside a section