Question

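I'm new to MPI, and I've been tasked with using it to compute a very large matrix of size L * M as part of a larger body of code. We iterate from 0 to L, perform some calculations, and then save M values into the result matrix. The results of each iteration over L are independent of one another, and this loop is the biggest time bottleneck in my code, so parallelizing it seems like a sensible choice.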

The code I want to parallelize boils down to:

//preceding code

int M = 12;
for (int i = 0; i < L; i++) //START OF PARALLEL CHUNK
{
    funkystuffresults = dosomefunkystuff(i);

    for (int j = 0; j < M; j++)
    {
        resultmatrix[i*M + j] = funkystuffresults;
    }
}//END OF PARALLEL CHUNK

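Due to the nature of the code preceding this block, I won't know the value of L until runtime, and it almost certainly won't divide evenly by the number of processes. From my research, MPI_Scatterv seems like the ideal function to use here.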

My initial attempt uses this code:

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int sendcounts[world_size];
    int displs[world_size];
    vector<int> rec_buf(L);
    int rem = L%world_size;
    int counttester = 0;
    for (int i = 0; i < world_size; i++)
    {
        sendcounts[i] = L/world_size;
        if (rem > 0)
        {
            sendcounts[i]++;
            rem--;
        }
        displs[i] = counttester;
        counttester += sendcounts[i];
    }
    vector<int> paralleli(L);
    for (int i = 0; i < L; i++)
    {
        paralleli[i] = i;
    }
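    // Pass the vectors' underlying arrays via data(), not the addresses of the vector objects;
    // the receive count is this rank's share rather than L.
    MPI_Scatterv(paralleli.data(), sendcounts, displs, MPI_INT, rec_buf.data(), sendcounts[world_rank], MPI_INT, 0, MPI_COMM_WORLD);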
    for (int i = 0; i < sendcounts[world_rank]; i++) //START OF PARALLEL CHUNK
    {
         //previously mentioned code
    }

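This seems very cumbersome to me for what I assume is a very common kind of problem. To use this code I would need to heavily modify the "funky stuff" in my code, so I'm hoping there is a better way to do this. Is using Scatterv (and presumably Gatherv afterwards) ideal? I would like an implementation that still works even when parallelization is not used.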
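For illustration only, here is a minimal sketch of one possible alternative, not a definitive answer. Since the scattered data are just the indices 0..L-1, each rank could compute its own index range from its rank and the process count instead of receiving it through MPI_Scatterv. The helpers `block_range` and `gather_layout` are hypothetical names introduced here. In real code, `rank` and `size` would come from MPI_Comm_rank and MPI_Comm_size, and the counts and displacements would be passed as the recvcounts and displs arguments of MPI_Gatherv. The sketch assumes the result matrix holds int-sized element counts that fit in an `int`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: half-open range [begin, end) of outer-loop indices
// owned by `rank` when L iterations are split over `size` processes.
// The first (L % size) ranks each take one extra iteration, which matches
// the sendcounts scheme in the question.
std::pair<int, int> block_range(int L, int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size && L >= 0);
    const int base = L / size;
    const int rem = L % size;
    const int begin = rank * base + (rank < rem ? rank : rem);
    const int count = base + (rank < rem ? 1 : 0);
    return {begin, begin + count};
}

// Hypothetical helper: per-rank element counts and displacements, measured
// in matrix elements (rows * M), in the layout MPI_Gatherv expects for its
// recvcounts and displs arguments.
struct GatherLayout
{
    std::vector<int> counts;
    std::vector<int> displs;
};

GatherLayout gather_layout(int L, int M, int size)
{
    GatherLayout layout;
    layout.counts.resize(size);
    layout.displs.resize(size);
    for (int r = 0; r < size; ++r)
    {
        const std::pair<int, int> range = block_range(L, r, size);
        layout.counts[r] = (range.second - range.first) * M;
        layout.displs[r] = range.first * M;
    }
    return layout;
}
```

With this layout, each rank would loop `i` from `begin` to `end`, write into a local buffer at `(i - begin)*M + j`, and the root would collect everything with a single MPI_Gatherv. With one process the range is simply [0, L), so the same loop also covers the non-parallel case.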

Scatterv / Gatherv with MPI to return a large vector

0 answers: