When should I terminate MPI child processes?

Time: 2014-04-14 04:05:11

Tags: c++ mpi

I am implementing an algorithm in C++ using MPI. There is some data to process. Here is my design:

int main()
{
  MPI_Init();
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);

  MPI_Barrier(..);
  if(my_rank == 0)
  {
    for (each file to be processed)
      {
         Read in file content;
         MPI_Send data to child processes;
         process partial data on root process;
         MPI_Recv data processed by child processes;
         combine processed data from root and children;
      }
   }
   else
   {
      MPI_Recv data from root;
      process received data;
      MPI_Send processed data to root;
      MPI_Finalize();
    }

//only root process reaches here
MPI_Finalize();

}

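When there is only one file to process, the program runs perfectly. With multiple files, however, it hangs on the second file, and it seems that no child process is available to receive the new data from root. I think this is because I terminate the child processes after the first file has been processed. But if I comment out MPI_Finalize() in the else block, the program exits after processing the first data file with this error: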

mpirun has exited due to process rank 1 with PID 2003 on
node c301-115 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one. 

In this situation, is there a way to reset the MPI instance for the child processes, and where is the best place to finalize them?

2 answers:

Answer 0 (score: 1)

You need a second for loop so that the workers wait for new tasks instead of terminating immediately.

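Think of it this way: you have N people working at the same time. You have somehow decided that one of them, named "0", has the special job of assigning work to everyone else. You give each person exact instructions on what to do. The code you wrote amounts to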

for(file in files)
   send job

for the person named "0", and

process one job

for everyone else. What you want the others to do is:

for(file in files)
   process job

This should be reflected in your code, which could look something like this:

int main()
{
  MPI_Init();
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);

  MPI_Barrier(..);
  if(my_rank == 0)
  {
    for (each file to be processed)
      {
         Read in file content;
         MPI_Send data to child processes;
         process partial data on root process;
         MPI_Recv data processed by child processes;
         combine processed data from root and children;
      }
  }
  else
  {
    // workers loop once per file, so they are still alive for the next one
    for (each file to be processed)
      {
         MPI_Recv data from root;
         process received data;
         MPI_Send processed data to root;
      }
  }

  // every process (root and workers) reaches here after its loop
  // and calls MPI_Finalize exactly once
  MPI_Finalize();
}

Side note: what is the benefit of the very hierarchical structure you propose? Wouldn't it be better if each worker processed its own file?
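
If each worker did process whole files on its own, one simple way to split them would be round-robin by rank. The helper below is a hypothetical sketch (files_for_rank is not an MPI function, and files are assumed to be identified by an index from 0 to num_files - 1); it only computes which files a given rank would take.

```cpp
#include <vector>

// Indices of the files a given rank would process under round-robin
// assignment: rank r takes files r, r + nproc, r + 2 * nproc, ...
std::vector<int> files_for_rank(int rank, int nproc, int num_files) {
    std::vector<int> mine;
    for (int f = rank; f < num_files; f += nproc)
        mine.push_back(f);
    return mine;
}
```

Each rank would call this with the values it got from MPI_Comm_rank and MPI_Comm_size, process its own list without any per-file send/receive, and then call MPI_Finalize once at the end. Whether this is better than splitting each file depends on how large and how numerous the files are.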

Answer 1 (score: 0)

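Keep in mind that your processes need to stay in sync. That is, every send needs a matching receive, and so on, so you need a loop in the child processes just like the one in the root process.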

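One way to do this is to have the root process send the number of files to be processed to all child processes at the start of the program, and then have them loop the same number of times as the root process.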
