MPI_COMM_SPAWN causing a deadlock

Time: 2017-07-25 17:14:34

Tags: fortran mpi

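I have an MPI program A that needs to spawn a different MPI program B and then wait for it to finish. After that, I need to spawn program B again and wait for it once more.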

Program A

      IF (rank .eq. 0) THEN 

         CALL MPI_COMM_SPAWN('prog_b', MPI_ARGV_NULL, size,                &
     &                       MPI_INFO_NULL, 0, MPI_COMM_SELF,              &
     &                       child_comm, MPI_ERRCODES_IGNORE, status)
         WRITE (*,*) 'Parent 1 Before'
         CALL MPI_BARRIER(child_comm, status)
         WRITE (*,*) 'Parent 1 After'

... Change some things ...

         CALL MPI_COMM_SPAWN('prog_b', MPI_ARGV_NULL, size,                &
     &                       MPI_INFO_NULL, 0, MPI_COMM_SELF,              &
     &                       child_comm, MPI_ERRCODES_IGNORE, status)
         WRITE (*,*) 'Parent 2 Before'
         CALL MPI_BARRIER(child_comm, status)
         WRITE (*,*) 'Parent 2 After'

      END IF

Program B

... Wait to finish ...

      CALL MPI_COMM_GET_PARENT(parent_comm, error)
      IF (parent_comm .ne. MPI_COMM_NULL) THEN
         WRITE (*,*) 'Before'
         CALL MPI_BARRIER(parent_comm, error)
         WRITE (*,*) 'After'
      END IF

... Finalize ...

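When I run this, the first spawn of program B works fine. On the second round, however, both programs deadlock at the second barrier. I spawn 16 instances of program B each time.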

Output

Parent Before 1

... Output of program b ...

Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
After
After
Before
After
Before
After
After
After
After
After
Before
After
After
Parent After 1
After
After
After
After
After
After

... Second call to spawn ...

Parent Before 2

... Output of program b ...

Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before
Before

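As you can see, every process makes it past the first barrier, but the second time they all lock up. I tried disconnecting the parent and the children after the first spawn call. I also tried merging the parent and child communicators and calling the barrier on the merged communicator, but nothing seems to resolve this deadlock.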
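For reference, the disconnect attempt described above might look roughly like the following. This is only a minimal sketch under stated assumptions, not the code from the question: it folds both roles into one hypothetical executable (`spawn_twice`) that spawns copies of itself, it assumes `GET_COMMAND_ARGUMENT(0, ...)` returns a path the MPI runtime can launch, and it assumes free-form source and the `mpi` module. It shows where `MPI_COMM_DISCONNECT` would be called on each side; it is not claimed to remove the deadlock, which the question reports persisted.

```fortran
      PROGRAM spawn_twice
      USE mpi
      IMPLICIT NONE
      ! 16 children per round, as stated in the question.
      INTEGER, PARAMETER :: nchild = 16
      INTEGER :: ierr, rank, round, parent_comm, child_comm
      CHARACTER(LEN=1024) :: self_path

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_GET_PARENT(parent_comm, ierr)

      IF (parent_comm .eq. MPI_COMM_NULL) THEN
         ! Parent role (plays program A): started directly, not spawned.
         CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
         IF (rank .eq. 0) THEN
            ! Assumption: argument 0 is a launchable path to this binary.
            CALL GET_COMMAND_ARGUMENT(0, self_path)
            DO round = 1, 2
               CALL MPI_COMM_SPAWN(TRIM(self_path), MPI_ARGV_NULL,      &
                                   nchild, MPI_INFO_NULL, 0,            &
                                   MPI_COMM_SELF, child_comm,           &
                                   MPI_ERRCODES_IGNORE, ierr)
               WRITE (*,*) 'Parent', round, 'Before'
               CALL MPI_BARRIER(child_comm, ierr)
               WRITE (*,*) 'Parent', round, 'After'
               ! Collective over the intercommunicator: the children
               ! must call MPI_COMM_DISCONNECT on their side as well.
               CALL MPI_COMM_DISCONNECT(child_comm, ierr)
            END DO
         END IF
      ELSE
         ! Child role (plays program B): started via MPI_COMM_SPAWN.
         WRITE (*,*) 'Before'
         CALL MPI_BARRIER(parent_comm, ierr)
         WRITE (*,*) 'After'
         CALL MPI_COMM_DISCONNECT(parent_comm, ierr)
      END IF

      CALL MPI_FINALIZE(ierr)
      END PROGRAM spawn_twice
```

Disconnecting before the second spawn means each round gets a fresh intercommunicator rather than overwriting `child_comm` while the first one is still connected. Whether that changes the behavior at the second barrier likely depends on the MPI implementation and version in use.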

0 answers:

No answers