mpi4py: isend behaves strangely when the communicating processes run on different machines

Asked: 2018-09-24 01:29:07

Tags: python mpi mpi4py

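I am testing the non-blocking communication supported by mpi4py and have run into behavior of isend that is unexpected (at least to me): for some reason, a message sent with isend is not delivered until either the sending process finishes or the wait method of the request instance returned by isend is called, which defeats the purpose of isend.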

This behavior is observed only when the processes run on different machines.

Code:

from mpi4py import MPI
import socket
from time import sleep, time


comm = MPI.COMM_WORLD
rank = comm.Get_rank()
node = socket.gethostname()


# Single-argument print() calls behave the same on Python 2 and Python 3.
print('rank {} on {}'.format(rank, node))
if rank == 1:
    # Sender: start a non-blocking send, then stay busy for 10 seconds.
    message = 1
    req_send = comm.isend(message, dest=0, tag=11)
    # req_send.wait()  # no issue if uncommented
    sleep(10)
    print('sending process finished')

elif rank == 0:
    # Receiver: measure how long the blocking recv has to wait.
    t = time()
    data = comm.recv(source=1, tag=11)
    print('message recieved: {}, waiting time: {}'.format(data, time() - t))

Results/output:

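1. Different machines, req_send.wait() line commented out (the problematic setup: the message is received only after the sending process finishes, which adds the 10-second sleep to the waiting time):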

rank 0 on node1
rank 1 on node2
sending process finished
message recieved: 1, waiting time: 10.0342979431

2. Different machines, req_send.wait() line uncommented:

rank 0 on node1
rank 1 on node2
message recieved: 1, waiting time: 0.000602006912231
sending process finished

3. Same machine, with or without the req_send.wait() line:

rank 1 on node1
rank 0 on node1
message recieved: 1, waiting time: 2.09808349609e-05
sending process finished

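I tried several mpi4py builds for Anaconda2, and the behavior was similar. With the anaconda mpi4py build, however, the problematic setup also produces an additional error in the output: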

rank 0 on node1
rank 1 on node2
sending process finished
message recieved: 1, waiting time: 10.0120418072
Assertion failed in file ch3u_handle_connection.c at line 332: vc->state == MPIDI_VC_STATE_LOCAL_CLOSE || vc->state == MPIDI_VC_STATE_CLOSE_ACKED
internal ABORT - process 1

What could be causing this, and how can it be fixed or worked around?
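One experiment that might narrow this down, offered only as a sketch: many MPI implementations advance a pending non-blocking operation only while the process is inside an MPI call, so the sender could poll the request while it is otherwise busy. Whether that is the cause here is an assumption, not something the outputs above confirm. The helper name sleep_with_progress and the poll_interval parameter are hypothetical; the only thing it relies on is that the request object has a test() method returning a (flag, message) tuple, as the request returned by comm.isend does in mpi4py.

```python
from time import sleep, time


def sleep_with_progress(req, seconds, poll_interval=0.01):
    """Stay idle for about `seconds`, polling req.test() along the way.

    `req` is assumed to expose test() -> (flag, message), like the
    request object returned by comm.isend in mpi4py. Each test() call
    enters the MPI library and gives it a chance to progress the send.
    Returns True if the request completed before the time ran out.
    """
    deadline = time() + seconds
    done = False
    while time() < deadline:
        if not done:
            # Stop polling once the request has completed.
            done, _ = req.test()
        # Sleep in short slices so the next poll happens soon.
        sleep(min(poll_interval, max(0.0, deadline - time())))
    return done
```

In the script above, rank 1 would call sleep_with_progress(req_send, 10) in place of sleep(10). If the waiting time on rank 0 then drops to a few milliseconds across machines, that would point to the send not progressing outside MPI calls; if it stays near 10 seconds, the cause is probably elsewhere.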

0 answers:

No answers yet