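I have run into a problem: this setup used to work, but now it doesn't.
I am running an Open MPI program across two computers with TAU profiling. It seems that mpirun cannot run the tau_exec program on the remote host. Could this be a permissions problem?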
cluster@master:~/software/mpi_in_30_source/test2$ mpirun -np 2 --hostfile hostfile -d tau_exec -v -T MPI,TRACE,PROFILE ./hello.exe
[master:19319] procdir: /tmp/openmpi-sessions-cluster@master_0/4568/0/0
[master:19319] jobdir: /tmp/openmpi-sessions-cluster@master_0/4568/0
[master:19319] top: openmpi-sessions-cluster@master_0
[master:19319] tmp: /tmp
[slave2:06777] procdir: /tmp/openmpi-sessions-cluster@slave2_0/4568/0/1
[slave2:06777] jobdir: /tmp/openmpi-sessions-cluster@slave2_0/4568/0
[slave2:06777] top: openmpi-sessions-cluster@slave2_0
[slave2:06777] tmp: /tmp
[master:19319] [[4568,0],0] node[0].name master daemon 0 arch ff000200
[master:19319] [[4568,0],0] node[1].name slave2 daemon 1 arch ff000200
[slave2:06777] [[4568,0],1] node[0].name master daemon 0 arch ff000200
[slave2:06777] [[4568,0],1] node[1].name slave2 daemon 1 arch ff000200
[master:19319] Info: Setting up debugger process table for applications
MPIR_being_debugged = 0
MPIR_debug_state = 1
MPIR_partial_attach_ok = 1
MPIR_i_am_starter = 0
MPIR_proctable_size = 2
MPIR_proctable:
(i, host, exe, pid) = (0, master, /home/cluster/software/mpi_in_30_source/test2/tau_exec, 19321)
(i, host, exe, pid) = (1, slave2, /home/cluster/software/mpi_in_30_source/test2/tau_exec, 0)
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not find an executable:
Executable: tau_exec
Node: slave2
while attempting to start process rank 1.
--------------------------------------------------------------------------
[slave2:06777] sess_dir_finalize: job session dir not empty - leaving
[slave2:06777] sess_dir_finalize: job session dir not empty - leaving
[master:19319] sess_dir_finalize: job session dir not empty - leaving
[master:19319] sess_dir_finalize: proc session dir not empty - leaving
orterun: exiting with status -123
On slave2:
cluster@slave2:~/software/mpi_in_30_source/test2$ tau_exec -T MPI,TRACE,PROFILE ./hello.exe
hello MPI user: from process = 0 on machine=slave2, of NCPU=1 processes
cluster@slave2:~/software/mpi_in_30_source/test2$ which tau_exec
/home/cluster/tools/tau-2.22.2/arm_linux/bin/tau_exec
So there is a working tau_exec on both nodes. When I run mpirun without tau_exec, everything works fine.
cluster@master:~/software/mpi_in_30_source/test2$ mpirun -np 2 --hostfile hostfile ./hello.exe
hello MPI user: from process = 0 on machine=master, of NCPU=2 processes
hello MPI user: from process = 1 on machine=slave2, of NCPU=2 processes
Answer 0 (score: 2)
I once had the same error. Try this, exactly as written:
mpirun -n <number> a.out
This worked for me!
Answer 1 (score: 2)
Maybe it's because you have installed Open MPI as well, and not just MPICH2. In that case, run the following command as root:
root~# update-alternatives --config mpirun
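There are 2 choices for the alternative mpirun (providing /usr/bin/mpirun).
Selection | Path | Priority | Status
Press enter to keep the current choice[*], or type selection number: 1
Then choose the MPICH version, as shown above, so that it runs correctly.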
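Before switching alternatives, it may help to confirm which implementation the mpirun on your PATH actually belongs to. The sketch below is only an illustration: mpi_flavor is a hypothetical helper, and the strings it looks for ("Open MPI", "HYDRA", "MPICH") are assumptions about what `mpirun --version` prints on typical installations.

```shell
# Hypothetical helper: guess the MPI implementation from version text.
# $1 = the text printed by `mpirun --version`.
# The matched strings are assumptions about typical output, not a guarantee.
mpi_flavor() {
    case $1 in
        *"Open MPI"*)    echo openmpi ;;
        *HYDRA*|*MPICH*) echo mpich ;;
        *)               echo unknown ;;
    esac
}
```

A possible use is `mpi_flavor "$(mpirun --version 2>&1)"`, run on every node so you can see whether they all agree.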
Answer 2 (score: 1)
If you are running a shell script with mpirun, make sure you have run chmod +x script_file.sh on it; otherwise you will see this error.
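As a rough sketch of that check, the function below verifies that a script exists and sets the execute bit if it is missing. The name ensure_executable is hypothetical, and the script would still need to be present and executable on every node that mpirun launches it on.

```shell
# Hypothetical helper: make sure a script can be executed before
# handing it to mpirun. $1 = path to the script.
ensure_executable() {
    if [ ! -f "$1" ]; then
        echo "error: $1 does not exist" >&2
        return 1
    fi
    # -x is true when the current user is allowed to execute the file.
    if [ ! -x "$1" ]; then
        chmod +x "$1" || return 1
    fi
}
```

For example, `ensure_executable script_file.sh && mpirun -n 2 ./script_file.sh` only launches the job if the check passes.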
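Answer 3 (score: 0)
Try giving the full path to tau_exec on the command line. Your PATH may not be the same on all nodes. If that is the case, the executable will not be found on any node where the PATH is incorrect.
This is most likely not a permissions problem, though I don't remember all of the error messages in Open MPI well enough to tell you how helpful they might be.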
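One way to apply the full-path suggestion is sketched below. build_mpirun_cmd is a hypothetical helper that resolves the wrapper program on the local node and prints the resulting mpirun command line instead of running it. It assumes the tool is installed at the same absolute path on every node, which may not hold on your cluster.

```shell
# Hypothetical helper: print an mpirun command line that uses an absolute
# path for the wrapper program instead of relying on each node's PATH.
# $1 = process count, $2 = hostfile, $3 = wrapper name, rest = wrapper args.
build_mpirun_cmd() {
    np=$1
    hostfile=$2
    tool=$3
    shift 3
    # For an external program, command -v prints the path found via PATH.
    tool_path=$(command -v "$tool") || {
        echo "error: $tool not found in the local PATH" >&2
        return 1
    }
    echo "mpirun -np $np --hostfile $hostfile $tool_path $*"
}
```

Calling `build_mpirun_cmd 2 hostfile tau_exec -T MPI,TRACE,PROFILE ./hello.exe` on the master prints a command you can inspect and then run. Remote processes are usually started from a non-interactive shell, so comparing the output of `ssh slave2 'command -v tau_exec'` with what `which tau_exec` reports in a login session on slave2 may show whether the two PATH settings differ.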