[OMPI users] Failed to create a queue pair (QP) error
Ilchenko Evgeniy
2017-04-08 15:54:31 UTC
Hi!

The problem with random segfaults in Java programs was solved by adding MCA options:

$path_to_openmpi_bin/mpirun -np 1 -mca btl self,sm,openib \
    $path_to_java_bin/java randomTest

Thanks to Eshsou Hashba and Michael Kalugin!


But now I have another problem!

If I start mpirun from the manager node (without logging in to a compute node via ssh):

$path_to_openmpi_bin/mpirun -np 2 -host node188,node189 \
    -mca btl self,sm,openib $path_to_java_bin/java randomTest

I get the following error:


$openmpi1.10_folder/bin/orted: error while loading shared libraries:
libimf.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp
(--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------

If I pass LD_LIBRARY_PATH (which contains the path to libimf.so) to mpirun
via the -x option:

$path_to_openmpi_bin/mpirun -x LD_LIBRARY_PATH -np 2 -host node188,node189 \
    -mca btl self,sm,openib $path_to_java_bin/java randomTest

then I get the same error (orted: error while loading shared libraries:
libimf.so: cannot open shared object file: No such file or directory).

How can I pass the library path to the spawned MPI processes and to orted?
I don't have root privileges on this cluster.
Gilles Gouaillardet
2017-04-09 01:43:20 UTC
Under the hood, mpirun runs
<remote_exec> orted
and your remote_exec does not propagate LD_LIBRARY_PATH.
One option is to configure your remote_exec to do so, but I would rather suggest
that you re-configure Open MPI with --enable-orterun-prefix-by-default.
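
One way to see what the remote side actually gets is to compare the LD_LIBRARY_PATH of a non-interactive ssh command against the directory that holds libimf.so. What follows is only a minimal sketch, assuming ssh is the remote_exec; path_has_dir is a hypothetical helper and /opt/example/lib is a placeholder, not a path taken from this thread.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated search path $1 (same format as LD_LIBRARY_PATH).
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Local demonstration with a placeholder path list.
sample_path="/usr/lib:/opt/example/lib"
if path_has_dir "$sample_path" "/opt/example/lib"; then
    echo "present"
else
    echo "missing"
fi

# Against a node (not run here). The single quotes make the variable
# expand on the remote side, in a non-interactive shell:
#   path_has_dir "$(ssh node188 'echo $LD_LIBRARY_PATH')" /opt/example/lib
```

If the directory is missing in that remote check, it would be consistent with orted failing to find libimf.so even though the variable is set in the local shell.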
If your remote_exec is ssh (that is, you are not running under a supported batch
manager), then
ssh node188 ldd $path_to_openmpi_bin/orted
should show zero unresolved libraries.
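
The ldd check above can be narrowed to just the missing libraries. This is a rough sketch: list_unresolved is a hypothetical helper name, and the sample text is made up to resemble ldd output; it relies only on ldd marking unresolved entries with "=> not found".

```shell
# Hypothetical helper: read ldd output on stdin and print the name of
# every library reported as "not found".
list_unresolved() {
    awk '/=> not found/ { print $1 }'
}

# Local demonstration on sample ldd-style text (made up for illustration).
printf '\tlibimf.so => not found\n\tlibc.so.6 => /lib64/libc.so.6 (0x0)\n' | list_unresolved

# Against a node (not run here):
#   ssh node188 ldd $path_to_openmpi_bin/orted | list_unresolved
```

An empty result from the remote command would correspond to "zero unresolved libraries"; any name printed (libimf.so in the error above) is a library the remote shell cannot locate.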

Cheers,

Gilles