Discussion:
[OMPI users] Error bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
Sebastian Antunez N.
2016-11-19 23:22:35 UTC
Hello guys,

I have an HPC cluster, and I updated OFED, the firmware, etc.

After the reboot, running mpirun -machinefile nodes8 -n 128
/home/HPL/run_hpl/xhpl shows the following error:

bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp
(--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).



Before the update I had version 1.6.4, and the cluster did not show errors
when I ran mpirun.

I changed the environment variables, but the error persists.

Could you comment on how to resolve this issue?

Regards

Sebastian Antunez
Gilles Gouaillardet
2016-11-20 08:14:33 UTC
Sebastian,

The error message is pretty self-explanatory:
/usr/mpi/gcc/openmpi-1.8.8/bin/orted is missing on your compute nodes.

It seems you are using /usr/mpi/gcc/openmpi-1.8.8/bin/mpirun on your
frontend node (i.e. the node on which mpirun is invoked),
but Open MPI was not updated on some of the nodes listed in your nodes8
machinefile.

You likely want to contact your sysadmin and figure this out.
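
One way to confirm which nodes are affected is to test for the orted binary
on every host in the machinefile. The following is only a minimal sketch and
not something posted in this thread: machinefile_hosts and check_orted are
hypothetical helper names, RUN_ON is a made-up override variable, and it
assumes passwordless ssh to the compute nodes and a machinefile with one
host per line (optionally followed by slots=N).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique hostnames found in a machinefile.
# Blank lines and comment lines are skipped; only the first field is kept,
# so a trailing "slots=N" is ignored.
machinefile_hosts() {
    awk '!/^[ \t]*(#|$)/ { print $1 }' "$1" | sort -u
}

# Hypothetical helper: report, per node, whether orted is executable at the
# given path. RUN_ON is an assumed override for the remote-exec command and
# defaults to plain ssh.
check_orted() {
    local machinefile="$1" orted="$2" run_on="${RUN_ON:-ssh}"
    local host
    for host in $(machinefile_hosts "$machinefile"); do
        # stdin is redirected so the remote command never reads the terminal
        if $run_on "$host" test -x "$orted" </dev/null; then
            echo "$host: ok"
        else
            echo "$host: MISSING $orted"
        fi
    done
}

# Example invocation (not run automatically):
#   check_orted nodes8 /usr/mpi/gcc/openmpi-1.8.8/bin/orted
```

Any node reported as MISSING is one where the daemon that mpirun tries to
launch does not exist at that path. Note that a node which cannot be reached
over ssh is also reported as MISSING, since the remote test fails either way.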

Cheers,

Gilles

_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Sebastian Antunez N.
2016-11-20 10:50:37 UTC
Hello,

Thanks for your comment.

Only the frontend was updated, directly via install.sh, from OFED 2.4.3 to
OFED 3.1.1.0, which contains Open MPI 1.8.8.

The compute nodes still have an older version, OFED 2.4, with Open MPI 1.6.4.

My question: is it possible to update OFED directly on the compute nodes by
running the OFED install.sh there, or is it recommended to add the rolls and
update the nodes that way?
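
The version mismatch described above can be confirmed per node by listing the
Open MPI directories under /usr/mpi/gcc. This is a rough sketch only:
installed_ompi_versions is a hypothetical helper name, and it assumes every
install lives in a directory named openmpi-<version> under that base path, as
the 1.8.8 path in the error message suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Open MPI versions installed under a base
# directory, inferred from directory names such as openmpi-1.8.8.
installed_ompi_versions() {
    local base="$1" dir
    for dir in "$base"/openmpi-*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        dir=${dir%/}                # drop the trailing slash
        echo "${dir##*/openmpi-}"   # keep only the version suffix
    done
}

# Example invocation on a node (not run automatically):
#   installed_ompi_versions /usr/mpi/gcc
```

Running it on the frontend and on a compute node should show whether both
sides have the same version installed.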

Regards.

Sebastian