Siegmar Gross
2017-12-20 06:49:28 UTC
Hi,
I've installed openmpi-v3.0.0 on my "SUSE Linux Enterprise Server 12.3 (x86_64)"
with gcc-6.4.0. Today I discovered that I get an error for --map-by that I don't
get with older versions.
loki fd1026 115 which mpiexec
/usr/local/openmpi-2.0.3_64_gcc/bin/mpiexec
loki fd1026 116 mpiexec --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
Wed Dec 20 07:41:00 CET 2017
...
loki fd1026 107 which mpiexec
/usr/local/openmpi-2.1.2_64_gcc/bin/mpiexec
loki fd1026 108 mpiexec --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
Wed Dec 20 07:41:27 CET 2017
...
loki fd1026 107 which mpiexec
/usr/local/openmpi-3.0.0_64_gcc/bin/mpiexec
loki fd1026 108 mpiexec --host pc02:2,pc03:2 --map-by ppr:1:socket:pe=1 date
[loki:32662] SETTING BINDING TO CORE
[pc02:04420] SETTING BINDING TO CORE
[pc03:04788] SETTING BINDING TO CORE
--------------------------------------------------------------------------
The request to bind processes could not be completed due to
an internal error - the locale of the following process was
not set by the mapper code:
Process: [[57386,1],3]
Please contact the OMPI developers for assistance. Meantime,
you will still be able to run your application without binding
by specifying "--bind-to none" on your command line.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.
HNP daemon : [[57386,0],0] on node loki
Remote daemon: [[57386,0],2] on node pc03
This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
[loki:32662] 1 more process has sent help message help-orte-rmaps-base.txt / rmaps:no-locale
[loki:32662] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
loki fd1026 109
I would be grateful if somebody could fix the problem. Do you need anything
else? Thank you very much for any help in advance.
Kind regards
Siegmar