Jing Gong
2017-02-21 13:08:07 UTC
Hi,
This email follows up on the thread
"Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch".
https://mail-archive.com/***@lists.open-mpi.org/msg30650.html
We have installed the latest version, v2.0.2, on the cluster that
Anastasia Kruchinina was using (https://mail-archive.com/***@lists.open-mpi.org/msg30654.html).
It seems to me that the issue is still not fixed in v2.0.2.
The job script and sample code can be found at
https://www.pdc.kth.se/~gongjing/files/test_spawn/
We got the following messages:
$ cat error_file.e
Currently Loaded Modulefiles:
[t03n06.pdc.kth.se:39767] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 193
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
$ cat output_file.o
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_dpm_dyn_init() failed
--> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------
Please let me know if you need additional information.
Thanks a lot for your help.
Regards, Jing Gong