Anastasia Kruchinina
2017-02-14 13:11:16 UTC
Hi,
I am trying to use the MPI_Comm_spawn function in my code, and I am having
trouble with openmpi 2.0.x when the job is submitted with sbatch (the batch
system is Slurm).
My test program is located here:
http://user.it.uu.se/~anakr367/files/MPI_test/
When I run my code, I get the following error:
OPAL ERROR: Timeout in file
../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_dpm_dyn_init() failed
--> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------
Interestingly, there is no error when I first allocate nodes with salloc and
then run my program. In summary, the program works fine with openmpi 1.x +
sbatch/salloc and with openmpi 2.0.x + salloc, but not with openmpi 2.0.x +
sbatch.
The error was reproduced on three different computer clusters.
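For readers who want to compare the two launch modes, a batch script for the failing case might look roughly like the sketch below. This is not the script from the test program linked above: the file name job.sh, the node count, the time limit, and the executable name ./spawn_test are all placeholders chosen for illustration.

```shell
#!/bin/bash
# job.sh (hypothetical name), submitted with: sbatch job.sh
# The node count and time limit below are assumptions.
#SBATCH -N 2
#SBATCH -t 00:05:00

# ./spawn_test is a placeholder for a program that calls MPI_Comm_spawn.
# A single parent process is started; it then spawns its children itself.
mpirun -np 1 ./spawn_test
```

The interactive case that works would, under the same assumptions, be "salloc -N 2" followed by the same mpirun line typed in the resulting shell.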
Best regards,
Anastasia