Discussion:
[OMPI users] Issue with MPI_Init in MPI_Comm_Spawn
Kiker, Kathleen R
2018-11-28 17:33:47 UTC
Good Afternoon,

I'm trying to diagnose an issue I've been having with MPI_Comm_spawn. When I run the following simple example program, which uses MPI_Comm_spawn_multiple:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
    int np[2] = { 1, 1 };
    int errcodes[2];
    MPI_Comm parentcomm, intercomm;
    char *cmds[2] = { "spawn_example", "spawn_example" };
    MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };

    MPI_Init( &argc, &argv );
    MPI_Comm_get_parent( &parentcomm );
    if (parentcomm == MPI_COMM_NULL)
    {
        /* Create 2 more processes. This example must be built as an executable
           named spawn_example (the name listed in cmds) for this to work. */
        MPI_Comm_spawn_multiple( 2, cmds, MPI_ARGVS_NULL, np, infos, 0, MPI_COMM_WORLD, &intercomm, errcodes );
        printf("I'm the parent.\n");
    }
    else
    {
        printf("I'm the spawned.\n");
    }
    fflush(stdout);
    MPI_Finalize();
    return 0;
}

I get the output:

--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

ompi_dpm_dyn_init() failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------

I'm using Open MPI 3.1.1. I know earlier versions (such as 2.x) had a similar issue, but I believe it was fixed by this release. Is there something else that can cause this?

Thank you,
Kathleen
Ralph H Castain
2018-11-29 14:38:07 UTC
I ran a simple spawn test - you can find it in the OMPI code at orte/test/mpi/simple_spawn.c - and it worked fine:
$ mpirun -n 2 ./simple_spawn
[1858076673:0 pid 19909] starting up on node Ralphs-iMac-2.local!
[1858076673:1 pid 19910] starting up on node Ralphs-iMac-2.local!
1 completed MPI_Init
Parent [pid 19910] about to spawn!
0 completed MPI_Init
Parent [pid 19909] about to spawn!
[1858076674:0 pid 19911] starting up on node Ralphs-iMac-2.local!
[1858076674:1 pid 19912] starting up on node Ralphs-iMac-2.local!
[1858076674:2 pid 19913] starting up on node Ralphs-iMac-2.local!
Parent done with spawn
Parent sending message to child
Parent done with spawn
2 completed MPI_Init
Hello from the child 2 of 3 on host Ralphs-iMac-2.local pid 19913
1 completed MPI_Init
Hello from the child 1 of 3 on host Ralphs-iMac-2.local pid 19912
0 completed MPI_Init
Hello from the child 0 of 3 on host Ralphs-iMac-2.local pid 19911
Child 0 received msg: 38
Parent disconnected
Parent disconnected
Child 0 disconnected
Child 1 disconnected
Child 2 disconnected
19910: exiting
19911: exiting
19912: exiting
19913: exiting
19909: exiting
$

I then ran our spawn_multiple test - again, you can find it at orte/test/mpi/spawn_multiple.c:
$ mpirun -n 2 ./spawn_multiple
Parent [pid 19946] about to spawn!
Parent [pid 19947] about to spawn!
Parent done with spawn
Parent sending message to children
Parent done with spawn
Hello from the child 1 of 2 on host Ralphs-iMac-2.local pid 19949: argv[1] = bar
Hello from the child 0 of 2 on host Ralphs-iMac-2.local pid 19948: argv[1] = foo
Child 0 received msg: 38
Child 1 received msg: 38
Parent disconnected
Child 1 disconnected
Child 0 disconnected
Parent disconnected
$

How did you configure OMPI, and how were you running your example?
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users