Discussion:
[OMPI users] MPI_Comm_spawn
j***@gmail.com
2016-09-29 12:45:00 UTC
Hello,

I am using MPI_Comm_spawn to dynamically create new processes from a single
manager process. Everything works fine when all the processes run on the same
node, but restricting the job to a single process per node does not work.
Below are the errors produced during a multinode interactive session and a
multinode sbatch job.

The system I am using is: Linux version 3.10.0-229.el7.x86_64 (gcc version
4.8.2 20140120 (Red Hat 4.8.2-16))
I am using Open MPI 2.0.1
Slurm is version 15.08.9

What is preventing my jobs from spawning across multiple nodes? Does Slurm
require some additional configuration to allow it? Or is it an issue on the
MPI side: does Open MPI need to be compiled with some special flag? (I have
compiled it with --enable-mpi-fortran=all --with-pmi.)

The code I am launching is here: https://github.com/goghino/dynamicMPI

The manager tries to launch one new process (./manager 1). Below is the error
produced when each process is required to run on a different node (interactive
session):
$ salloc -N 2
$ cat my_hosts
icsnode37
icsnode38
$ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1
[manager]I'm running MPI 3.1
[manager]Runing on node icsnode37
icsnode37.12614Assertion failure at ptl.c:183: epaddr == ((void *)0)
icsnode38.32443Assertion failure at ptl.c:183: epaddr == ((void *)0)
[icsnode37:12614] *** Process received signal ***
[icsnode37:12614] Signal: Aborted (6)
[icsnode37:12614] Signal code: (-6)
[icsnode38:32443] *** Process received signal ***
[icsnode38:32443] Signal: Aborted (6)
[icsnode38:32443] Signal code: (-6)

The same example as above via sbatch job submission:
$ cat job.sbatch
#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

module load openmpi/2.0.1
srun -n 1 -N 1 ./manager 1

$ cat output.o
[manager]I'm running MPI 3.1
[manager]Runing on node icsnode39
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
[icsnode39:9692] *** An error occurred in MPI_Comm_spawn
[icsnode39:9692] *** reported by process [1007812608,0]
[icsnode39:9692] *** on communicator MPI_COMM_SELF
[icsnode39:9692] *** MPI_ERR_SPAWN: could not spawn processes
[icsnode39:9692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
will now abort,
[icsnode39:9692] *** and potentially your MPI job)
In: PMI_Abort(50, N/A)
slurmstepd: *** STEP 15378.0 ON icsnode39 CANCELLED AT 2016-09-26T16:48:20
***
srun: error: icsnode39: task 0: Exited with exit code 50

Thanks for any feedback!

Best regards,
Juraj
Gilles Gouaillardet
2016-09-29 13:06:12 UTC
Hi,

I do not expect spawn to work with direct launch (e.g. srun).

Do you have PSM (e.g. InfiniPath) hardware? That could be linked to the
failure.

Can you please try

mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts
./manager 1

and see if it helps?

Note: if you have the possibility, I suggest you first try that without
Slurm, and then within a Slurm job.

Cheers,

Gilles
r***@open-mpi.org
2016-09-29 13:49:00 UTC
Spawn definitely does not work with srun. I don’t recognize the name of the file that segfaulted - what is “ptl.c”? Is that in your manager program?
Gilles Gouaillardet
2016-09-29 13:58:57 UTC
Ralph,

My guess is that ptl.c comes from PSM lib ...

Cheers,

Gilles
r***@open-mpi.org
2016-09-29 14:12:21 UTC
Ah, that may be why it wouldn’t show up in the OMPI code base itself. If that is the case here, then no, OMPI v2.0.1 does not support comm_spawn for PSM. It is fixed in the upcoming 2.0.2 release.
Cabral, Matias A
2016-09-29 15:38:08 UTC
Hi Gilles et al.,

You are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic process support was/is not working in OMPI when using PSM2 because of an issue related to the transport keys. That was fixed in PR #1602 (https://github.com/open-mpi/ompi/pull/1602) and should be included in v2.0.2. HOWEVER, this is not the error Juraj is seeing. The root of the assertion is that the PSM/PSM2 MTLs check where the "original" processes are running and, if they detect that all of them are local to the node, they will ONLY initialize the shared memory device (variable PSM2_DEVICES="self,shm"). This avoids reserving HW resources on the HFI card that would not be used unless you later spawn ranks on other nodes. Therefore, to allow dynamic processes to be spawned on other nodes, you need to tell PSM2 to initialize all the devices by making the environment variable PSM2_DEVICES="self,shm,hfi" available before running the job (see the example after the footnote below).
Note that while setting PSM2_DEVICES (*) will resolve the assertion below, you will most likely still see the transport key issue if PR #1602 is not included.

Thanks,

_MAC

(*)
PSM2_DEVICES -> Omni Path
PSM_DEVICES -> TrueScale
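
For example, with the mpirun launch used earlier in the thread, one way to do this is to export the variable and have mpirun forward it to the remote nodes with -x. This is just a sketch, reusing the hostfile and binary names from above:

$ export PSM2_DEVICES="self,shm,hfi"
$ mpirun -x PSM2_DEVICES -np 1 -npernode 1 --hostfile my_hosts ./manager 1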

j***@gmail.com
2016-09-29 14:38:57 UTC
The solution was to use the "tcp", "sm" and "self" BTLs for transporting MPI
messages, restricting TCP to the eth0 interface and using ob1 as the
point-to-point messaging layer (PML):

mpirun --mca btl_tcp_if_include eth0 --mca pml ob1 --mca btl tcp,sm,self
-np 1 --hostfile my_hosts ./manager 1
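
For completeness, the same settings should also work from inside the sbatch job of the original post if the step is launched with mpirun instead of srun (spawn does not work with direct srun launch, as noted above). A sketch, not verified on this system, with the module name taken from the original script:

#!/bin/bash

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1

module load openmpi/2.0.1
# Within the allocation, mpirun should pick up the allocated nodes itself,
# so no hostfile is needed here.
mpirun --mca btl_tcp_if_include eth0 --mca pml ob1 --mca btl tcp,sm,self \
       -np 1 ./manager 1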

Thank you for your help!