Discussion:
[OMPI users] OpenMPI2 + slurm (Ralph H Castain)
Lothar Brendel
2018-11-25 10:21:48 UTC
srun -n 2 mpirun MPI-hellow
tells srun to launch two copies of mpirun, each of which is to run as many processes as there are slots assigned to the allocation. srun will get an allocation of two slots, and so you'll get two concurrent MPI jobs, each consisting of two procs.
Exactly. Hence, launching a job via this "minimal command line" with some number x after the -n will always occupy x^2 processors.
Just out of curiosity: in which situation is this reasonable? IMHO the number of concurrent MPI jobs on the one hand and the number of procs launched by each of them on the other are quite independent parameters.
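
For reference, since the source of MPI-hellow isn't shown here, a minimal hello-world along these lines (just my assumed equivalent) makes it easy to see what actually got launched: every rank prints its rank, the communicator size and its host, so two independent jobs of two procs each show up as two separate "rank 0 of 2" / "rank 1 of 2" pairs.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank within this MPI job    */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of procs in this job */
        MPI_Get_processor_name(host, &len);

        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }
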
srun -c 2 mpirun -np 2 MPI-hellow
told srun to get two slots but only run one copy (the default value of the -n option) of mpirun, and you told mpirun to launch two procs. So you got one job consisting of two procs.
Exactly. Sadly, it bails out with Slurm 16.05.9 + OpenMPI 2.0.2 (while it used to work with Slurm 14.03.9 + OpenMPI 1.6.5). But most probably I'm barking up the wrong tree here: I've just checked that the handling of SLURM_CPUS_PER_TASK hasn't changed in OpenMPI's ras_slurm_module.c.
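
Just to illustrate the kind of thing I was checking (this is not the actual OpenMPI source, only a rough sketch of how a Slurm-aware launcher might consult the environment that srun/salloc exports; SLURM_NTASKS is thrown in here purely for the example):

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch only: derive cpus-per-task and task count from Slurm's
     * environment variables. The real handling lives in OpenMPI's
     * ras_slurm_module.c and is considerably more involved. */
    int main(void)
    {
        const char *cpt    = getenv("SLURM_CPUS_PER_TASK"); /* set when -c is given */
        const char *ntasks = getenv("SLURM_NTASKS");

        int cpus_per_task = cpt    ? atoi(cpt)    : 1;
        int num_tasks     = ntasks ? atoi(ntasks) : 1;

        printf("tasks=%d, cpus per task=%d, total cpus=%d\n",
               num_tasks, cpus_per_task, num_tasks * cpus_per_task);
        return 0;
    }
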
What you probably want to do is what Gilles advised. However, Slurm 16.05 only supports PMIx v1,
Actually, --mpi=list doesn't show any PMIx at all.

Well, I'll probably move to newer versions of Slurm AND OpenMPI.

Thanks a lot to both of you
Lothar
