Discussion:
[OMPI users] Slot count parameter in hostfile ignored
Gilles Gouaillardet
2017-09-08 10:02:24 UTC
For the time being, you can
srun --ntasks-per-node 24 --jobid=...
when joining the allocation.
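A minimal sketch of that workaround, reusing the squeue pipeline quoted further down in this thread to find the job id (adding --display-map afterwards is just my way of making the slot count visible):

$ srun --pty --x11 --ntasks-per-node 24 --jobid=$(squeue -u $USER -o %A | tail -n 1) bash
$ mpirun --display-map -machinefile hostfile.16 -np 4 hostname
# with --ntasks-per-node passed at join time, mpirun should see 24 slots per node instead of 1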

This use case looks a bit convoluted to me, so I am not even sure we should consider this a bug in Open MPI.

Ralph, any thoughts ?

Cheers,

Gilles
Thanks, now I can reproduce the issue.
Cheers,
Gilles
I start an interactive allocation, and I just noticed that the problem happens when I join this allocation from another shell.
The allocation is created with
$ srun --pty --nodes 8 --ntasks-per-node 24 --mem 50G --time=3:00:00 --partition=haswell --x11 bash
and then joined from another shell with
$ srun --pty --x11 --jobid=$(squeue -u $USER -o %A | tail -n 1) bash
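As far as I understand, mpirun derives its slot count from the SLURM environment of the shell it runs in, so one way to see where the "1 slot per node" comes from is to compare the SLURM variables in both shells (a sketch; which variables are set depends on the SLURM version and the srun options):

$ env | grep -E 'SLURM_(JOB_CPUS_PER_NODE|TASKS_PER_NODE|NTASKS|NNODES)'
# run this in the original allocation shell and in the joined shell and compare the values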
Maxsym,
can you please post your sbatch script ?
FWIW, I am unable to reproduce the issue with the latest v2.x from GitHub.
By any chance, would you be able to test the latest Open MPI 2.1.2rc3?
Cheers,
Gilles
Indeed, mpirun shows slots=1 per node, but I create the allocation with
--ntasks-per-node 24, so I do have all cores of the node allocated.
When I use srun directly, I can get all the cores.
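A quick way to confirm that within the same allocation (a sketch; the squeue pipeline to find the job id is the same one used when joining the allocation):

$ srun --jobid=$(squeue -u $USER -o %A | tail -n 1) --ntasks-per-node 24 hostname | sort | uniq -c
# 24 lines per node name would show that SLURM itself hands out 24 tasks per node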
My best guess is that SLURM has only allocated 2 slots, and we
respect the resource manager regardless of what you say in the hostfile. You can
check this by adding --display-allocation to your cmd line. You
probably need to tell SLURM to allocate more cpus/node.
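A sketch of that check, with a hypothetical job id on the SLURM side (scontrol show job is a standard way to see what SLURM actually granted):

$ mpirun --display-allocation -machinefile hostfile.16 -np 2 hostname
$ scontrol show job 1234567 | grep -E 'NumNodes|NumCPUs|NumTasks'
# if --display-allocation reports only 1 slot per node, mpirun is reflecting what it read from the resource manager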
On Sep 7, 2017, at 3:33 AM, Maksym Planeta wrote:
Hello,
I'm trying to tell Open MPI how many processes per node I want to use, but mpirun seems to ignore the configuration I provide.
$ cat hostfile.16
taurusi6344 slots=16
taurusi6348 slots=16
$ mpirun --display-map -machinefile hostfile.16 -np 2 hostname
Data for JOB [42099,1] offset 0
======================== JOB MAP ========================
Data for node: taurusi6344  Num slots: 1  Max slots: 0  Num procs: 1
    socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]:[B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
Data for node: taurusi6348  Num slots: 1  Max slots: 0  Num procs: 1
    socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]], socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]], socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]:[B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
=============================================================
taurusi6344
taurusi6348
If I request anything more than 2 with "-np", I get the following error:
$ mpirun --display-map -machinefile hostfile.16 -np 4 hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 4 slots
that were requested by the application:
  hostname

Either request fewer slots for your application, or make more slots
available for use.
--------------------------------------------------------------------------
The Open MPI version is "mpirun (Open MPI) 2.1.0".
SLURM is also installed, version "slurm 16.05.7-Bull.1.1-20170512-1252".
Could you help me make Open MPI respect the slots parameter?
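For comparison, my understanding is that without a resource manager in the picture the same hostfile would be honored as written, e.g. a sketch (not verified on this cluster):

$ mpirun -machinefile hostfile.16 -np 32 hostname
# with slots=16 on each of the two nodes, this should start 16 processes per node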
--
Regards,
Maksym Planeta
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users