Discussion:
[OMPI users] Slurm binding not propagated to MPI jobs
Andy Riebs
2016-10-27 15:48:54 UTC
Hi All,

We are running Open MPI version 1.10.2, built with support for Slurm
version 16.05.0. When a user specifies "--cpu_bind=none", Open MPI still tries
to bind by core, which segv's if there are more processes than cores.

The user reports:

What I found is that

% srun --ntasks-per-node=8 --cpu_bind=none \
env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0

will have the problem, but:

% srun --ntasks-per-node=8 --cpu_bind=none \
env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0

will run as expected and print out the usage message because I didn’t
provide the right arguments to the code.

So, it appears that the binding has something to do with the issue. My
binding script is as follows:

% cat bindit.sh
#!/bin/bash

#echo SLURM_LOCALID=$SLURM_LOCALID

stride=1

if [ ! -z "$SLURM_LOCALID" ]; then
let bindCPU=$SLURM_LOCALID*$stride
exec numactl --membind=0 --physcpubind=$bindCPU $*
fi

$*

%
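
For anyone reusing that wrapper, a slightly hardened sketch of the same idea
(illustrative only, not the script quoted above; BIND_STRIDE is a hypothetical
override) would quote the arguments and exec in both paths:

#!/bin/bash
# Bind each local rank to one physical CPU, SLURM_LOCALID * stride apart,
# and pass the command line through untouched.
stride=${BIND_STRIDE:-1}        # hypothetical override; defaults to 1

if [ -n "$SLURM_LOCALID" ]; then
    bindCPU=$(( SLURM_LOCALID * stride ))
    exec numactl --membind=0 --physcpubind="$bindCPU" "$@"
fi

exec "$@"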
--
Andy Riebs
***@hpe.com
Hewlett-Packard Enterprise
High Performance Computing Software Engineering
+1 404 648 9024
My opinions are not necessarily those of HPE
May the source be with you!
r***@open-mpi.org
2016-10-27 15:57:07 UTC
Hey Andy

Is there a SLURM envar that would tell us the binding option from the srun cmd line? We automatically bind when direct launched due to user complaints of poor performance if we don’t. If the user specifies a binding option, then we detect that we were already bound and don’t do it.

However, if the user specifies that they not be bound, then we think they simply didn’t specify anything - and that isn’t the case. If we can see something that tells us “they explicitly said not to do it”, then we can avoid the situation.

Ralph
Andy Riebs
2016-10-27 17:14:06 UTC
Hi Ralph,

I think I've found the magic keys...

$ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=none
SLURM_CPU_BIND_LIST=
SLURM_CPU_BIND=quiet,none
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=none
SLURM_CPU_BIND_LIST=
SLURM_CPU_BIND=quiet,none
$ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_LIST=0x1111,0x2222
SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_LIST=0x1111,0x2222
SLURM_CPU_BIND=quiet,mask_cpu:0x1111,0x2222
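
Independent of the env vars, a quick way to confirm what binding each task
actually got (a generic Linux check, not something from Slurm itself) is:

$ srun --ntasks-per-node=2 -N1 --cpu_bind=core \
    bash -c 'echo "rank $SLURM_PROCID: $(grep Cpus_allowed_list /proc/self/status)"'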

Andy
r***@open-mpi.org
2016-10-27 17:17:46 UTC
And if there is no --cpu_bind on the cmd line? Do these not exist?
Andy Riebs
2016-10-27 17:37:57 UTC
Yes, they still exist:

$ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
SLURM_CPU_BIND_LIST=0xFFFF
SLURM_CPU_BIND=quiet,mask_cpu:0xFFFF
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_CPU_BIND_VERBOSE=quiet

Here are the relevant Slurm configuration options that could conceivably change the behavior from system to system:
SelectType = select/cons_res
SelectTypeParameters = CR_CPU

Andy
r***@open-mpi.org
2016-10-27 17:44:54 UTC
Sigh - of course it wouldn’t be simple :-(

All right, let’s suppose we look for SLURM_CPU_BIND:

* if it includes the word “none”, then we know the user specified that they don’t want us to bind

* if it includes the word mask_cpu, then we have to check the value of that option:

  * If it is all F’s, then they didn’t specify a binding and we should do our thing

  * If it is anything else, then we assume they _did_ specify a binding, and we leave it alone

Would that make sense? Is there anything else that could be in that envar which would trip us up?
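
As a rough illustration of that check, in shell terms (a sketch for discussion
only, not Open MPI's actual code, which would do this in C):

#!/bin/bash
# Sketch of the proposed SLURM_CPU_BIND check; illustrative only.
bind="$SLURM_CPU_BIND"

if [[ "$bind" == *none* ]]; then
    echo "user explicitly disabled binding: do not bind"
elif [[ "$bind" == *mask_cpu:* ]]; then
    masks="${bind#*mask_cpu:}"      # e.g. 0xFFFF or 0x1111,0x2222
    digits="${masks//0x/}"          # drop the 0x prefixes
    digits="${digits//,/}"          # drop the commas
    if [[ "$digits" =~ ^[Ff]+$ ]]; then
        echo "all-F mask: nothing was requested, apply our default binding"
    else
        echo "explicit mask: leave the Slurm binding alone"
    fi
else
    echo "no binding info: apply our default binding"
fi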
Andy Riebs
2016-10-27 17:53:20 UTC
Hi Ralph,

I haven't played around in this code, so I'll flip the question over to the Slurm list, and report back here when I learn anything.

Cheers
Andy
Riebs, Andy
2016-11-01 18:38:27 UTC
To close the thread here, I got the following information:


Looking at SLURM_CPU_BIND is the right idea, but there are quite a few more options. It misses map_cpu, rank, plus the NUMA-based options: rank_ldom, map_ldom, and mask_ldom. See the srun man pages for documentation.
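
So the earlier sketch would need to recognize those keywords as well, something
like the following (again shell pseudocode for illustration; the exact
SLURM_CPU_BIND strings for each mode should be checked against srun(1)):

# Sketch only: the keyword coverage the check would need.
case "$SLURM_CPU_BIND" in
    *none*)
        echo "explicitly unbound: do not bind" ;;
    *map_cpu:*|*map_ldom:*|*mask_ldom:*|*rank_ldom*|*,rank*)
        echo "explicit CPU or NUMA-domain binding: leave it alone" ;;
    *mask_cpu:*)
        echo "mask binding: still apply the all-F test from the earlier sketch" ;;
    *)
        echo "no binding info: apply our default binding" ;;
esac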


r***@open-mpi.org
2016-11-02 03:36:54 UTC
Ah crumby!! We already solved this on master, but it cannot be backported to the 1.10 series without considerable pain. For some reason, the support for it has been removed from the 2.x series as well. I’ll try to resolve that issue and get the support reinstated there (probably not until 2.1).

Can you manage until then? I think the v2 RM’s are thinking Dec/Jan for 2.1.
Ralph
Andy Riebs
2016-11-03 16:48:34 UTC
Getting that support into 2.1 would be terrific -- and might save us from having to write some Slurm prolog scripts to effect that.

Thanks Ralph!
r***@open-mpi.org
2016-11-04 21:18:51 UTC
See https://github.com/open-mpi/ompi/pull/2365

Let me know if that solves it for you.