Discussion: [OMPI users] No more default core binding since 2.0.2?
Reuti
2017-04-09 10:40:29 UTC
Hi,

While I noticed automatic core binding in Open MPI 1.8 (which in a shared cluster may lead to oversubscription of cores), I can't spot it any longer in the 2.x series. So the question arises:

- Was this a general decision to no longer enable automatic core binding?

First I thought it might be because of:

- We define plm_rsh_agent=foo in $OMPI_ROOT/etc/openmpi-mca-params.conf
- We compiled with --with-sge

But even when started on the command line, with `ssh` to reach the nodes, there seems to be no automatic core binding taking place any longer.
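
For reference, this is how I check whether any binding is applied at all (just a sketch – `./my_app` is a placeholder, and the exact --report-bindings output format differs between versions):

    # ask mpiexec to print the binding it applied to each rank
    mpiexec -np 4 --report-bindings ./my_app

    # or check the CPU mask the kernel reports for one running rank
    grep Cpus_allowed_list /proc/<pid of one rank>/status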

-- Reuti
r***@open-mpi.org
2017-04-09 14:35:09 UTC
There has been no change in the policy. However, if you are oversubscribed, we did fix a bug to ensure that we don’t auto-bind in that situation.

Can you pass along your command line? So far as I can tell, it still seems to be working.
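
Something along these lines is what I mean (just a sketch – `./my_app` and the host names are placeholders):

    # inside an SGE job, relying on the --with-sge support to pick up the allocation
    mpiexec -np 8 --report-bindings ./my_app

    # started by hand, reaching the nodes over ssh
    mpiexec -np 8 -host node01,node02 --report-bindings ./my_app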
Reuti
2017-04-09 20:49:57 UTC
Hi,
Post by r***@open-mpi.org
There has been no change in the policy. However, if you are oversubscribed, we did fix a bug to ensure that we don’t auto-bind in that situation.
Can you pass along your command line? So far as I can tell, it still seems to be working.
I'm not sure whether it was the case with 1.8, but according to the man page it now binds to sockets when the number of processes is > 2. This can lead to the effect that one sometimes notices a drop in performance when other jobs happen to be running on just that socket.

So this part is solved – I wasn't aware of the binding by socket.

But I can't see binding by core for a number of processes <= 2. Does that mean 2 per node, or 2 overall for the `mpiexec`?
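
To make the two cases concrete, this is what I am comparing (a sketch – `./my_app` is a placeholder, run on dual-socket nodes):

    # 2 processes in total: I expected binding to core by default
    mpiexec -np 2 --report-bindings ./my_app

    # more than 2 processes in total: bound to socket by default
    mpiexec -np 4 --report-bindings ./my_app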

-- Reuti
r***@open-mpi.org
2017-04-09 21:09:59 UTC
Post by Reuti
But I can't see binding by core for a number of processes <= 2. Does that mean 2 per node, or 2 overall for the `mpiexec`?
It’s 2 processes overall
Reuti
2017-04-09 22:45:47 UTC
Post by r***@open-mpi.org
Post by Reuti
But I can't see binding by core for a number of processes <= 2. Does that mean 2 per node, or 2 overall for the `mpiexec`?
It’s 2 processes overall
With a round-robin allocation in the cluster, this might not be what was intended (binding only one or two cores per exec host)?

Apparently the default changes (from --bind-to core to --bind-to socket) depending on whether I compiled Open MPI with or without libnuma (I only wanted to get rid of the warning in the output – now it works). But I could also use "--bind-to core" without libnuma and it worked; I just got the additional warning that the memory couldn't be bound.
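
One way to take the default (and its dependence on how Open MPI was built) out of the picture would be to request the policy explicitly – a sketch, assuming hwloc_base_binding_policy is still the matching MCA parameter name in 2.x:

    # per job, on the command line
    mpiexec -np 4 --bind-to core --report-bindings ./my_app

    # or site-wide in $OMPI_ROOT/etc/openmpi-mca-params.conf
    hwloc_base_binding_policy = core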

BTW: I always had to add -ldl when linking with `mpicc`. Now that I compiled in libnuma, this necessity is gone.

-- Reuti
r***@open-mpi.org
2017-04-09 23:58:40 UTC
Let me try to clarify. If you launch a job that has only 1 or 2 processes in it (total), then we bind to core by default. This is done because a job that small is almost always some kind of benchmark.

If there are more than 2 processes in the job (total), then we default to binding to NUMA (if NUMA domains are present – otherwise, to socket) across the entire job.

You can always override these behaviors.
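
For example (a sketch – `./my_app` is a placeholder):

    # force core binding even for a larger job
    mpiexec -np 16 --bind-to core --report-bindings ./my_app

    # or disable binding entirely, e.g. on a node shared with other jobs
    mpiexec -np 16 --bind-to none ./my_app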
Reuti
2017-04-10 08:37:30 UTC
Post by r***@open-mpi.org
Let me try to clarify. If you launch a job that has only 1 or 2 processes in it (total), then we bind to core by default. This is done because a job that small is almost always some kind of benchmark.
Yes, I see. But only if libnuma was compiled in AFAICS.
Post by r***@open-mpi.org
If there are more than 2 processes in the job (total), then we default to binding to NUMA (if NUMA domains are present – otherwise, to socket) across the entire job.
Mmh – can I spot a difference in --report-bindings between these two? To me, both look like being bound to socket.
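
Side note: one way to check whether that is the case on these nodes (a sketch, assuming the hwloc and numactl command-line tools are installed):

    # show the topology; if each socket is exactly one NUMA node,
    # socket binding and NUMA binding cover the same cores
    lstopo-no-graphics

    # alternatively
    numactl --hardware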

-- Reuti
r***@open-mpi.org
2017-04-10 15:27:34 UTC
Post by Reuti
Mmh – can I spot a difference in --report-bindings between these two? To me, both look like being bound to socket.
You won’t see a difference if the NUMA and socket are identical in terms of the cores they cover.
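
One way to see this directly on a machine where they do differ is to request the two policies explicitly and compare the reported masks (a sketch – `./my_app` is a placeholder):

    mpiexec -np 4 --bind-to socket --report-bindings ./my_app
    mpiexec -np 4 --bind-to numa --report-bindings ./my_app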
Reuti
2017-04-10 15:47:41 UTC
Post by r***@open-mpi.org
You won’t see a difference if the NUMA and socket are identical in terms of the cores they cover.
Ok, thx.
Reuti
2017-04-10 12:43:57 UTC
Post by Reuti
[…] BTW: I always had to add -ldl when linking with `mpicc`. Now that I compiled in libnuma, this necessity is gone.
Looks like I compiled too many versions in the last couple of days. The -ldl is necessary in case --disable-shared --enable-static was given, to get a plain static build.
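
For the record, the combination I mean (a sketch – the prefix and source file name are placeholders):

    # plain static build of Open MPI
    ./configure --prefix=$OMPI_ROOT --with-sge --disable-shared --enable-static
    make && make install

    # linking an application then needs -ldl explicitly
    mpicc -o my_app my_app.c -ldl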

-- Reuti