Discussion:
[OMPI users] OpenMPI providing rank?
Yves Caniou
2010-07-28 03:34:49 UTC
Permalink
Hi,

I have some performance issues on a parallel machine composed of nodes of 16
procs each. The application is launched on a multiple of 16 procs, i.e. on a
given number of nodes.
People using MX MPI on this machine told me to attach a wrapper script to
mpiexec that runs the application under 'numactl', in order to make the
execution performance stable.

Looking at the FAQ (the oldest entries seem to be for Open MPI v1.3?), I saw
that the solution might be to use --mca mpi_paffinity_alone 1. Here is what
ompi_info reports:
ompi_info | grep affinity
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.2)
Does it handle memory too, or do I have to use another option like
--mca mpi_maffinity 1?

Still, I would like to test the numactl solution. Does Open MPI provide an
equivalent to $MXMPI_ID, which would at least give the node on which a
process is launched, so that I can adapt the script that was given to me?

Tkx.

.Yves.
Nysal Jan
2010-07-28 04:03:21 UTC
Permalink
OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment
variables -
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
For processor affinity see this FAQ entry -
http://www.open-mpi.org/faq/?category=all#using-paffinity

--Nysal
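
For example, a minimal sketch of a wrapper that prints this variable for each process before starting the real application (the name show_rank.sh and the launch line are only illustrative):

show_rank.sh:
#!/bin/sh
# Print the rank Open MPI assigned to this process and the host it runs on,
# then hand over to the real application.
echo "rank ${OMPI_COMM_WORLD_RANK} on $(hostname)" >&2
exec "$@"

mpiexec -n 16 ./show_rank.sh myappli myparam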
Yves Caniou
2010-07-28 05:18:02 UTC
Permalink
Post by Nysal Jan
OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment
variables -
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
Are processes assigned to nodes sequentially, so that I can get the node
number from $OMPI_COMM_WORLD_RANK modulo the number of procs per node?
Post by Nysal Jan
For processor affinity see this FAQ entry -
http://www.open-mpi.org/faq/?category=all#using-paffinity
Thank you, but that is where I got the information I put in my previous
mail, so it doesn't answer my question.

.Yves.
--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
* in Information Technology Center, The University of Tokyo,
2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
tel: +81-3-5841-0540
* in National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/
Ralph Castain
2010-07-28 09:34:13 UTC
Permalink
Post by Yves Caniou
Post by Nysal Jan
OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other environment
variables -
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
Are processes assigned to nodes sequentially, so that I can get the node
number from $OMPI_COMM_WORLD_RANK modulo the number of procs per node?
By default, yes. However, you can select alternative mapping methods.

Or...you could just use the mpirun cmd line option to report the binding of each process as it is started :-)

Do "mpirun -h" to see all the options. The one you want is --report-bindings
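
For example, a sketch of such an invocation (the application name is only a placeholder):

mpirun -n 128 --report-bindings ./myappli myparam

The binding of each process is then written to stderr as it starts.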
Post by Yves Caniou
Post by Nysal Jan
For processor affinity see this FAQ entry -
http://www.open-mpi.org/faq/?category=all#using-paffinity
Thank you, but that is where I got the information I put in my previous
mail, so it doesn't answer my question.
Memory affinity is taken care of under-the-covers when paffinity is active. No other options are required.
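
In command-line form, a minimal sketch using the MCA parameter quoted earlier in this thread (the application name is a placeholder):

mpirun -n 128 --mca mpi_paffinity_alone 1 ./myappli myparam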
Yves Caniou
2010-07-28 12:37:26 UTC
Permalink
Post by Ralph Castain
Post by Yves Caniou
Post by Nysal Jan
OMPI_COMM_WORLD_RANK can be used to get the MPI rank. For other
environment variables -
http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
Are processes assigned to nodes sequentially, so that I can get the node
number from $OMPI_COMM_WORLD_RANK modulo the number of procs per node?
By default, yes. However, you can select alternative mapping methods.
Or...you could just use the mpirun cmd line option to report the binding of
each process as it is started :-)
Do "mpirun -h" to see all the options. The one you want is
--report-bindings
It reports to stderr, so $OMPI_COMM_WORLD_RANK modulo the number of procs
per node seems more appropriate for what I need, right?

So is the following valid to set memory affinity?

script.sh:
#!/bin/sh
# Map the global rank to a NUMA node (assumes 4 cores per NUMA node
# and ranks assigned to the 16-core nodes sequentially).
MYRANK=$OMPI_COMM_WORLD_RANK
MYVAL=$(expr $MYRANK / 4)
NODE=$(expr $MYVAL % 4)
# Bind both CPU and memory to that NUMA node, then run the real command.
exec numactl --cpunodebind=$NODE --membind=$NODE "$@"

mpiexec -n 128 ./script.sh myappli myparam
Post by Ralph Castain
Post by Yves Caniou
Post by Nysal Jan
For processor affinity see this FAQ entry -
http://www.open-mpi.org/faq/?category=all#using-paffinity
Thank you, but that is where I got the information I put in my previous
mail, so it doesn't answer my question.
Memory affinity is taken care of under-the-covers when paffinity is active.
No other options are required.
Which is better: using this option, or the cmd line with numactl (if it
works)? What is the difference?

Tkx.

.Yves.
Yves Caniou
2010-07-28 13:11:29 UTC
Permalink
I am confused. I thought all you wanted to do is report out the binding of
the process - yes? Are you trying to set the affinity bindings yourself?
If the latter, then your script doesn't do anything that mpirun wouldn't
do, and doesn't do it as well. You would be far better off just adding
--bind-to-core to the mpirun cmd line.
"mpirun -h" says that it is the default, so there is nothing for me to do, then?
I don't even have to add "--mca mpi_paffinity_alone 1"?

.Yves.
Eugene Loh
2010-08-01 05:17:34 UTC
Permalink
Aurélien Bouteiller
2010-08-01 22:14:42 UTC
Permalink
Yves,

In Open MPI you have very fine control over how the deployed ranks are bound to cores. For more information, please refer to the FAQ entry on rankfiles (in a rankfile you can specify very precisely which rank goes on which physical PU). For a more single-shot option, you can use the --slot-list option together with the -nperproc option to specify the order in which your ranks are deployed on the physical PUs.
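
For illustration, a minimal sketch of a rankfile (hostnames node0/node1 are placeholders; please check the slot syntax against your mpirun man page):

my_rankfile:
rank 0=node0 slot=0
rank 1=node0 slot=1
rank 2=node1 slot=0
rank 3=node1 slot=1

mpirun -n 4 -rf my_rankfile ./myappli myparam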

Dr. Aurelien Bouteiller
Innovative Computing Laboratory, The University of Tennessee
Post by Yves Caniou
I am confused. I thought all you wanted to do is report out the binding of
the process - yes? Are you trying to set the affinity bindings yourself?
If the latter, then your script doesn't do anything that mpirun wouldn't
do, and doesn't do it as well. You would be far better off just adding
--bind-to-core to the mpirun cmd line.
"mpirun -h" says that it is the default, so there is nothing for me to do, then?
I don't even have to add "--mca mpi_paffinity_alone 1"?
Wow. I just tried "mpirun -h" and, yes, it claims that "--bind-to-core" is the default. I believe this is wrong... or at least "misleading." :^) You should specify --bind-to-core explicitly. It is the successor to paffinity. Do add --report-bindings to check what you're getting.
Post by Yves Caniou
Post by Yves Caniou
Post by Ralph Castain
Post by Yves Caniou
Post by Nysal Jan
OMPI_COMM_WORLD_RANK can be used to get the MPI rank.
Are processes assigned to nodes sequentially, so that I can get the
node number from $OMPI_COMM_WORLD_RANK modulo the number of procs per
node?
By default, yes. However, you can select alternative mapping methods.
It reports to stderr, so $OMPI_COMM_WORLD_RANK modulo the number of
procs per node seems more appropriate for what I need, right?
So is the following valid to set memory affinity?
MYRANK=$OMPI_COMM_WORLD_RANK
MYVAL=$(expr $MYRANK / 4)
NODE=$(expr $MYVAL % 4)
exec numactl --cpunodebind=$NODE --membind=$NODE "$@"
mpiexec -n 128 ./script.sh myappli myparam
Another option is to use OMPI_COMM_WORLD_LOCAL_RANK. This environment variable directly gives you the value you're looking for, regardless of how process ranks are mapped to the nodes.
Post by Yves Caniou
Post by Yves Caniou
Which is better: using this option, or the cmd line with numactl (if it
works)? What is the difference?
*) Different MPI implementations use different mechanisms for specifying binding. So, if you want your solution to be "portable"... well, if you want that, you're out of luck. But, perhaps some mechanisms (command-line arguments, run-time scripts, etc.) might seem easier for you to adapt than others.
*) Some mechanisms bind processes at process launch time and some at MPI_Init time. The former might be better. Otherwise, a process might place some NUMA memory in a location before MPI_Init and then be moved away from that memory when MPI_Init is encountered. I believe both the numactl and OMPI --bind-to-core mechanisms have this characteristic. (OMPI's older paffinity might not, but I don't remember for sure.)
Mostly, if you're going to use just OMPI, the --bind-to-core command-line argument might be the simplest.
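
Putting the two suggestions together, a minimal sketch (the wrapper name is illustrative, and 4 cores per NUMA node is assumed as in the earlier script):

numa_wrap.sh:
#!/bin/sh
# Local rank on this host -> NUMA node index (assumes 4 cores per NUMA node).
NODE=$(expr $OMPI_COMM_WORLD_LOCAL_RANK / 4)
exec numactl --cpunodebind=$NODE --membind=$NODE "$@"

mpiexec -n 128 ./numa_wrap.sh myappli myparam

# or, staying entirely within Open MPI:
mpiexec -n 128 --bind-to-core --report-bindings myappli myparam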