Discussion:
[OMPI users] openmpi 2.1 large messages
Marlborough, Rick
2016-09-29 21:52:52 UTC
Folks;
I am attempting to set up a task that sends large messages via the MPI_Bcast API. I am finding that small messages work OK, anything less than 8000 bytes. Anything more than this and the whole scenario hangs, with most of the worker processes pegged at 100% CPU usage. I tried some of the configuration settings from the FAQ page, but these did not make a difference. Is there anything else I can try?

Thanks
Rick
Gilles Gouaillardet
2016-09-29 23:58:23 UTC
Rick,


can you please provide some more information :

- Open MPI version

- interconnect used

- number of tasks / number of nodes

- does the hang occur in the first MPI_Bcast of 8000 bytes ?


note there is a known issue if you MPI_Bcast with different but matching
signatures

(e.g. some tasks MPI_Bcast 8000 MPI_BYTE, while some other tasks
MPI_Bcast 1 vector of 8000 MPI_BYTE)

you might want to try
mpirun --mca coll ^tuned
and see if it helps
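
For illustration, a minimal sketch of the kind of mismatch meant here (the buffer name and the derived type are made up, and a contiguous type stands in for the vector, just to show the shape of the problem):

    if (rank == 0) {
        /* this task describes the payload as 8000 separate bytes ... */
        MPI_Bcast(buf, 8000, MPI_BYTE, 0, MPI_COMM_WORLD);
    } else {
        /* ... while the other tasks describe the same 8000 bytes as one
           element of a derived datatype: the type signatures match, but the
           count/datatype arguments differ, which is the case the known
           issue covers */
        MPI_Datatype blob;
        MPI_Type_contiguous(8000, MPI_BYTE, &blob);
        MPI_Type_commit(&blob);
        MPI_Bcast(buf, 1, blob, 0, MPI_COMM_WORLD);
        MPI_Type_free(&blob);
    }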


Cheers,

Gilles
Post by Marlborough, Rick
Folks;
I am attempting to set up a task that sends large
messages via the MPI_Bcast API. I am finding that small messages work OK,
anything less than 8000 bytes. Anything more than this and the whole
scenario hangs, with most of the worker processes pegged at 100% CPU
usage. I tried some of the configuration settings from the FAQ page, but
these did not make a difference. Is there anything else I can try?
Thanks
Rick
Marlborough, Rick
2016-09-30 12:38:31 UTC
Gilles;
Thanks for your response. The network setup I have here is 20 computers connected over a 1 gigabit Ethernet LAN. The computers are Nehalems with 8 cores each. These are 64-bit machines. Not a high performance setup, but this is simply a research bed. I am using a host file most of the time, with each node configured for 10 slots. However, I see the same behavior if I run just 2 process instances on a single node. 8000 bytes is OK; 9000 bytes hangs. Here is my test code below. Maybe I'm not setting this up properly. I just recently installed OpenMPI 2.1 and did not set any configuration flags. The OS we are using is a variation of RedHat 6.5 with a 2.6.32 kernel.

Thanks

Rick

#include "mpi.h"
#include <stdio.h>
#include <iostream>
unsigned int bufsize = 9000;
main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, count, tag=1;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
char * inmsg;
std::cout << "Calling allocate" << std::endl;
int x = MPI_Alloc_mem(bufsize,MPI_INFO_NULL, &inmsg);
std::cout << "Return code from input buffer allocation is " << x << std::endl;
char * outmsg;
x = MPI_Alloc_mem(bufsize,MPI_INFO_NULL, &outmsg);
std::cout << "Return code from output buffer allocation is " << x << std::endl;
MPI_Status Stat; // required variable for receive routines
printf("Initializing on %d tasks\n",numtasks);
MPI_Barrier(MPI_COMM_WORLD);
if (rank == 0) {
dest = 1;
source = 1;
std::cout << "Root sending" << std::endl;
MPI_Bcast(outmsg,bufsize, MPI_BYTE,rank,MPI_COMM_WORLD);
std::cout << "Root send complete" << std::endl;
}
else if (rank != 0) {
dest = 0;
source = 0;
std::cout << "Task " << rank << " sending." << std::endl;
MPI_Bcast(inmsg,bufsize, MPI_BYTE,rank,MPI_COMM_WORLD);
std::cout << "Task " << rank << " complete." << std::endl;
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}

From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, September 29, 2016 7:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi 2.1 large messages


Rick,



can you please provide some more information :

- Open MPI version

- interconnect used

- number of tasks / number of nodes

- does the hang occur in the first MPI_Bcast of 8000 bytes ?



note there is a known issue if you MPI_Bcast with different but matching signatures

(e.g. some tasks MPI_Bcast 8000 MPI_BYTE, while some other tasks MPI_Bcast 1 vector of 8000 MPI_BYTE)
you might want to try
mpirun --mca coll ^tuned
and see if it helps


Cheers,

Gilles
On 9/30/2016 6:52 AM, Marlborough, Rick wrote:
Folks;
I am attempting to set up a task that sends large messages via the MPI_Bcast API. I am finding that small messages work OK, anything less than 8000 bytes. Anything more than this and the whole scenario hangs, with most of the worker processes pegged at 100% CPU usage. I tried some of the configuration settings from the FAQ page, but these did not make a difference. Is there anything else I can try?

Thanks
Rick




Gilles Gouaillardet
2016-09-30 12:54:44 UTC
Rick,

You must use the same value for root on all the tasks of the communicator.
So the 4th parameter of MPI_Bcast should be hard-coded 0 instead of rank.
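
In other words, a minimal sketch of the corrected calls, using the names from your test program (only the root argument changes):

    if (rank == 0) {
        /* root rank 0 supplies the data */
        MPI_Bcast(outmsg, bufsize, MPI_BYTE, 0, MPI_COMM_WORLD);
    } else {
        /* every other rank receives, passing the same root (0) */
        MPI_Bcast(inmsg, bufsize, MPI_BYTE, 0, MPI_COMM_WORLD);
    }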

Fwiw, with this test program:
If you MPI_Bcast a "small" message, then all your tasks send a message
(that is never received) in eager mode, so MPI_Bcast completes.
If you MPI_Bcast a "long" message, then all your tasks send a message in
rendezvous mode, and since no one receives it, MPI_Bcast hangs.

"small" vs "long" depends on the interconnect and some tuning parameters,
which can explain why 9000 bytes does not hang out of the box with another
Open MPI version.
Bottom line, this test program is not doing what you expected.
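
If you want to see where the eager/rendezvous threshold sits on your setup, something like the following should show it (assuming the TCP BTL is the one in use; parameter names can vary between Open MPI versions):

    ompi_info --param btl tcp --level 9 | grep eager_limit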

Cheers,

Gilles
Post by Marlborough, Rick
Gilles;
Thanks for your response. The network setup I have here is
20 computers connected over a 1 gigabit Ethernet LAN. The computers are
Nehalems with 8 cores each. These are 64-bit machines. Not a high
performance setup, but this is simply a research bed. I am using a host file
most of the time, with each node configured for 10 slots. However, I see the
same behavior if I run just 2 process instances on a single node. 8000
bytes is OK; 9000 bytes hangs. Here is my test code below. Maybe I'm not
setting this up properly. I just recently installed OpenMPI 2.1 and did not
set any configuration flags. The OS we are using is a variation of RedHat
6.5 with a 2.6.32 kernel.
Thanks
Rick
#include "mpi.h"
#include <stdio.h>
#include <iostream>
unsigned int bufsize = 9000;
main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, count, tag=1;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
char * inmsg;
std::cout << "Calling allocate" << std::endl;
int x = MPI_Alloc_mem(bufsize,MPI_INFO_NULL, &inmsg);
std::cout << "Return code from input buffer allocation is
" << x << std::endl;
char * outmsg;
x = MPI_Alloc_mem(bufsize,MPI_INFO_NULL, &outmsg);
std::cout << "Return code from output buffer allocation is
" << x << std::endl;
MPI_Status Stat; // required variable for receive routines
printf("Initializing on %d tasks\n",numtasks);
MPI_Barrier(MPI_COMM_WORLD);
if (rank == 0) {
dest = 1;
source = 1;
std::cout << "Root sending" <<
std::endl;
MPI_Bcast(outmsg,bufsize,
MPI_BYTE,rank,MPI_COMM_WORLD);
std::cout << "Root send complete" << std::endl;
}
else if (rank != 0) {
dest = 0;
source = 0;
std::cout << "Task " << rank << " sending." << std::endl;
MPI_Bcast(inmsg,bufsize,
MPI_BYTE,rank,MPI_COMM_WORLD);
std::cout << "Task " << rank << " complete." << std::endl;
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}
From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, September 29, 2016 7:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi 2.1 large messages
Rick,
- Open MPI version
- interconnect used
- number of tasks / number of nodes
- does the hang occur in the first MPI_Bcast of 8000 bytes ?
note there is a known issue if you MPI_Bcast with different but matching signatures
(e.g. some tasks MPI_Bcast 8000 MPI_BYTE, while some other tasks MPI_Bcast
1 vector of 8000 MPI_BYTE)
you might want to try
mpirun --mca coll ^tuned
and see if it helps
Cheers,
Gilles
Folks;
I am attempting to set up a task that sends large messages
via the MPI_Bcast API. I am finding that small messages work OK, anything less
than 8000 bytes. Anything more than this and the whole scenario hangs, with
most of the worker processes pegged at 100% CPU usage. I tried some of the
configuration settings from the FAQ page, but these did not make a difference.
Is there anything else I can try?
Thanks
Rick
Marlborough, Rick
2016-09-30 13:12:02 UTC
Gilles;
It works now. Thanks for pointing that out!

Rick

From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Friday, September 30, 2016 8:55 AM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi 2.1 large messages

Rick,

You must use the same value for root on all the tasks of the communicator.
So the 4th parameter of MPI_Bcast should be hard-coded 0 instead of rank.

Fwiw, with this test program:
If you MPI_Bcast a "small" message, then all your tasks send a message (that is never received) in eager mode, so MPI_Bcast completes.
If you MPI_Bcast a "long" message, then all your tasks send a message in rendezvous mode, and since no one receives it, MPI_Bcast hangs.

"small" vs "long" depends on the interconnect and some tuning parameters, which can explain why 9000 bytes does not hang out of the box with another Open MPI version.
Bottom line, this test program is not doing what you expected.

Cheers,

Gilles

On Friday, September 30, 2016, Marlborough, Rick <***@aaccorp.com> wrote:
Gilles;
Thanks for your response. The network setup I have here is 20 computers connected over a 1 gigabit Ethernet LAN. The computers are Nehalems with 8 cores each. These are 64-bit machines. Not a high performance setup, but this is simply a research bed. I am using a host file most of the time, with each node configured for 10 slots. However, I see the same behavior if I run just 2 process instances on a single node. 8000 bytes is OK; 9000 bytes hangs. Here is my test code below. Maybe I'm not setting this up properly. I just recently installed OpenMPI 2.1 and did not set any configuration flags. The OS we are using is a variation of RedHat 6.5 with a 2.6.32 kernel.

Thanks

Rick

#include "mpi.h"
#include <stdio.h>
#include <iostream>
unsigned int bufsize = 9000;
main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, count, tag=1;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
char * inmsg;
std::cout << "Calling allocate" << std::endl;
int x = MPI_Alloc_mem(bufsize,MPI_INFO_NULL, &inmsg);
std::cout << "Return code from input buffer allocation is " << x << std::endl;
char * outmsg;
x = MPI_Alloc_mem(bufsize,MPI_INFO_NULL, &outmsg);
std::cout << "Return code from output buffer allocation is " << x << std::endl;
MPI_Status Stat; // required variable for receive routines
printf("Initializing on %d tasks\n",numtasks);
MPI_Barrier(MPI_COMM_WORLD);
if (rank == 0) {
dest = 1;
source = 1;
std::cout << "Root sending" << std::endl;
MPI_Bcast(outmsg,bufsize, MPI_BYTE,rank,MPI_COMM_WORLD);
std::cout << "Root send complete" << std::endl;
}
else if (rank != 0) {
dest = 0;
source = 0;
std::cout << "Task " << rank << " sending." << std::endl;
MPI_Bcast(inmsg,bufsize, MPI_BYTE,rank,MPI_COMM_WORLD);
std::cout << "Task " << rank << " complete." << std::endl;
}
MPI_Barrier(MPI_COMM_WORLD);
MPI_Finalize();
}

From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, September 29, 2016 7:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi 2.1 large messages


Rick,



can you please provide some more information :

- Open MPI version

- interconnect used

- number of tasks / number of nodes

- does the hang occur in the first MPI_Bcast of 8000 bytes ?



note there is a known issue if you MPI_Bcast with different but matching signatures

(e.g. some tasks MPI_Bcast 8000 MPI_BYTE, while some other tasks MPI_Bcast 1 vector of 8000 MPI_BYTE)
you might want to try
mpirun --mca coll ^tuned
and see if it helps


Cheers,

Gilles
On 9/30/2016 6:52 AM, Marlborough, Rick wrote:
Folks;
I am attempting to set up a task that sends large messages via the MPI_Bcast API. I am finding that small messages work OK, anything less than 8000 bytes. Anything more than this and the whole scenario hangs, with most of the worker processes pegged at 100% CPU usage. I tried some of the configuration settings from the FAQ page, but these did not make a difference. Is there anything else I can try?

Thanks
Rick



