Jonathan Wesley Stone
2010-03-07 21:17:33 UTC
Hi,
My supercomputer has OpenMPI 1.4. I am running into a frustrating
problem with my MPI program. I am using only the following calls,
which I expect to be blocking:
MPI_Wtime
MPI_Error_string
MPI_Abort
MPI_Send
MPI_Get_count
MPI_Recv
MPI_Probe
MPI_Init
MPI_Comm_rank
MPI_Comm_size
MPI_Finalize
Somehow I am getting this error when I do a large number of sequential
communications: "c002:2.0.Exhausted 1048576 MQ irecv request
descriptors, which usually indicates a user program error or
insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)"
This seems counter-intuitive to me because I don't think I should be
using irecvs since I am wanting specifically to rely on the documented
blocking behavior of MPI_Recv (not MPI_Irecv, which I am not using).
My main program is quite large, however I have managed to replicate
the irritating behavior in this much smaller program, which executes a
number of MPI_Send or MPI_Recv calls in a loop. The program's default
behaviour is to run 2,000,000 iterations. When I turn it up to
20,000,000, after a short time it generates the PSM_MQ_RECVREQS_MAX
exception.
I would appreciate if anyone could advise why it might be happening in
this "test" case -- basically what is going on that causes my
presumably blocking MPI_Recv calls to "accumulate" such a large number
of "irecv request descriptors", when I expect they should be blocking
and get immediately resolved and the count should go down when the
matching MPI_Send is posted.
I appreciate your assistance. Thank you!
Jonathan Stone
Research Assistant, U. Oklahoma
My supercomputer has OpenMPI 1.4. I am running into a frustrating
problem with my MPI program. I am using only the following calls,
which I expect to be blocking:
MPI_Wtime
MPI_Error_string
MPI_Abort
MPI_Send
MPI_Get_count
MPI_Recv
MPI_Probe
MPI_Init
MPI_Comm_rank
MPI_Comm_size
MPI_Finalize
Somehow I am getting this error when I do a large number of sequential
communications: "c002:2.0.Exhausted 1048576 MQ irecv request
descriptors, which usually indicates a user program error or
insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)"
This seems counter-intuitive to me because I don't think I should be
using irecvs since I am wanting specifically to rely on the documented
blocking behavior of MPI_Recv (not MPI_Irecv, which I am not using).
My main program is quite large, however I have managed to replicate
the irritating behavior in this much smaller program, which executes a
number of MPI_Send or MPI_Recv calls in a loop. The program's default
behaviour is to run 2,000,000 iterations. When I turn it up to
20,000,000, after a short time it generates the PSM_MQ_RECVREQS_MAX
exception.
I would appreciate if anyone could advise why it might be happening in
this "test" case -- basically what is going on that causes my
presumably blocking MPI_Recv calls to "accumulate" such a large number
of "irecv request descriptors", when I expect they should be blocking
and get immediately resolved and the count should go down when the
matching MPI_Send is posted.
I appreciate your assistance. Thank you!
Jonathan Stone
Research Assistant, U. Oklahoma