Discussion:
[OMPI users] Form an intercommunicator across threads?
Clune, Thomas L. (GSFC-6101)
2017-05-15 14:22:30 UTC
Permalink
I am trying to craft a client-server layer that needs to have 2 different modes of operation. In the “remote server” mode, the server runs on distinct processes, and an intercommunicator is a perfect fit for my design. In the “local server” mode, the server will actually run on a dedicated thread within the same pool of processes that are running the client. What I would like is some analog of an intercommunicator that would operate between two comms that are essentially identical (but distinct due to the use of MPI_Comm_dup()), with one running on thread 0 and the other on thread 1.

Now, I know the language in the standard sounds like it explicitly forbids such overlapping groups:

"Overlap of local and remote groups that are bound into an inter-communicator is prohibited.”

But there is then that hint of wiggle room with the parenthetical:


“(If a process is multithreaded, and MPI calls block only a thread, rather than a process, then “dual membership” can be supported. It is then the user’s responsibility to make sure that calls on behalf of the two “roles” of a process are executed by two independent threads.)"

This sounds very much like what I want to do, but the following example code crashes (apologies for the Fortran source). It is intended to be run on two threads, each of which creates a dup of MPI_COMM_WORLD. The intercomm create would then use the “other” comm as the remote for each thread. I’m not wedded to the details here; anything that would support the notion of an MPI-based server running on extra threads would be a potential solution. I know that in the worst case I could achieve the same by oversubscribing processes on my nodes, but that raises some issues in our batch processing system that I’d prefer to avoid. A definitive “Can’t be done. Move along.” would also be useful.

The following fails with OpenMPI 2.1 on OS X 10.12:

program main
   use mpi
   use omp_lib, only: omp_get_thread_num
   implicit none

   integer :: ierror
   integer :: comms(0:1)
   integer :: comm
   integer :: tag = 10
   integer :: intercomm
   integer :: provided
   integer :: i

   call MPI_Init_thread(MPI_THREAD_MULTIPLE, provided, ierror)
   if (provided < MPI_THREAD_MULTIPLE) then
      print *, 'MPI_THREAD_MULTIPLE not available; provided =', provided
      call MPI_Abort(MPI_COMM_WORLD, 1, ierror)
   end if
   call MPI_Comm_dup(MPI_COMM_WORLD, comm, ierror)

!$omp parallel num_threads(2) default(none), private(i, intercomm, ierror), shared(tag, comms, comm)
   i = omp_get_thread_num()

   ! Each thread dups MPI_COMM_WORLD in turn, so each has its own "local" comm.
   if (i == 0) then
      call MPI_Comm_dup(MPI_COMM_WORLD, comms(i), ierror)
   end if
!$omp barrier
   if (i == 1) then
      call MPI_Comm_dup(MPI_COMM_WORLD, comms(i), ierror)
   end if
!$omp barrier
   ! Both threads name rank 0 of comm as the remote leader; this is the call that fails.
   call MPI_Intercomm_create(comms(i), 0, comm, 0, tag, intercomm, ierror)

!$omp end parallel

   call MPI_Finalize(ierror)
end program main


Thanks in advance.

- Tom
George Bosilca
2017-05-15 16:47:10 UTC
Permalink
A process or rank is not allowed to participate multiple times in the same
group (at least not in the current version of the MPI standard). The
sentence about "dual membership" you pointed out applies only to
inter-communicators (the paragraph where the sentence is located
clearly talks about local and remote groups). Moreover, and this is
something the MPI standard is not clear about, that sentence only makes
sense when the local leaders of the 2 groups are not the same process.

The problem with your approach is that, in the peer_comm, the same
process is the leader of both groups. In the implementation of
MPI_Intercomm_create there is a need for a handshake between the 2 leaders
to exchange the sizes of their respective groups, and this handshake
uses the TAG you provided. The problem (at least in Open MPI) is that the
handshake is implemented as (Irecv + Send + Wait) with the same rank on the
peer_comm. Such a communication pattern will not successfully exchange the
data between the peers in most cases: within a single communicator the MPI
standard guarantees FIFO matching, so the two concurrent handshakes, using
the same rank and tag, can match each other's messages the wrong way around.
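
For illustration only, a minimal self-contained sketch of that handshake pattern (hypothetical variable names, not the actual Open MPI internals). Two threads of the same rank each post an Irecv and a Send to rank 0 with the same tag; FIFO matching within one communicator means either Send can match either Irecv, so a thread may simply get its own value back, or the program may hang:

```fortran
program handshake_sketch
   use mpi
   use omp_lib, only: omp_get_thread_num
   implicit none
   integer :: ierror, provided, peer_comm, request
   integer :: local_size, remote_size, tag

   tag = 10
   call MPI_Init_thread(MPI_THREAD_MULTIPLE, provided, ierror)
   call MPI_Comm_dup(MPI_COMM_WORLD, peer_comm, ierror)

!$omp parallel num_threads(2) default(none) private(local_size, remote_size, request, ierror) shared(peer_comm, tag)
   local_size = omp_get_thread_num() + 1   ! stand-in for each group's size

   ! Both threads target rank 0 of peer_comm with the same tag, mimicking
   ! the (Irecv + Send + Wait) exchange described above.  Nothing
   ! distinguishes the two handshakes, so the matching can cross.
   call MPI_Irecv(remote_size, 1, MPI_INTEGER, 0, tag, peer_comm, request, ierror)
   call MPI_Send(local_size, 1, MPI_INTEGER, 0, tag, peer_comm, ierror)
   call MPI_Wait(request, MPI_STATUS_IGNORE, ierror)
!$omp end parallel

   call MPI_Finalize(ierror)
end program handshake_sketch
```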

So the complete answer is: It can be done but not in the context you are
trying to achieve.
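
For completeness, a minimal sketch of the supported pattern: two disjoint groups with two distinct leaders, obtained by splitting MPI_COMM_WORLD in half (an editorial illustration, assuming the job runs with an even number of ranks, at least 2):

```fortran
program split_intercomm
   use mpi
   implicit none
   integer :: ierror, rank, nprocs, color, local_comm, intercomm
   integer :: remote_leader

   call MPI_Init(ierror)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)

   ! Split into two disjoint halves: lower ranks get color 0, upper get color 1.
   color = merge(0, 1, rank < nprocs / 2)
   call MPI_Comm_split(MPI_COMM_WORLD, color, rank, local_comm, ierror)

   ! Local leader is rank 0 of each half; the remote leader is the first
   ! rank of the other half in MPI_COMM_WORLD, so the two leaders are
   ! distinct processes and the handshake matches cleanly.
   remote_leader = merge(nprocs / 2, 0, color == 0)
   call MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader, 10, intercomm, ierror)

   call MPI_Comm_free(intercomm, ierror)
   call MPI_Comm_free(local_comm, ierror)
   call MPI_Finalize(ierror)
end program split_intercomm
```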

George.


