Discussion:
[OMPI users] a question about MPI dynamic process manage
g***@buaa.edu.cn
2017-03-23 04:23:24 UTC
Hi team:
I have a question about MPI dynamic process management, and I hope you can help.
My MPI program runs on multiple nodes. The group of MPI_COMM_WORLD
is split into subgroups by node, and a sub-communicator is created for each subgroup
so that the MPI processes on one node can communicate with each other through it.
Each sub-communicator then calls MPI_Comm_spawn("./child", NULL, 1, hostinfo, 0, sub-communicator, &newcomm, &errs)
to spawn one child process on its node. The child processes are expected to form a group
and then create an intra-communicator so that they can exchange messages with each other.
How can I achieve that? Or do I have to use MPI_Comm_accept and MPI_Comm_connect to establish
a connection?

best regards!

-------------------------------------
Eric
Jeff Squyres (jsquyres)
2017-03-23 09:49:19 UTC
It's likely a lot more efficient to MPI_COMM_SPAWN *all* of your children at once, and then subdivide up the resulting newcomm communicator as desired.

It is *possible* to have a series of MPI_COMM_SPAWN calls that each spawn a single child process, and then later join all of those children into a single communicator, but it is somewhat tricky and likely not worth it (i.e., you'll save a lot of code complexity if you can spawn all the children at once).
--
Jeff Squyres
***@cisco.com
Jeff Squyres (jsquyres)
2017-03-24 10:22:26 UTC
(keeping the user's list in the CC)
> I tried to call MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, root, MPI_COMM_WORLD, &newcomm, &errs)
> so that every MPI process in MPI_COMM_WORLD can spawn one child process.

The MPI_COMM_SPAWN call is collective, meaning that all processes in the source communicator have to invoke the call with the same parameters. In your example, this will launch exactly one additional process. Hence, if MPI_COMM_WORLD has N processes after MPI_INIT and you immediately make this call to MPI_COMM_SPAWN, you will have a total of (N+1) processes.

> Then all of these child processes are expected
> to form their own MPI_COMM_WORLD, even if they are located on multiple nodes.

In your example, there will only be 1 child process launched.

> I found that if the parameter "root" had the same value in every MPI process, for example root = 1, only the MPI process with rank 1
> actually spawned a child process; the other MPI processes called the function but did not create a child process.

That is correct. That is the nature of a collective MPI call.

> MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, rank, MPI_COMM_WORLD, &newcomm, &errs)
> something went wrong. I don't quite understand.

That is also correct behavior. MPI-3.1 section 2.4, p11:42-45 initially defines the term "collective":

-----
collective A procedure is collective if all processes in a process group need to invoke the procedure. A collective call may or may not be synchronizing. Collective calls over the same communicator must be executed in the same order by all members of the process group.
-----

MPI_COMM_SPAWN is a collective operation in that all processes in the communicator come together to perform one action (i.e., spawn one or more children). Put differently: you cannot invoke MPI_COMM_SPAWN in one process in a communicator without also invoking it (with the same parameters) in all other processes in the same communicator.
--
Jeff Squyres
***@cisco.com