[OMPI users] Controlling spawned process
George Reeke
2017-10-06 20:35:29 UTC
Dear colleagues,
I need some help controlling where a process spawned with
MPI_Comm_spawn goes. I am using openmpi-1.10 under CentOS 6.7.
My application is written in C and runs on a RedBarn
system with a master node (hardware box) that connects to the
outside world and two other nodes connected to it via Ethernet and
InfiniBand. There are two executable files, one (I'll call it
"Rank0Pgm") that expects to be rank 0 and does all the I/O and
the other ("RanknPgm") that only communicates via MPI messages.
There are two MPI_Comm_spawn calls that run just after MPI_Init and
an initial broadcast that shares some setup info, like this:
MPI_Comm_spawn("andmsg", argv, 1, MPI_INFO_NULL,
hostid, commc, &commd, &sperr);
where "andmsg" is a program that needs to communicate with the
internet and with all the other processes via a new communicator
that will be called commd (and another name for the other one).
When I run this program with no hostfile and an mpirun line
something like this on a node with 32 cores:
/usr/lib64/openmpi-1.10/bin/mpirun -n 1 Rank0Pgm : -n 28 RanknPgm \
< InputFile
everything works fine. I assume the spawns use 2 of the 3 available
cores that I did not ask the program to use.

Now I want to run on the full network, so I make a hostfile like this
(call it "nodes120"):
node0 slots=22 max-slots=22
n0003 slots=40 max-slots=40
n0004 slots=56 max-slots=56
where node0 has 24 cores and I am trying to leave room for my two
spawned processes. The spawned processes have to be able to contact
the internet, so I make an MPI_INFO with MPI_Info_create and
MPI_Info_set(mpinfo, "host", "node0")
and change the MPI_INFO_NULL in the spawn calls to point to this
new MPI_Info. (If I leave the MPI_INFO_NULL I get a different
error that is probably not of interest here.)
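In other words, the relevant part of the code now looks roughly like
this (declarations and error checking omitted):

MPI_Info mpinfo;
MPI_Info_create(&mpinfo);
MPI_Info_set(mpinfo, "host", "node0");   /* ask for the spawn to land on node0 */
MPI_Comm_spawn("andmsg", argv, 1, mpinfo,
               hostid, commc, &commd, &sperr);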

Now I run mpirun as above, except with "--hostfile nodes120" added
and "-n 116" after the colon. Now I get this error:

"There are not enough slots available in the system to satisfy the 1
slots that were requested by the application:
andmsg
Either request fewer slots for your application, or make more slots
available for use."

I get the same error with "max-slots=24" on the first line of the
hosts file.
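For reference, the full mpirun line in this case looks roughly like this:

/usr/lib64/openmpi-1.10/bin/mpirun --hostfile nodes120 -n 1 Rank0Pgm : \
  -n 116 RanknPgm < InputFile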

Sorry for the length of all that. Request for help: how do I set
things up to run my rank 0 program and enough copies of RanknPgm to fill
all but some number of cores on the master hardware node, and all the
other RanknPgm ranks on the other hardware "nodes" (boxes of CPUs)?
[My application will do best with the default "by slot" scheduling.]

Suggestions much appreciated. I am quite convinced my code is OK,
since it runs as shown above on one hardware box. It also runs
on my laptop with 4 cores and "-n 3 RanknPgm", so I guess I don't
even really need to reserve cores for the two spawned processes.
I thought of using old-fashioned 'fork' but I really want the
extra communicators to keep asynchronous messages separated.
The documentation says overloading is OK by default, so maybe
something else is wrong here.

George Reeke
r***@open-mpi.org
2017-10-06 20:55:55 UTC
Couple of things you can try:

* add --oversubscribe to your mpirun cmd line so it doesn’t care how many slots there are

* modify your MPI_INFO to be “host”, “node0:22” so it thinks there are more slots available

It’s possible that the “host” info processing has a bug in it, but this will tell us a little more and hopefully get you running. If you want to bind your processes to cores, then add “--bind-to core” to the cmd line.
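In code form, the second suggestion is just something like this (rough
sketch, using your variable name mpinfo):

MPI_Info_set(mpinfo, "host", "node0:22");  /* so it thinks node0 has more slots */

and the first one is just adding --oversubscribe to the mpirun cmd line
you already have.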
George Reeke
2017-10-09 17:16:26 UTC
To rhc,
Thanks for those suggestions. Here are the results:
(1) Add "--oversubscribe" to mpirun cmd (I also added
"--output-filename junk" -- see other output below).
Terminal output had this fairly usual error message (shortened):

-------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.

mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [[37749,2],0]
Exit code: 1
------------------------------------------------------

And a file junk.2.000 (presumably stderr) was written; edited
contents here (duplicate output from multiple nodes deleted):

-------------------------------------------------------
[Node0.rockefeller.edu:20366] PSM EP connect error (Endpoint could not
be reached):
[Node0.rockefeller.edu:20366] Node0
[Node0.rockefeller.edu:20366] Node0
[Node0.rockefeller.edu:20366] Node0
----A bunch of identical lines deleted----
[Node0.rockefeller.edu:20366] n0003
[Node0.rockefeller.edu:20366] n0003
[Node0.rockefeller.edu:20366] n0003
----A bunch of identical lines deleted----
[Node0.rockefeller.edu:20366] n0004
[Node0.rockefeller.edu:20366] n0004
[Node0.rockefeller.edu:20366] n0004
----A bunch of identical lines deleted----
[Node0.rockefeller.edu:20366]
[Node0.rockefeller.edu:20366] [[37749,2],0] ORTE_ERROR_LOG: Error in
file dpm_orte.c at line 523
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[Node0.rockefeller.edu:20366] Local abort before MPI_INIT completed
successfully; not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
----------------------------------------------------

I note that these errors apparently occurred in MPI_Init, before
my attempt to spawn additional processes.

(2) Modify your MPI_INFO to be “host”, “node0:22” so it thinks there
are more slots available.
When I did this, since I actually try to spawn two processes,
I put "Node0:22" for the first one and "Node0:23" for the second
one (see the sketch after this item). I get just this on the terminal
output, with no "junk" files:
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

This is the same whether I have "slots=22 max-slots=22" or
"slots=21 max-slots=24" in the hostfile.

(3) Using the MPI_INFO as in (2), I also tried adding "--bind-to core"
to the mpirun line. This may be the most interesting output:

--------------------------------------------------------------------------
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

Node: Node0

This usually is due to not having the required NUMA support installed
on the node. In some Linux distributions, the required support is
contained in the libnumactl and libnumactl-devel packages.
This is a warning only; your job will continue, though performance may
be degraded.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:

Bind to: CORE
Node: Node0
#processes: 2
#cpus: 1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------

Indeed, the packages mentioned are not installed. I found some
discussion of this at https://github.com/open-mpi/ompi/issues/1087,
which claims this message should really refer to "hwloc", which is
another thing I know nothing about.
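(If overloading really is the right thing here, I gather from the second
message that the directive would be spelled something like
"--bind-to core:overload-allowed", though I have not verified that
syntax.)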
Does any of this help or suggest something else to try?
Thanks,
George Reeke