Discussion:
[OMPI users] Fwd: OpenMPI does not obey hostfile
Anthony Thyssen
2017-09-27 01:11:02 UTC
Permalink
I have been having problems with OpenMPI on a new cluster of machines, using
stock RHEL7 packages.

ASIDE: This cluster will be used with Torque-PBS (from the EPEL archives),
though OpenMPI does not (currently) have the "tm" resource manager configured
to use PBS, as you can see in the debug output below.

# mpirun -V
mpirun (Open MPI) 1.10.6

# sudo yum list installed openmpi
...
Installed Packages
openmpi.x86_64 1.10.6-2.el7 @rhel-7-server-rpms
...

More than likely I am doing something fundamentally stupid, but I have no
idea what.

The problem is that OpenMPI is not obeying the given hostfile: it is not
running one process for each host entry in the list. The manual, and all my
(meagre) experience, says that is what it is meant to do.

Instead it runs as many processes as the CPU of each machine allows. That is
a nice feature, but NOT what is wanted.

There is no "/etc/openmpi-x86_64/openmpi-default-hostfile" configuration file
present.

For example given the hostfile

# cat hostfile.txt
node21.emperor
node22.emperor
node22.emperor
node23.emperor

Running OpenMPI on the head node "shrek", I get the following
(with ras debugging enabled to show the result):

# mpirun --hostfile hostfile.txt --mca ras_base_verbose 5 mpi_hello
[shrek.emperor:93385] mca:base:select:( ras) Querying component [gridengine]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [gridengine]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) Querying component [loadleveler]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [loadleveler]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) Querying component [simulator]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [simulator]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) Querying component [slurm]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [slurm]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) No component selected!

====================== ALLOCATED NODES ======================
node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
node22.emperor: slots=2 max_slots=0 slots_inuse=0 state=UNKNOWN
node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
Hello World! from process 0 out of 6 on node21.emperor
Hello World! from process 2 out of 6 on node22.emperor
Hello World! from process 1 out of 6 on node21.emperor
Hello World! from process 3 out of 6 on node22.emperor
Hello World! from process 4 out of 6 on node23.emperor
Hello World! from process 5 out of 6 on node23.emperor

These machines all have dual-core CPUs. If a quad-core machine is added to
the list I get 4 processes on that node, and so on, BUT NOT always.

*Note that the "ALLOCATED NODES" list is NOT obeyed.*

If, on the other hand, I add "slots=#" to each entry of the hostfile, it
works as expected! (The debug output is not included as it is essentially
the same as above.)


# awk '{n[$0]++} END {for(i in n)print i,"slots="n[i]}' hostfile.txt > hostfile_slots.txt
# cat hostfile_slots.txt
node23.emperor slots=1
node22.emperor slots=2
node21.emperor slots=1

# mpirun --hostfile hostfile_slots.txt mpi_hello
Hello World! from process 0 out of 4 on node23.emperor
Hello World! from process 1 out of 4 on node22.emperor
Hello World! from process 3 out of 4 on node21.emperor
Hello World! from process 2 out of 4 on node22.emperor

Or, if I convert the hostfile into a comma-separated host list, it also works.

# tr '\n' , <hostfile.txt; echo
node21.emperor,node22.emperor,node22.emperor,node23.emperor,
# mpirun --host $(tr '\n' , <hostfile.txt) mpi_hello
Hello World! from process 0 out of 4 on node21.emperor
Hello World! from process 1 out of 4 on node22.emperor
Hello World! from process 3 out of 4 on node23.emperor
Hello World! from process 2 out of 4 on node22.emperor


Any help as to why --hostfile does not work as expected, even though the
debug output says it should be working, would be appreciated.

As you can see, I have been studying this problem for a long time. Google has
not been very helpful. All I seem to get are man pages and general help
guides.


Anthony Thyssen ( System Programmer ) <***@griffith.edu.au>
--------------------------------------------------------------------------
All the books of Power had their own particular nature.
The "Octavo" was harsh and imperious.
The "Bumper Fun Grimore" went in for deadly practical jokes.
The "Joy of Tantric Sex" had to be kept under iced water.
-- Terry Pratchett, "Moving Pictures"
--------------------------------------------------------------------------
r***@open-mpi.org
2017-09-27 02:40:08 UTC
Permalink
That is correct. If you don’t specify a slot count, we auto-discover the number of cores on each node and set #slots to that number. If an RM (resource manager) is involved, then we use what it gives us.
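
To illustrate with the files from the earlier message: the bare hostfile
leaves the slot counts to auto-discovery, while the "slots=" form pins them,
e.g.

# mpirun --hostfile hostfile.txt mpi_hello         (6 ranks: 2 per dual-core node)
# mpirun --hostfile hostfile_slots.txt mpi_hello   (4 ranks: one per listed slot)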

Sent from my iPad
Anthony Thyssen
2017-09-27 03:57:31 UTC
Permalink
This is not explained in the manual's section on hostfiles (though I was
suspecting that was the case).

However, running one process on each node listed WAS the default behaviour
in the past. In fact, that is the default behaviour of the old version 1.5.4
OpenMPI I have on an old cluster, which I am replacing.

I suggest that this be explicitly explained in at least the manpages, and
preferably the OpenMPI FAQ too.


It explains why the manpages and FAQ seem to avoid specifying a host twice
in a --hostfile, and yet specifically do specify a host twice in the next
section on the --host option. But no explanation is given!

It also explains why, if I give a --pernode option, it runs only one process
on each host BUT ignores the fact that a host was listed twice; and why, if a
-np option is also given with --pernode, it errors with "more processes than
the ppr" (the form of those commands is sketched below).


What that does NOT explain is why it completely ignores the "ALLOCATED
NODES" list that was reported in the debug output, as shown above.

The only reason I posted for help was that the debug output seemed to
indicate it should be performing as I expected.

---

*Is there an option to force OpenMPI to use the OLD behaviour,* just as many
web pages indicate it should be doing? I have found no such option in the
man pages.

Without such an option, passing the $PBS_NODEFILE (from Torque) to the
"mpirun" command becomes much more difficult. That is why I developed the
"awk" script above to convert it, or try to convert it to a comma-separated
--host argument; both of those do work. A sketch of how that looks with the
PBS node file follows.
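
Something along these lines (assuming, as in the tests above, that
$PBS_NODEFILE lists one hostname per requested core; "nodes_slots.txt" is
just an illustrative name):

# awk '{n[$0]++} END {for(i in n)print i,"slots="n[i]}' $PBS_NODEFILE > nodes_slots.txt
# mpirun --hostfile nodes_slots.txt mpi_hello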

It seems a LOT of webpages on the net assume the old behaviour of
--hostfile, which is why this new behaviour is confusing me, especially with
no explicit mention of it in the manual or the OpenMPI FAQ pages.

---

I have seen many PBS guides specify a --np option for the MPI command,
though I could not see the point of it.

A quick test seemed to indicate that it works, so I thought perhaps that was
the way to get the old behaviour.

# mpirun --hostfile hostfile.txt hostname
node21.emperor
node22.emperor
node21.emperor
node22.emperor
node23.emperor
node23.emperor

# mpirun --hostfile hostfile.txt --np $(wc -l <hostfile.txt) hostname
node21.emperor
node22.emperor
node22.emperor
node21.emperor

I think, however, that was purely a fluke. When I expand it to a PBS batch
script command, to run on a larger number of nodes...

# mpirun --hostfile $PBS_NODEFILE -np $PBS_NP hostname

...the result is that OpenMPI still runs as many of the processes as it can
(up to the -np limit) on the first few nodes given, and NOT as Torque PBS
specified.
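
The comma-separated alternative from my first message can be used in the
same way (a sketch only; I have not confirmed it inside a batch script):

# mpirun --host $(tr '\n' , <$PBS_NODEFILE) -np $PBS_NP hostname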

---

ASIDE: The auto-discovery does not appear to work very well. Tests with a
mix of dual-core and quad-core machines often result in only 2 processes on
some of the quad-core machines.

I saw mention of a --hetero-nodes option which does make auto-discovery work
as expected. BUT it is NOT mentioned in the manual, and the "hetero" name did
not immediately suggest to me "a mix of computer types" as opposed to a
uniform set of machines. As such the option name does not make much real
sense to me.

---

Now I have attempted to recompile the OpenMPI package to include Torque
support, but the RPM build specification is overly complex (as is typical
for RHEL). I have yet to succeed in getting a working replacement OpenMPI
package with the "tm" resource manager. Red Hat has declared that it will
not do it, as "Torque" is in EPEL while "OpenMPI" is in RHEL. The rough
rebuild procedure I have been attempting is sketched below.
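
(This is only a sketch of the general approach; the exact macros and paths
in Red Hat's openmpi.spec are from memory and may need checking.)

# yumdownloader --source openmpi
# rpm -ivh openmpi-1.10.6-2.el7.src.rpm
# ... edit ~/rpmbuild/SPECS/openmpi.spec to add --with-tm to the %configure line ...
# rpmbuild -ba ~/rpmbuild/SPECS/openmpi.spec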

Also, I hate having to build local versions of packages, as it means I then
no longer get package updates automatically.
Gilles Gouaillardet
2017-09-27 04:55:04 UTC
Permalink
Anthony,

a few things ...
- Open MPI v1.10 is no longer supported
- you should at least use v2.0, preferably v2.1 or even the newly released 3.0
- if you need to run under Torque/PBS, then Open MPI should be built
  with tm support (a configure sketch follows this list)
- openhpc.org provides Open MPI 1.10.7 with tm support
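
A minimal configure sketch for the tm item above (assuming the Torque
headers and libraries are installed under /usr; the /opt/openmpi prefix is
just an example, adjust both to your site):

# ./configure --prefix=/opt/openmpi --with-tm=/usr
# make all install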

Cheers,

Gilles

Anthony Thyssen
2017-09-29 05:17:19 UTC
Permalink
Thank you Gilles for the pointer.

However, that package "openmpi-gnu-ohpc-1.10.6-23.1.x86_64.rpm" has other
dependencies from OpenHPC. Basically, it is strongly tied to the whole
OpenHPC ecosystem.


I did, however, follow your suggestion and rebuilt the OpenMPI RPM package
from Red Hat, adding the "tm" module needed for integration with Torque. But
that only produced another (similar, but not quite the same) problem.
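
(A quick way to confirm the tm components actually made it into the rebuilt
package is something like the following; the grep pattern is just
illustrative.)

# ompi_info | grep tm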

OpenMPI now does correctly pick up the node allocation from Torque
(according to --display-allocation and --display-map), but for some reason
it is completely ignoring it, and just running everything (over-subscribed)
on the first node given. The previous problem did not over-subscribe the
nodes; it just did not spread out the processes as requested.
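
For reference, the check was done with a command of this general form
(hostname here stands in for the actual program):

# mpirun --display-allocation --display-map hostname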

I am starting a new thread about this problem to try and get some help.


Anthony Thyssen ( System Programmer ) <***@griffith.edu.au>
--------------------------------------------------------------------------
Warning: May contain traces of nuts.
--------------------------------------------------------------------------

