[OMPI users] --host works but --hostfile does not
Info via users
2017-06-22 14:49:02 UTC
I am just learning to use Open MPI 1.8.4, which is installed on our cluster, and I am running into a baffling issue. If I run:

mpirun -np 3 --host b1,b2,b3 hostname

I get the expected output:


But if I do:

mpirun -np 3 --hostfile hostfile hostname

I get:


Where hostfile contains:


Any ideas what could be going on?
2017-06-22 17:14:08 UTC
From “man mpirun” - note that not specifying “slots=N” in a hostfile defaults to slots=#cores on that node (as it states in the text):

Specifying Host Nodes
Host nodes can be identified on the mpirun command line with the -host option or in a hostfile.

For example,

mpirun -H aa,aa,bb ./a.out
launches two processes on node aa and one on bb.

Or, consider the hostfile

% cat myhostfile
aa slots=2
bb slots=2
cc slots=2

Here, we list both the host names (aa, bb, and cc) but also how many "slots" there are for each. Slots indicate how many processes can potentially
execute on a node. For best performance, the number of slots may be chosen to be the number of cores on the node or the number of processor sockets.
If the hostfile does not provide slots information, Open MPI will attempt to discover the number of cores (or hwthreads, if the use-hwthreads-as-cpus
option is set) and set the number of slots to that value. This default behavior also occurs when specifying the -host option with
a single hostname. Thus, the command

mpirun -H aa ./a.out
launches a number of processes equal to the number of cores on node aa.

mpirun -hostfile myhostfile ./a.out
will launch two processes on each of the three nodes.

mpirun -hostfile myhostfile -host aa ./a.out
will launch two processes, both on node aa.

mpirun -hostfile myhostfile -host dd ./a.out
will find no hosts to run on and abort with an error. That is, the specified host dd is not in the specified hostfile.

When running under resource managers (e.g., SLURM, Torque, etc.), Open MPI will obtain both the hostnames and the number of slots directly from the
resource manager.
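
For the original question, a possible fix follows from the paragraph on slots above. If the hostfile lists b1, b2, and b3 without slots= values (an assumption, since the file's contents are not shown in the post), each node advertises as many slots as it has cores, so all three processes can fit on the first node. A minimal sketch that caps each node at one slot; the file name hostfile.slots1 and the helper write_hostfile are made up for illustration:

```shell
# Hypothetical helper: write a hostfile that gives each node exactly one slot,
# so that -np 3 has to spread across b1, b2 and b3.
write_hostfile() {
  cat > "$1" <<'EOF'
b1 slots=1
b2 slots=1
b3 slots=1
EOF
}

write_hostfile hostfile.slots1

# Then launch with (needs Open MPI and reachable hosts b1, b2, b3):
#   mpirun -np 3 --hostfile hostfile.slots1 hostname
```

With one slot per node, the hostfile run should place processes the same way the --host b1,b2,b3 run does, assuming slot counts were indeed the cause.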

Specifying Number of Processes
As we have just seen, the number of processes to run can be set using the hostfile. Other mechanisms exist.

The number of processes launched can be specified as a multiple of the number of nodes or processor sockets available. For example,

mpirun -H aa,bb -npersocket 2 ./a.out
launches processes 0-3 on node aa and processes 4-7 on node bb, where aa and bb are both dual-socket nodes. The -npersocket option also turns on
the -bind-to-socket option, which is discussed in a later section.

mpirun -H aa,bb -npernode 2 ./a.out
launches processes 0-1 on node aa and processes 2-3 on node bb.

mpirun -H aa,bb -npernode 1 ./a.out
launches one process per host node.

mpirun -H aa,bb -pernode ./a.out
is the same as -npernode 1.

Another alternative is to specify the number of processes with the -np option. Consider now the hostfile

% cat myhostfile
aa slots=4
bb slots=4
cc slots=4


mpirun -hostfile myhostfile -np 6 ./a.out
will launch processes 0-3 on node aa and processes 4-5 on node bb. The remaining slots in the hostfile will not be used since the -np option
indicated that only 6 processes should be launched.
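
The node-filling arithmetic in the -np 6 example can be checked without a cluster. The sketch below is only a toy model of the behavior described in the man page text, not Open MPI's actual mapper: it fills each node's slots in hostfile order until -np ranks are placed. The function name place_by_slot and its third argument (a stand-in for the core count Open MPI would detect when slots= is missing) are hypothetical.

```shell
# place_by_slot HOSTFILE NP [CORES]
# Print "rank host" pairs, filling each node's slots in hostfile order.
# CORES (default 1) is an assumed stand-in for the detected core count
# that would apply to entries without slots=N.
place_by_slot() {
  awk -v np="$2" -v cores="${3:-1}" '
    BEGIN { rank = 0 }
    NF == 0 || /^#/ { next }            # skip blank lines and comments
    {
      slots = cores + 0                 # fallback when no slots= is given
      for (i = 2; i <= NF; i++)
        if ($i ~ /^slots=/) { split($i, kv, "="); slots = kv[2] + 0 }
      # place ranks on this node until its slots or np run out
      for (s = 0; s < slots && rank < np; s++)
        printf "%d %s\n", rank++, $1
    }
  ' "$1"
}
```

Run against the hostfile above with NP=6, it prints ranks 0-3 on aa and ranks 4-5 on bb, matching the man page. Run against a hostfile that lists b1, b2, b3 with no slots= and an assumed CORES of 8, all three ranks land on b1, which is the situation the man page excerpt suggests for the original question.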