Discussion: [OMPI users] -host vs -hostfile
Mahmood Naderan
2017-08-03 12:48:52 UTC
Well, it seems that the default Rocks-openmpi takes precedence on the system. So, for
the moment, I will stick with that version (1.6.5), which uses -machinefile.
I will debug later to see why 2.0.1 doesn't work.

Thanks.

Regards,
Mahmood
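
For reference, the two launch styles named in the subject line look like this with Open MPI's mpirun (the node names below are placeholders; in Open MPI, -machinefile is accepted as a synonym for -hostfile):

# nodes given directly on the command line
mpirun -host compute-0-0,compute-0-1 -np 2 a.out
# the same nodes read from a file that lists one hostname per line
mpirun -hostfile hosts -np 2 a.out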
Maybe something is wrong with the Torque installation?
Or perhaps with the Open MPI + Torque integration?
1) Make sure your Open MPI was configured and compiled with the
Torque "tm" library of your Torque installation.
configure --with-tm=/path/to/your/Torque/tm_library ...
2) Check if your $TORQUE/server_priv/nodes file has all the nodes
in your cluster. If not, edit the file and add the missing nodes.
Then restart the Torque server (service pbs_server restart).
3) Run "pbsnodes" to see if all nodes are listed.
#PBS -l nodes=4:ppn=1
...
mpirun hostname
The output should show all four nodes.
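
To verify point 1 without recompiling, ompi_info can show whether the tm components were built in; a minimal sketch, assuming the 2.0.1 installation mentioned in this thread:

# If Open MPI was configured --with-tm, the launcher and allocator lines
# should include a "tm" component, e.g. "MCA plm: tm (...)" and "MCA ras: tm (...)".
/share/apps/computer/openmpi-2.0.1/bin/ompi_info | grep -E " (plm|ras): "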
Good luck!
Gus Correa
Well, it is confusing! As you can see, I added four nodes to the hostfile
(the same nodes are used by PBS). The run with --map-by ppr:1:node works well;
however, the run driven by the PBS directives doesn't.
-hostfile hosts --map-by ppr:1:node a.out
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
* Error occurred in topology.c line 1048
*
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.
****************************************************************************
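
If you want to follow that suggestion, the hwloc-gather-topology script takes an output prefix; a sketch, assuming hwloc's command-line utilities are installed on the node that printed the warning:

# should produce <prefix>.tar.bz2 (plus an lstopo dump) to attach to a report
hwloc-gather-topology /tmp/$(hostname)-topology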
Hello world from processor cluster.hpc.org, rank 0 out of 4 processors
Hello world from processor compute-0-0.local, rank 1 out of 4 processors
Hello world from processor compute-0-1.local, rank 2 out of 4 processors
Hello world from processor compute-0-2.local, rank 3 out of 4 processors
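
For reference, the "hosts" file used above is not shown anywhere in the thread; judging from the node names in the output, it would simply list one hostname per line, e.g.:

cluster.hpc.org
compute-0-0.local
compute-0-1.local
compute-0-2.local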
#!/bin/bash
#PBS -V
#PBS -q default
#PBS -j oe
#PBS -l nodes=4:ppn=1
#PBS -N job1
#PBS -o .
cd $PBS_O_WORKDIR
/share/apps/computer/openmpi-2.0.1/bin/mpirun a.out
6428.cluster.hpc.org
Hello world from processor compute-0-1.local, rank 0 out of 32 processors
Hello world from processor compute-0-1.local, rank 2 out of 32 processors
Hello world from processor compute-0-1.local, rank 3 out of 32 processors
Hello world from processor compute-0-1.local, rank 4 out of 32 processors
Hello world from processor compute-0-1.local, rank 5 out of 32 processors
Hello world from processor compute-0-1.local, rank 6 out of 32 processors
Hello world from processor compute-0-1.local, rank 8 out of 32 processors
Hello world from processor compute-0-1.local, rank 9 out of 32 processors
Hello world from processor compute-0-1.local, rank 12 out of 32 processors
Hello world from processor compute-0-1.local, rank 15 out of 32 processors
Hello world from processor compute-0-1.local, rank 16 out of 32 processors
Hello world from processor compute-0-1.local, rank 18 out of 32 processors
Hello world from processor compute-0-1.local, rank 19 out of 32 processors
Hello world from processor compute-0-1.local, rank 20 out of 32 processors
Hello world from processor compute-0-1.local, rank 21 out of 32 processors
Hello world from processor compute-0-1.local, rank 22 out of 32 processors
Hello world from processor compute-0-1.local, rank 24 out of 32 processors
Hello world from processor compute-0-1.local, rank 26 out of 32 processors
Hello world from processor compute-0-1.local, rank 27 out of 32 processors
Hello world from processor compute-0-1.local, rank 28 out of 32 processors
Hello world from processor compute-0-1.local, rank 29 out of 32 processors
Hello world from processor compute-0-1.local, rank 30 out of 32 processors
Hello world from processor compute-0-1.local, rank 31 out of 32 processors
Hello world from processor compute-0-1.local, rank 7 out of 32 processors
Hello world from processor compute-0-1.local, rank 10 out of 32 processors
Hello world from processor compute-0-1.local, rank 14 out of 32 processors
Hello world from processor compute-0-1.local, rank 1 out of 32 processors
Hello world from processor compute-0-1.local, rank 11 out of 32 processors
Hello world from processor compute-0-1.local, rank 13 out of 32 processors
Hello world from processor compute-0-1.local, rank 17 out of 32 processors
Hello world from processor compute-0-1.local, rank 23 out of 32 processors
Hello world from processor compute-0-1.local, rank 25 out of 32 processors
Any idea?
Regards,
Mahmood
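
A workaround worth trying when mpirun was not built against Torque's tm library is to pass the allocation explicitly, using the nodefile Torque writes for each job; a sketch based on the job script above (without the tm launcher, passwordless ssh to the compute nodes is assumed):

cd $PBS_O_WORKDIR
# $PBS_NODEFILE holds one line per allocated slot (4 lines for nodes=4:ppn=1)
NP=$(wc -l < $PBS_NODEFILE)
/share/apps/computer/openmpi-2.0.1/bin/mpirun -np $NP -hostfile $PBS_NODEFILE a.out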
Gilles Gouaillardet
2017-08-03 13:52:43 UTC
Mahmood,

you might want to have a look at OpenHPC (which comes with a recent Open MPI)

Cheers,

Gilles
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users