[OMPI users] MPI Behaviour Question
Mark Potter
2016-10-11 12:56:22 UTC
This question is related to OpenMPI 2.0.1 compiled with GCC 4.8.2 on
RHEL 6.8 using Torque 6.0.2 with Moab 9.0.2. To be clear, I am an
administrator and not a coder and I suspect this is expected behavior
but I have been asked by a client to explain why this is happening.

Using Torque, the following command returns the hostname of the first
node only, regardless of how the nodes/cores are split up:

mpirun -np 20 echo "Hello from $HOSTNAME"

(the behaviour is the same with "echo $(hostname)")

The Torque script looks like this:

#PBS -V
#PBS -N test-job
#PBS -l nodes=2:ppn=16
#PBS -e ERROR
#PBS -o OUTPUT


cd $PBS_O_WORKDIR
date
cat $PBS_NODEFILE

mpirun -np 32 echo "Hello from $HOSTNAME"

If the echo statement is replaced with "hostname" then a proper
response is received from all nodes.

While I know there are better ways to test OpenMPI's functionality,
like compiling and using the programs in examples/, this is the method
a specific client chose. I was using both the examples and a Torque job
script that calls just "hostname" without echo, while the client was
using the script above. It took some doing to figure out why he thought
it wasn't working when all my tests were successful. Once I figured it
out, he wanted an explanation that's beyond my current knowledge. Any
help towards explaining the behaviour would be greatly appreciated.

--
Regards,

Mark L. Potter
Senior Consultant
PCPC Direct, Ltd.
O: 713-344-0952 
M: 713-965-4133
S: ***@pcpcdirect.com
Gilles Gouaillardet
2016-10-11 13:17:26 UTC
Mark,

My understanding is that shell metacharacter expansion occurs once, on the first node, so from an Open MPI point of view, you really invoke

mpirun echo node0

I suspect

mpirun echo 'Hello from $(hostname)'

is what you want to do. I do not know about

mpirun echo 'Hello from $HOSTNAME'

$HOSTNAME might be passed by the first node to all tasks, and hence might not have the value you expect on all the nodes. Feel free to run

mpirun env | grep ^HOSTNAME=

to check whether the HOSTNAME variable is set to what you expect.

/* I am afk, so I cannot check that right now ... */


Cheers,

Gilles
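
To make the point about expansion on the first node concrete, here is a minimal sketch that runs without Open MPI. show_argv is a hypothetical stand-in for mpirun that only prints the arguments it is handed, and HOSTNAME is set to node0 by hand to mimic the first node. As far as I know, mpirun starts the program directly rather than through a shell on each node, so the bracketed text is what echo would receive on every rank.

```shell
#!/bin/sh
# show_argv is a hypothetical stand-in for mpirun: it prints each argument
# it receives, one per line, in brackets.
show_argv() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Assumption for the demo: pretend the job script runs on a node called node0.
HOSTNAME=node0

# Double quotes: the local shell substitutes $HOSTNAME before the launcher runs.
double=$(show_argv echo "Hello from $HOSTNAME")

# Single quotes: nothing is substituted, the launcher sees the literal text.
single=$(show_argv echo 'Hello from $HOSTNAME')

printf 'double quotes:\n%s\n' "$double"
printf 'single quotes:\n%s\n' "$single"
```

With double quotes the second argument is already "Hello from node0" before the launcher starts. With single quotes it is the literal string 'Hello from $HOSTNAME', which echo prints unchanged unless a shell on the node evaluates it.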
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Mark Potter
2016-10-12 15:13:09 UTC
After the responses I did more testing. Even $(hostname) and `hostname`
get expanded on the first node. A script using echo works with any of
them, from the environment variable to the backticks. From my limited
testing, I'm guessing all shell expansion on the CLI happens on the
first node. That explanation makes sense and fits the results. It's
easy enough to explain as well!
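
The wrapper-script approach can be sketched as follows. This is only an illustration: hello.sh is a hypothetical file name, uname -n is used in place of hostname because it is specified by POSIX, and the script is run once locally rather than through mpirun so that the sketch works on any machine.

```shell
#!/bin/sh
# Create a small wrapper script; hello.sh is a hypothetical name.
# The quoted 'EOF' delimiter keeps $(uname -n) unexpanded while writing the file.
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
# Expanded when this script runs, i.e. on whichever node executes it.
echo "Hello from $(uname -n)"
EOF
chmod +x "$workdir/hello.sh"

# Inside a Torque job this would be launched as, for example:
#   mpirun -np 32 ./hello.sh
# Here it is run once locally to show that the expansion happens at run time.
output=$(sh "$workdir/hello.sh")
echo "$output"
rm -rf "$workdir"
```

Because the command line given to mpirun contains nothing for the submitting shell to expand, each copy of the script resolves the name on the node where it runs.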
--
Regards,

Mark L. Potter
Senior Consultant
PCPC Direct, Ltd.
O: 713-344-0952 
M: 713-965-4133
S: ***@pcpcdirect.com
Reuti
2016-10-11 13:27:14 UTC
Hi,
Post by Mark Potter
mpirun -np 20 echo "Hello from $HOSTNAME"
The $HOSTNAME will be expanded and used as an argument before `mpirun` even starts. Instead, it has to be evaluated on the nodes:

$ mpirun bash -c "echo \$HOSTNAME"
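
A minimal sketch of why the escaped form works, runnable without a cluster. The node names nodeA and nodeB are made up, and each child shell is given its own HOSTNAME through env to imitate two different nodes. sh is used instead of bash only to keep the sketch portable.

```shell
#!/bin/sh
# Two spellings of the same command string; both reach the child shell
# with $HOSTNAME still unexpanded.
escaped="echo \$HOSTNAME"
quoted='echo $HOSTNAME'

# Simulate two nodes by giving each child shell its own HOSTNAME.
# nodeA and nodeB are made-up names; on a cluster each node's own
# environment or shell would supply the real value.
for node in nodeA nodeB; do
    env HOSTNAME="$node" sh -c "$escaped"
done
```

The backslash stops the submitting shell from substituting the variable, so the string handed to the child shell still contains $HOSTNAME, and the child evaluates it with its own value. Single quotes around the whole command string have the same effect.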
Post by Mark Potter
While I know there are better ways to test OpenMPI's functionality,
like compiling and using the programs in examples/, this is the method
a specific client chose.
There are small "Hello world" programs, like the one here:

http://mpitutorial.com/tutorials/mpi-hello-world/

to test whether e.g. the libraries are found at runtime by the application(s).

-- Reuti
Mark Potter
2016-10-12 15:15:42 UTC
Thanks, between you and Gilles I've got plenty of information
to use in an explanation! And thanks for the hello world link, I've
used the examples that come with OpenMPI but hadn't used that one.
Usually I end up assuming it works and just running HPL. ;)
--
Regards,

Mark L. Potter
Senior Consultant
PCPC Direct, Ltd.
O: 713-344-0952 
M: 713-965-4133
S: ***@pcpcdirect.com
