Discussion:
[OMPI users] running multiple executables under Torque/PBS PRO
Tom Rosmond
2009-11-10 21:46:07 UTC
I want to run a number of MPI executables simultaneously in a PBS job.
For example, on my system I do 'cat $PBS_NODEFILE' and get a list like
this:

n04
n04
n04
n04
n06
n06
n06
n06
n07
n07
n07
n07
n09
n09
n09
n09

i.e., 16 processors on 4 nodes, which I can parse into file(s) as
desired. If I want to run prog1 on 1 node (4 processors), prog2 on 1
node (4 processors), and prog3 on 2 nodes (8 processors), I think the
syntax will be something like:

mpirun -np 4 --hostfile nodefile1 prog1 : \
-np 4 --hostfile nodefile2 prog2 : \
-np 8 --hostfile nodefile3 prog3

Here nodefile1, nodefile2, and nodefile3 are the lists extracted from
$PBS_NODEFILE. Is this correct? Any suggestions or advice (e.g., on the
syntax of the 'nodefiles') would be appreciated.
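One way to build the per-program nodefiles is to group the lines of $PBS_NODEFILE by distinct host name. This is a sketch that assumes the Open MPI hostfile convention in which a host listed N times provides N slots (a single `n04 slots=4` line is an equivalent form). `extract_nodes` is a hypothetical helper name, and it numbers nodes 1, 2, ... in the order they first appear in the file.

```shell
#!/bin/sh
# extract_nodes FILE FIRST LAST   (hypothetical helper, not part of Open MPI or PBS)
# Print one line per slot for the FIRST..LAST distinct hosts in FILE,
# numbering hosts 1, 2, ... in order of first appearance.
extract_nodes() {
    awk -v first="$2" -v last="$3" '
        NF == 0 { next }                                   # ignore blank lines
        !($1 in idx) { idx[$1] = ++n }                     # give a new host the next index
        idx[$1] >= first && idx[$1] <= last { print $1 }   # keep lines of selected hosts
    ' "$1"
}

# Inside a PBS job: split the 4-node, 16-slot allocation as described above.
if [ -n "${PBS_NODEFILE:-}" ]; then
    extract_nodes "$PBS_NODEFILE" 1 1 > nodefile1   # 1st node, 4 slots -> prog1
    extract_nodes "$PBS_NODEFILE" 2 2 > nodefile2   # 2nd node, 4 slots -> prog2
    extract_nodes "$PBS_NODEFILE" 3 4 > nodefile3   # 3rd and 4th nodes, 8 slots -> prog3
fi
```

With the node list shown above, nodefile1 would contain four n04 lines, nodefile2 four n06 lines, and nodefile3 four n07 lines followed by four n09 lines.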

T. Rosmond
Ralph Castain
2009-11-10 21:54:05 UTC
What version are you trying to do this with?

Reason I ask: in 1.3.x, we introduced a relative node syntax for
specifying hosts to use. This would eliminate the need to create the
hostfiles.

You might do a "man orte_hosts" (assuming you installed the man pages)
and see what it says.

Ralph
_______________________________________________
users mailing list
http://www.open-mpi.org/mailman/listinfo.cgi/users
Tom Rosmond
2009-11-11 00:48:06 UTC
Ralph,

I am using 1.3.2, so the relative node syntax certainly seems the way to
go. However, I seem to be missing something. On the 'orte_hosts' man
page near the top is the simple example:

mpirun -pernode -host +n1,+n2 ./app1 : -host +n3,+n4 ./app2

I set up my job to run on 4 nodes (4 processors/node), and slavishly
copied this line into my PBS script. However, I got the following error
message:

--------------------------------------------------------------------------
mpirun found multiple applications specified on the command line, with
at least one that failed to specify the number of processes to execute.
When specifying multiple applications, you must specify how many
processes of each to launch via the -np argument.
--------------------------------------------------------------------------


I suspect an '-npernode 4' option, rather than '-pernode', is what I
really need, since I want 4 processes per node. Either way, however, I
don't think that explains the above error message. Correct? Do I still
need to extract node-name information from the PBS_NODEFILE for this
approach and replace n1, n2, etc., with the actual node names?

T. Rosmond
Ralph Castain
2009-11-11 00:56:40 UTC
You can use the relative host syntax, but you cannot use a "pernode"
or "npernode" option when you have more than one application on the
cmd line. You have to specify the number of procs for each
application, as the error message says. :-)

IIRC, the reason was that we couldn't decide how to interpret the
cmd line - though looking at this example, I think I could figure it
out. Anyway, that is the problem.
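Applied to the three-program case from the start of the thread, that means giving every application context its own -np and selecting nodes by relative index instead of by name. This sketch assumes the relative indices are zero-based (+n0 is the first node of the allocation) and that each node has 4 slots; check orte_hosts(7) for your version. `run_mpmd` and the MPIRUN override are hypothetical conveniences so the command line can be printed with MPIRUN=echo rather than launched.

```shell
#!/bin/sh
# run_mpmd (hypothetical helper): launch three MPI programs from one mpirun,
# each with an explicit -np, placed with Open MPI relative node syntax.
# Assumes +nK is the zero-based K-th node of the PBS allocation (4 slots each).
run_mpmd() {
    # MPIRUN defaults to mpirun; MPIRUN=echo prints the command line instead.
    "${MPIRUN:-mpirun}" -np 4 -host +n0 ./prog1 : \
                        -np 4 -host +n1 ./prog2 : \
                        -np 8 -host +n2,+n3 ./prog3
}
```

In the PBS script you would simply call `run_mpmd`; with relative indices no node names have to be pulled out of $PBS_NODEFILE.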

HTH
Ralph