Discussion:
[OMPI users] Using OpenMPI / ORTE as a cluster-aware GNU Parallel
Brock Palen
2017-02-23 21:41:15 UTC
Is it possible to use mpirun / ORTE as a load balancer for running serial
jobs in parallel, similar to GNU Parallel?
https://www.biostars.org/p/63816/

The reason is that on any major HPC system you normally want to use a
resource-manager launcher (TM, Slurm, etc.) and not ssh, as GNU Parallel does.

I recall from the talk at SC this year that there is a way to give OMPI a
stack of work to do, but I can't figure out whether it does what I think it
should.

Thanks,

Brock Palen
www.umich.edu/~brockp
Director Advanced Research Computing - TS
XSEDE Campus Champion
***@umich.edu
(734)936-1985
r***@open-mpi.org
2017-02-23 21:55:45 UTC
You might want to try using the DVM (distributed virtual machine) mode in ORTE. You can start it on an allocation using the “orte-dvm” command, and then submit jobs to it with “mpirun --hnp <foo>”, where foo is either the contact info printed out by orte-dvm or the name of the file you told orte-dvm to put that info in. You’ll need to build from OMPI master at this point.
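
Roughly like this (a sketch; “dvm.uri” and “./my_task” are placeholder names, and whether --hnp wants the raw URI, the bare file name, or a “file:” prefix may depend on the version):

    # start the persistent DVM on the allocation, writing its contact
    # info to a file
    orte-dvm --report-uri dvm.uri &

    # submit work to the running DVM; a serial job is just -np 1
    mpirun --hnp file:dvm.uri -np 1 ./my_task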

Alternatively, you can get just the DVM bits by downloading the PMIx Reference Server (https://github.com/pmix/pmix-reference-server). It’s just ORTE, but locked to DVM operation: a simple “psrvr” starts the machine, and then “prun” executes commands (it supports all the orterun options and doesn’t need to be told how to contact psrvr).
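
Roughly (a sketch with placeholder executables):

    # start the persistent server on the allocation
    psrvr &

    # run commands against it; prun accepts the usual orterun options
    # and finds psrvr on its own
    prun -np 4 ./my_parallel_app
    prun -np 1 ./my_serial_task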

Both will allow you to run serial as well as parallel codes (so long as they are built against OMPI master). We are working on providing cross-version PMIx support - at that time, you’ll be able to run OMPI v2.0 and above against either one as well.

HTH
Ralph
Post by Brock Palen
Is it possible to use mpirun / ORTE as a load balancer for running serial
jobs in parallel, similar to GNU Parallel?
[…]
Angel de Vicente
2017-02-27 12:58:07 UTC
Hi,
Post by r***@open-mpi.org
You might want to try using the DVM (distributed virtual machine)
mode in ORTE. You can start it on an allocation using the “orte-dvm”
command, and then submit jobs to it with “mpirun --hnp <foo>” […]
This question looked interesting, so I gave it a try. On a cluster with
Slurm I had no problem submitting a job that launched an orte-dvm
--report-uri ... and then using that file to launch jobs onto that virtual
machine via orte-submit.
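
Something along these lines (a sketch; the SBATCH directives and names are illustrative, not the exact script I used):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=16

    # start the DVM across the Slurm allocation; it writes its contact
    # info to dvm.uri and keeps running
    orte-dvm --report-uri dvm.uri

    # then, from a login shell:
    #   orte-submit --hnp file:dvm.uri -np 1 ./serial_task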

For this to be useful to us, I should be able to start executing jobs when
cores are available and have them held in a queue when the cores are
already filled. At the moment this is not happening: if I try to submit a
second job while the previous one has not finished, I get a message like:

,----
| DVM ready
| --------------------------------------------------------------------------
| All nodes which are allocated for this job are already filled.
| --------------------------------------------------------------------------
`----

With the DVM, is it possible to keep these jobs in some sort of queue,
so that they will be executed when cores become free?

Thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
r***@open-mpi.org
2017-02-27 13:23:58 UTC
Post by Angel de Vicente
[…]
With the DVM, is it possible to keep these jobs in some sort of queue,
so that they will be executed when cores become free?
It wouldn’t be hard to do so - as long as it was just a simple FIFO scheduler. I wouldn’t want it to get too complex.
Angel de Vicente
2017-02-27 13:33:44 UTC
Hi,
Post by r***@open-mpi.org
Post by Angel de Vicente
With the DVM, is it possible to keep these jobs in some sort of queue,
so that they will be executed when cores become free?
It wouldn’t be hard to do so - as long as it was just a simple FIFO scheduler. I wouldn’t want it to get too complex.
A simple FIFO should probably be enough. This could be useful as a simple
way to make a multi-core machine accessible to a small group of (friendly)
users, making sure that they don't oversubscribe the machine, but without
going the full route of installing/maintaining a full resource manager.
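
In the meantime, a crude client-side approximation is to cap concurrency when feeding the DVM, e.g. (a sketch assuming orte-submit blocks until its job finishes; tasks.txt is a hypothetical file listing one executable per line):

    # run the listed serial tasks through the DVM, at most 16 at a
    # time, in file order
    xargs -P 16 -I{} orte-submit --hnp file:dvm.uri -np 1 {} < tasks.txt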

Cheers,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Reuti
2017-02-27 13:54:33 UTC
Hi,
Post by Angel de Vicente
A simple FIFO should probably be enough. This could be useful as a simple
way to make a multi-core machine accessible to a small group of (friendly)
users, making sure that they don't oversubscribe the machine, but without
going the full route of installing/maintaining a full resource manager.
At first I thought you wanted to run a queuing system inside a queuing system, but it looks like you want to replace the resource manager.

Under which user account will the DVM daemons run? Are all users using the same account?

-- Reuti
Angel de Vicente
2017-02-27 17:24:19 UTC
Hi,
Post by Reuti
At first I thought you wanted to run a queuing system inside a queuing
system, but it looks like you want to replace the resource manager.
Yes; if this worked reasonably well, we could do without the resource
manager.
Post by Reuti
Under which user account will the DVM daemons run? Are all users using the same account?
Well, even if this worked for only one user it would still be useful, as I
could use it the way I now use GNU Parallel or a private Condor pool: I can
submit hundreds of jobs and be sure they get executed without
oversubscribing.

For a small group of users, it would be enough if the DVM could run under
my account with either no restriction on who can use it, or some way for me
to authorize others to use it (via an authority file or similar).

Thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Reuti
2017-02-27 17:39:12 UTC
Post by Angel de Vicente
[…]
For a small group of users, it would be enough if the DVM could run under
my account with either no restriction on who can use it, or some way for me
to authorize others to use it (via an authority file or similar).
AFAICS there is no user authorization at all. Anyone can hijack a running DVM once they know the URI. The only problem might be that all processes run under the account of the user who started the DVM, i.e. output files have to go to locations writable by that user, since the jobs can no longer write into each submitting user's own directory.

Running the DVM as root might help, but there would be a high risk of a faulty script writing to a place where sensitive system information is stored, possibly leaving the machine unusable afterwards.

My first attempts at using the DVM often led to a terminated DVM once a process returned a non-zero exit code. And once the DVM is gone, I fear the queued jobs are lost too. I wish the DVM could be more forgiving (or that its behavior on a non-zero exit code were adjustable).

-- Reuti
r***@open-mpi.org
2017-02-27 18:20:04 UTC
Post by Reuti
AFAICS there is no user authorization at all. Anyone can hijack a running DVM once they know the URI. The only problem might be that all processes run under the account of the user who started the DVM, i.e. output files have to go to locations writable by that user, since the jobs can no longer write into each submitting user's own directory.
We can add some authorization protection, at least at the user/group level. One can resolve the directory issue by creating some place that has group write permissions, and then requesting that as the working directory.
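
For example (a sketch; the path and group are placeholders):

    # create a scratch area the whole group can write to
    mkdir -p /scratch/dvm-shared
    chgrp friendly-users /scratch/dvm-shared
    chmod g+rwxs /scratch/dvm-shared

    # request it as the working directory when submitting to the DVM
    mpirun --hnp file:dvm.uri --wdir /scratch/dvm-shared -np 1 ./my_task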
Post by Reuti
Running the DVM as root might help, but there would be a high risk of a faulty script writing to a place where sensitive system information is stored, possibly leaving the machine unusable afterwards.
I would advise against that.
Post by Reuti
My first attempts at using the DVM often led to a terminated DVM once a process returned a non-zero exit code. And once the DVM is gone, I fear the queued jobs are lost too. I wish the DVM could be more forgiving (or that its behavior on a non-zero exit code were adjustable).
We just fixed that issue the other day :-)
Mark Santcroos
2017-02-28 14:44:51 UTC
Hi Brock, Angel, Reuti,

<shameless plug>

You might want to look at a tool we developed:
http://radical-cybertools.github.io/radical-pilot/index.html

This was actually one of the drivers for isolating the persistent ORTE DVM that's being discussed in this thread.

With RADICAL-Pilot you can use a Python API to launch an ORTE DVM on a computational resource and then run tasks on top of it.

Happy to answer questions off-list.

</>

Regards,

Mark
