Discussion:
[OMPI users] MPI running in Unikernels
Keith Collister
2017-08-11 15:11:04 UTC
Permalink
Hi,

I'm currently looking into whether it's possible to run MPI applications
within unikernels <https://en.wikipedia.org/wiki/Unikernel>.

The idea is to have multiple unikernels as virtual compute nodes in the
cluster, with physical nodes hosting the virtual nodes. As I understand it,
in a normal cluster mpirun would be invoked on a physical node and all the
compute nodes would be processes on the same machine. In contrast, with
unikernels the compute nodes would need to effectively run in virtual
machines.

I thought I was making progress when I found the "--ompi-server" switch for
mpirun: I figured I could spawn an OMPI server instance on a host, then
invoke mpirun telling it to start the unikernel (in an emulator (QEMU),
instead of an application directly), passing the unikernel the URI of the
OMPI server. In my mind, the unikernels would then connect to the
server happily and all would be smooth.

In reality, it seems like mpirun doesn't "pass" anything to the application
it runs (I expected it to pass configuration options via the command line).
This implies all the configuration is stored somewhere on the host or in
environment variables, which would make it much harder to configure the
unikernel. I couldn't find much documentation on this part of the process
(how mpirun configures the application), though, so I figured I'd ask the
experts.

Is this sort of thing possible? Is the MPI ecosystem tied too tightly to
virtual nodes being run with mpirun to make it infeasible to run isolated
virtual nodes like this? Is there some command-line switch I've missed that
would make my life a lot easier?


Any advice/ideas/discussion would be much appreciated,
Keith
Gilles Gouaillardet
2017-08-11 16:13:28 UTC
Permalink
Keith,

MPI runs on both shared memory (e.g. a single node) and
distributed memory (e.g. several independent nodes).
Here is what happens when you run
mpirun -np <n> a.out

1. an orted daemon is remotely spawned on each node
2. mpirun and the orted daemons fork&exec a.out

Unless a batch manager is used, the remote spawn is implemented via SSH.

Note it is possible to:
- use a custom SSH-like command
- use a custom command instead of the orted command
- use a wrapper when fork&exec'ing a.out
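The three hooks above can be sketched as follows (the MCA parameter names are the stock Open MPI ones; the port number, paths, and wrapper scripts are made-up examples for illustration):

```shell
# 1. Custom SSH-like command for the remote spawn
#    (plm_rsh_agent is the Open MPI MCA parameter for this):
mpirun --mca plm_rsh_agent "ssh -p 2222" -np 4 ./a.out

# 2. Custom launch command in place of orted
#    (orte_launch_agent; /opt/uk/launch-orted.sh is a hypothetical
#    wrapper that could, say, boot a unikernel before exec'ing orted):
mpirun --mca orte_launch_agent "/opt/uk/launch-orted.sh" -np 4 ./a.out

# 3. Wrapper around a.out itself: mpirun simply fork&execs whatever
#    command line it is given, so a plain shell wrapper works:
mpirun -np 4 ./wrapper.sh ./a.out
```

Such a wrapper also answers the earlier question about how mpirun configures the application: the configuration arrives mostly as OMPI_*/PMIX_* environment variables rather than command-line options, so a hypothetical wrapper.sh running `env | grep -E '^(OMPI|PMIX)_'` before `exec "$@"` would show exactly what each rank receives.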

Last but not least, another option is a direct run, but that requires
some support from the resource manager (e.g. a PMI(x) server).
For example, with SLURM you can run
srun a.out
and SLURM will remotely spawn a.out on all the nodes.
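As a sketch of that direct-run path (assuming a SLURM build with the PMIx plugin enabled, which is not the default everywhere):

```shell
# List which PMI flavours this SLURM build supports:
srun --mpi=list

# Direct run: slurmd itself spawns a.out on each node and its PMIx
# plugin provides the wire-up, so no mpirun or orted is involved:
srun --mpi=pmix -n 4 ./a.out
```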


I am pretty sure Open MPI provides enough flexibility that, with a
minimum of creativity, you can run an MPI app in a unikernel.
If an SSH daemon runs in each unikernel, that should be straightforward.
If you want to run one orted and several a.out per unikernel, a bit of
creativity is needed (e.g. scripting and wrapping).
If you want to run a single a.out per unikernel, that is a bit
trickier, since you would have to somehow implement a PMIx server within
each unikernel.


Cheers,

Gilles
Keith Collister
2017-08-13 13:00:02 UTC
Permalink
Gilles,

Thank you for your reply. Your last point is what I'm trying to achieve, as
unikernels are only really able to run one process at a time.

Given that this seems to require a "heavier" system than what's available
inside a unikernel, I think it's going to be either impossible or an
inordinate amount of work to get this going.

Thanks for your time,
Keith

_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users