Discussion:
[OMPI users] Communicating MPI processes running in Docker containers in the same host by means of shared memory?
Jordi Guitart
2017-03-24 08:54:07 UTC
Hello,

Docker allows several containers running on the same host to share the
same IPC namespace, so they can share memory (see example here:
https://github.com/docker/docker/pull/8211#issuecomment-56873448). I
assume OpenMPI could use this to let MPI processes running in different
Docker containers on the same host communicate through shared memory
(sm or vader). However, I cannot make it work. I tried to force mpirun
to use shared memory (--mca btl self,sm), but it complains that MPI
processes running in other Docker containers are not reachable. It
seems that OpenMPI cannot recognize that shared memory is available
between containers. Does anybody have a hint about how to work around this?
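Before looking at Open MPI itself, it may help to confirm that two containers really do share one IPC namespace. On Linux, the kernel exposes each process's IPC namespace as the symlink /proc/<pid>/ns/ipc, whose target looks like "ipc:[<inode>]". The minimal Python sketch below reads and parses that identifier. It assumes a Linux host and that /proc is readable inside the containers, and the helper names are hypothetical. Matching numbers only show that the kernel-level namespace is shared; they say nothing about whether Open MPI will consider the peers reachable through sm or vader.

```python
import os
import re

# Target of a /proc/<pid>/ns/ipc symlink looks like "ipc:[4026531839]".
_IPC_NS_LINK = re.compile(r"^ipc:\[(\d+)\]$")


def parse_ipc_ns(link_target: str) -> int:
    """Extract the namespace inode number from an ns symlink target."""
    match = _IPC_NS_LINK.match(link_target)
    if match is None:
        raise ValueError(f"not an IPC namespace link: {link_target!r}")
    return int(match.group(1))


def current_ipc_ns(pid: str = "self") -> int:
    """Return the IPC namespace id of a process (Linux-only, reads /proc)."""
    return parse_ipc_ns(os.readlink(f"/proc/{pid}/ns/ipc"))
```

Running print(current_ipc_ns()) in each container should print the same number when the containers share an IPC namespace (for example, when one was started with --ipc=container:ID) and different numbers otherwise.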

Thanks


http://bsc.es/disclaimer
John Hearns via users
2017-03-24 09:00:59 UTC
Jordi,
this is not an answer to your question, but have you looked at
Singularity:
http://singularity.lbl.gov/
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Jordi Guitart
2017-03-24 09:47:29 UTC
Hello John,

Yes, in fact, I'm comparing Docker with Singularity for running MPI
applications :-)

I'd like to make the comparison fairer by allowing Docker containers to
share memory.

Thanks
Jeff Squyres (jsquyres)
2017-03-24 10:27:18 UTC
If the Docker containers have different IP addresses, Open MPI will think that they are different "nodes" (or "hosts" or "servers" or whatever your favorite word is), and will therefore assume that the processes in these different containers are unable to share memory.

Meaning: no work has been done to make Open MPI understand Docker shared memory (i.e., you're the first person to ask about it). Pull requests would always be appreciated. ;-)
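The consequence Jeff describes can be illustrated with a toy model: peers are grouped by the address they report, and only peers in the same group are candidates for a shared-memory transport. This Python sketch is not how Open MPI is implemented internally; it only mirrors the behaviour described above, and the function names and the 172.17.0.x addresses are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_node(addresses: dict[int, str]) -> dict[str, list[int]]:
    """Group ranks by the address each one reports.

    Toy model, not Open MPI code: every distinct address counts as a
    separate node, which is what happens when each container has its own IP.
    """
    nodes: dict[str, list[int]] = defaultdict(list)
    for rank, addr in sorted(addresses.items()):
        nodes[addr].append(rank)
    return dict(nodes)


def shm_peers(addresses: dict[int, str], rank: int) -> list[int]:
    """Ranks that `rank` could reach via shared memory under this model."""
    same_node = group_by_node(addresses)[addresses[rank]]
    return [r for r in same_node if r != rank]


# Two containers on one host, each with its own (made-up) bridge IP:
# rank 0 finds no shared-memory peer, so a "self,sm"-only run cannot reach rank 1.
containers = {0: "172.17.0.2", 1: "172.17.0.3"}
print(shm_peers(containers, 0))  # prints []
```

If both ranks reported the same node identifier, shm_peers would return the other rank, which is the situation the rest of this thread is trying to reach.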
--
Jeff Squyres
***@cisco.com
Jordi Guitart
2017-03-24 10:41:51 UTC
Hello Jeff,

Docker containers do indeed have different IP addresses, so now we know
why it does not work. I think this could be a nice feature for
OpenMPI, so I'll probably file a request for it ;-)

Thanks for your help.
Jeff Squyres (jsquyres)
2017-03-24 19:10:55 UTC
Post by Jordi Guitart
Docker containers have different IP addresses, indeed, so now we know why it does not work. I think that this could be a nice feature for OpenMPI, so I'll probably issue a request for it ;-)
Cool.

I don't think any of the current developers in the Open MPI community are actively working with Docker (several are working with Singularity). Would this be a feature you'd be willing to submit a patch for?
Jordi Guitart
2017-03-25 15:07:26 UTC
Hi,

I don't have any previous experience with the OpenMPI source code, so I
don't have a clear idea of the changes needed to implement this feature.
This probably requires some preliminary brainstorming to decide the most
appropriate way to tell OpenMPI that the underlying nodes can share memory
even though they have different IP addresses.
r***@open-mpi.org
2017-03-26 16:18:31 UTC
There are a couple of things you’d need to resolve before worrying about code:

* IIRC, there is a separate ORTE daemon in each Docker container, since OMPI thinks these are separate nodes. So you’ll first need to find some way for those daemons to “discover” that they are on the same physical node. Is there something in the container environment that could be used for this purpose?

* Once the daemons can determine they are on a shared node, then you have to be able to create a shared memory backing file that can be accessed from within any of the containers. In other words, one of the procs in one of the containers is going to have to create the backing file, and then pass the filename to the other procs on that physical node. Then those other procs need to be able to open that file from within their container.

Are those doable in Docker? Note that Singularity doesn’t have these issues because it only abstracts the file system, and so every container “sees” that it is on the same node (and the ORTE daemon sits outside the container). This is why we push people in that direction for HPC with containers.
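Ralph's second step (one process creates a backing file, passes its name to the others, and they open it from their own containers) can be sketched with an ordinary file-backed shared mapping. This minimal Python sketch assumes the directory is visible from every container on the node (for example, a host directory bind-mounted into all of them, which is an assumption, not something settled in this thread). It does not reproduce Open MPI's own backing-file logic, and create_backing_file and attach are hypothetical names.

```python
import mmap
import os
import tempfile

SEGMENT_SIZE = 4096  # bytes; arbitrary size for the sketch


def create_backing_file(directory: str, size: int = SEGMENT_SIZE) -> str:
    """Leader process: create a zero-filled backing file, return its path.

    The returned path is what would be passed to the other local
    processes; how it is passed (e.g. via the runtime) is out of scope.
    """
    fd, path = tempfile.mkstemp(prefix="shm-backing-", dir=directory)
    try:
        os.ftruncate(fd, size)  # extend to `size` bytes (reads back as zeros)
    finally:
        os.close(fd)
    return path


def attach(path: str) -> mmap.mmap:
    """Any local process: map the whole backing file, shared and writable."""
    with open(path, "r+b") as f:
        # On Unix, mmap defaults to MAP_SHARED with read/write access and
        # keeps its own handle, so closing `f` afterwards is safe.
        return mmap.mmap(f.fileno(), 0)
```

Because both mappings are MAP_SHARED views of the same file, a write through one (for example leader[0:5] = b"hello") is immediately visible through the other. The open question in the thread is whether such a path can be made reachable from every container.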

Ralph
Jordi Guitart
2017-03-29 13:37:37 UTC
Hi,

I've tried to provide some insights into how this could be accomplished (see
inline). Do they seem feasible?
Post by r***@open-mpi.org
* IIRC, there is a separate ORTE daemon in each Docker container since OMPI thinks these are separate nodes. So you’ll first need to find some way those daemons can “discover” that they are on the same physical node. Is there something in the container environment that could be used for this purpose?
Following the idea in this example
(https://docs.docker.com/engine/userguide/networking/work-with-networks/#basic-container-networking-example),
you could create a bridge network connecting (some of) the containers
running on the same physical host. Each container could then use the
'docker network inspect' command to obtain the list of containers
connected to that bridge network. Note that this requires exposing the
Docker socket to the container by bind-mounting it with the -v flag.
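The discovery step proposed above could look roughly like this Python sketch. It assumes the docker CLI is installed in the container and the Docker socket is bind-mounted as described. It also assumes the JSON layout the CLI prints for 'docker network inspect': a list of network objects, each with a "Containers" map keyed by container ID. Treating membership of one bridge network as meaning "same physical host" is an assumption of this approach (a bridge network is local to one Docker host), not something the code can verify. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def inspect_network(network: str) -> str:
    """Return the raw JSON printed by `docker network inspect <network>`.

    Needs the docker CLI and access to the Docker socket in the container.
    """
    result = subprocess.run(
        ["docker", "network", "inspect", network],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def containers_on_network(inspect_json: str) -> dict[str, str]:
    """Map container name -> IPv4 address for every container on the network(s)."""
    members: dict[str, str] = {}
    for net in json.loads(inspect_json):  # the CLI prints a JSON list
        # "Containers" is keyed by container ID; it may be empty.
        for info in (net.get("Containers") or {}).values():
            # IPv4Address is in CIDR form, e.g. "172.18.0.2/16".
            members[info["Name"]] = info["IPv4Address"].split("/")[0]
    return members
```

Usage would be containers_on_network(inspect_network("mpinet")), where "mpinet" is a placeholder network name. The resulting set of addresses is what the daemons would need in order to treat those containers as one node.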
Post by r***@open-mpi.org
* Once the daemons can determine they are on a shared node, then you have to be able to create a shared memory backing file that can be accessed from within any of the containers. In other words, one of the procs in one of the containers is going to have to create the backing file, and then pass the filename to the other procs on that physical node. Then those other procs need to be able to open that file from within their container.
As shown here
(https://github.com/docker/docker/pull/8211#issuecomment-56873448), it
would be possible to start a container CONTAINER_ID that creates a
shared memory segment, and then start the other containers with the
--ipc=container:CONTAINER_ID option, so that they can access the shared
memory segment created by the first one.
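Combining the two ideas above, the containers for one host could be launched as one "leader" plus followers that join its IPC namespace and the same bridge network. This Python sketch only builds the docker run argument lists (it does not run them). The image, network, and container names ("mpi-image", "mpinet", "mpi0", ...) are hypothetical. Depending on the Docker version and the daemon's default IPC mode, the leader may also need --ipc=shareable; check the docker run reference for your version.

```python
from __future__ import annotations


def docker_run_commands(image: str, leader: str, followers: list[str],
                        network: str) -> list[list[str]]:
    """Build `docker run` argv lists: a leader plus followers sharing its IPC namespace."""
    base = ["docker", "run", "-d", "--network", network]
    # The leader gets its own IPC namespace (the Docker default).
    commands = [base + ["--name", leader, image]]
    for name in followers:
        # Each follower joins the leader's IPC namespace, so shared memory
        # segments created there are visible to it.
        commands.append(base + ["--name", name, f"--ipc=container:{leader}", image])
    return commands
```

Each list can be passed to subprocess.run(argv, check=True), leader first, so that the container named in --ipc=container: already exists when the followers start.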