[OMPI users] OpenMPI and Singularity
Bennet Fauber
2017-02-17 23:49:20 UTC
I would like to follow the instructions on the Singularity web site,

http://singularity.lbl.gov/docs-hpc

to test Singularity and OMPI on our cluster. My usual configure line
for the 1.x series looked like this:

./configure --prefix=/usr/local \
    --mandir=${PREFIX}/share/man \
    --with-tm --with-verbs \
    --disable-dlopen --enable-shared \
    CC=gcc CXX=g++ FC=gfortran

I have a couple of questions.

First, I presume it is best to have the same version of OMPI inside
the container as outside, but how sensitive is it to minor versions?
Presumably all 2.1.x versions are fine together, but mixing 2.1.x
outside with 2.2.x inside, or vice versa, is not (perhaps backward
compatible but not forward)?
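
(For what it's worth, I plan to compare the two with something like
the following; the image name is just a placeholder.)

# On the host:
$ mpirun --version
# Inside the container:
$ singularity exec ./container.img mpirun --version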

Second, if someone builds OMPI inside their container on an external
system, without tm and verbs, and then brings the container to our
system, will tm and verbs be handled by the calling mpirun on the
host, so that the OMPI inside the container won't care? Or will the
absence of those components inside the container cause them to be
suppressed outside as well?

Thanks in advance, -- bennet
r***@open-mpi.org
2017-02-18 00:24:11 UTC
The embedded Singularity support hasn’t made it into the OMPI 2.x release series yet, though OMPI will still work within a Singularity container anyway.

Compatibility across the container boundary is always a problem, as your examples illustrate. If the system is using one OMPI version and the container another, the only concern is the process-to-ORTE-daemon communication across that boundary. In the OMPI 2.x series and beyond, this is done with PMIx. OMPI v2.0 is based on PMIx v1.x, and OMPI v2.1 will be as well, so there is no compatibility issue there. However, that statement is _not_ true for the OMPI v1.10 and earlier series.

Future OMPI versions will utilize PMIx v2 and above, which includes a cross-version compatibility layer. Thus, you shouldn’t have any issues mixing and matching OMPI versions in that regard.

However, your second example is a perfect illustration of where containerization can break down. If you build your container on a system that doesn’t have (for example) tm and verbs installed on it, then those OMPI components will not be built. The tm component won’t matter as the system version of mpirun will be executing, and it presumably knows how to interact with Torque.

However, if you run that container on a system that has verbs, your application won’t be able to utilize the verbs support because those components were never compiled. Note that the converse is not true: if you build your container on a system that has verbs installed, you can then run it on a system that doesn’t have verbs support and those components will dynamically disqualify themselves.

Remember, you only need the verbs headers to be installed; you don’t have to build on a machine that actually has a verbs-capable NIC (this is how the distributions get around the problem). So it isn’t hard to avoid this portability problem; you just need to think ahead a bit.
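
For example, in a CentOS/RHEL-style container it is usually enough to pull in the headers package before configuring (the package name will differ on other distributions, and this is only a sketch):

$ yum -y install libibverbs-devel
$ ./configure --prefix=/usr/local --with-verbs
$ make -j4 install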

HTH
Ralph
Bennet Fauber
2017-02-18 00:34:38 UTC
Ralph,

I will be building from the Master branch at github.com for testing
purposes. We are not 'supporting' Singularity container creation, but
we do hope to be able to offer some guidance, so I think we can
finesse the PMIx version, yes?

That is good to know about the verbs headers being the only thing
needed; thanks for that detail. Sometimes the library also needs to
be present.

Also very good to know that the host mpirun will start processes, as
we are using cgroups, and if the processes get started by a
non-tm-supporting MPI, they will be outside the proper cgroup.

So, just to recap: if I install from the current master at
http://github.com/open-mpi/ompi.git both on the host system and
within the container, copy the verbs headers into the container, and
then configure and build OMPI within the container without TM
support, I should be able to copy the container to the cluster and
run it with verbs, with the system OMPI using tm.
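
Concretely, I am picturing a container recipe along these lines (an
untested sketch; the package list is my guess, and I would build the
image with Singularity's create/bootstrap commands):

BootStrap: docker
From: centos:7

%post
    yum -y install gcc gcc-c++ gcc-gfortran make git perl \
        autoconf automake libtool flex libibverbs-devel
    git clone https://github.com/open-mpi/ompi.git /tmp/ompi
    cd /tmp/ompi
    ./autogen.pl
    ./configure --prefix=/usr/local --with-verbs
    make -j4 install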

If a user were to build without the verbs support, it would still run,
but it would fall back to non-verbs communication, so it would just be
commensurately slower.

Let me know if I've garbled things. Otherwise, wish me luck, and have
a good weekend!

Thanks, -- bennet
r***@open-mpi.org
2017-02-18 04:20:40 UTC
I -think- that is correct, but you may need the verbs library as well; I honestly don’t remember whether the configury checks for functions in the library. If it does, you’ll need the library wherever you build OMPI, but everything else is accurate.
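
One quick way to check what actually got built into the container’s copy is ompi_info (the image path here is just a placeholder):

$ singularity exec ./your_container.img ompi_info | grep -i openib

If the openib BTL shows up in that list, the verbs support made it in.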

Good luck - and let us know how it goes!
Ralph
Bennet Fauber
2017-02-20 17:13:31 UTC
I got mixed results when bringing a container whose OMPI was built
without the IB and Torque libraries to a cluster whose system OMPI
has them.

The short summary is that multinode communication seems unreliable. I
can mostly get up to 8 procs, two per node, to run, but not beyond
that. In a couple of cases, a particular node seemed to cause the
problem. I am going to try again with the configure line inside the
container matching the one outside, but I have to chase down the IB
and Torque dependencies to do so.

If you're interested in how it breaks, I can send you some more
information. If there are diagnostics you would like, I can try to
provide those. I will be gone starting Thu for a week.

-- bennet
r***@open-mpi.org
2017-02-20 18:10:49 UTC
If you can send us some more info on how it breaks, that would be helpful. I’ll file it as an issue so we can track things.
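
If it helps, a couple of MCA verbosity knobs usually narrow down where multi-node wireup is failing (the levels here are only suggestions):

$ mpirun --mca plm_base_verbose 10 --mca btl_base_verbose 100 \
    singularity exec ./mpi_test.img /usr/bin/ring

Forcing TCP can also tell you whether the verbs path is the culprit:

$ mpirun --mca btl tcp,self,vader \
    singularity exec ./mpi_test.img /usr/bin/ring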

Thanks
Ralph
Bennet Fauber
2017-02-20 18:38:59 UTC
Ralph, attached please find a file with the results of

$ mpirun -d singularity exec ./mpi_test.img /usr/bin/ring

I pressed Ctrl-C at about line 200 of the output file.

I hope there is something useful in it.

My nodefile looks like this:

nyx6219
nyx6219
nyx6145
nyx6145
nyx6191
nyx6191
nyx6155
nyx6155
nyx6213
nyx6213
nyx6223
nyx6223
nyx6233
nyx6233
nyx6127
nyx6127

where nyx6219 is the host from which I run mpirun. I can cycle all of
the remaining pairs of nodes such that

$ mpirun -np 4 singularity exec ./mpi_test.img /usr/bin/ring

will run successfully with every pair of nodes in the list, which
leads me to believe it is not some basic communication issue
(but I could be wrong).
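
(I am cycling through the pairs with roughly the following loop;
nyx6219 is the launch node and the others come from the nodefile
above.)

for node in nyx6145 nyx6191 nyx6155 nyx6213 nyx6223 nyx6233 nyx6127; do
    echo "=== nyx6219 + $node ==="
    mpirun -np 4 -host nyx6219,nyx6219,$node,$node \
        singularity exec ./mpi_test.img /usr/bin/ring
done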

I built the host system's OMPI from 2.x-devel-7762c21

The OMPI inside the container is from:

Singularity.mpi_test.img> git log
commit 5c64c0bc3bfd2b60c8c38a10482f8387cff1d879
Merge: af7e2cc bb2481a
Author: Gilles Gouaillardet <***@users.noreply.github.com>
Date: Mon Feb 20 11:29:35 2017 +0900

The one inside the container was built with only --prefix specified.
The one outside had

--disable-dlopen --enable-shared --with-tm --with-verbs

Both inside and outside the container are CentOS 7, both built with
gcc 4.8.5 as shipped.

-- bennet