Discussion:
[OMPI users] Memory Leak in 3.1.2 + UCX
Charles A Taylor
2018-10-04 21:39:01 UTC
Permalink
We are seeing a gaping memory leak when running OpenMPI 3.1.x (or 2.1.2, for that matter) built with UCX support. The leak shows up
whether the “ucx” PML is specified for the run or not. The applications in question are Arepo and Gizmo, but I have no reason to believe
that others are not affected as well.

Basically, the MPI processes grow without bound until SLURM kills the job or the host memory is exhausted.
If I configure and build with “--without-ucx” the problem goes away.
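
For what it’s worth, the growth is easy to watch by sampling VmRSS for the ranks on a node while the job runs; a rough sketch (the process name "arepo" is just our case, substitute your own):

while sleep 30; do
    date
    for pid in $(pgrep -u $USER arepo); do
        # resident set size of each rank, in kB
        awk -v p=$pid '/^VmRSS/ {print p, $2, $3}' /proc/$pid/status
    done
done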

I didn’t see anything about this on the UCX GitHub site, so I thought I’d ask here. Anyone else seeing the same or similar?

What version of UCX is OpenMPI 3.1.x tested against?

Regards,

Charlie Taylor
UF Research Computing

Details:
—————————————
RHEL7.5
OpenMPI 3.1.2 (and any other version I’ve tried).
ucx 1.2.2-1.el7 (RH native)
RH native IB stack
Mellanox FDR/EDR IB fabric
Intel Parallel Studio 2018.1.163

Configuration Options:
—————————————————
CFG_OPTS=""
CFG_OPTS="$CFG_OPTS C=icc CXX=icpc FC=ifort FFLAGS=\"-O2 -g -warn -m64\" LDFLAGS=\"\" "
CFG_OPTS="$CFG_OPTS --enable-static"
CFG_OPTS="$CFG_OPTS --enable-orterun-prefix-by-default"
CFG_OPTS="$CFG_OPTS --with-slurm=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-pmix=/opt/pmix/2.1.1"
CFG_OPTS="$CFG_OPTS --with-pmi=/opt/slurm"
CFG_OPTS="$CFG_OPTS --with-libevent=external"
CFG_OPTS="$CFG_OPTS --with-hwloc=external"
CFG_OPTS="$CFG_OPTS --with-verbs=/usr"
CFG_OPTS="$CFG_OPTS --with-libfabric=/usr"
CFG_OPTS="$CFG_OPTS --with-ucx=/usr"
CFG_OPTS="$CFG_OPTS --with-verbs-libdir=/usr/lib64"
CFG_OPTS="$CFG_OPTS --with-mxm=no"
CFG_OPTS="$CFG_OPTS --with-cuda=${HPC_CUDA_DIR}"
CFG_OPTS="$CFG_OPTS --enable-openib-udcm"
CFG_OPTS="$CFG_OPTS --enable-openib-rdmacm"
CFG_OPTS="$CFG_OPTS --disable-pmix-dstore"

rpmbuild --ba \
--define '_name openmpi' \
--define "_version $OMPI_VER" \
--define "_release ${RELEASE}" \
--define "_prefix $PREFIX" \
--define '_mandir %{_prefix}/share/man' \
--define '_defaultdocdir %{_prefix}' \
--define 'mflags -j 8' \
--define 'use_default_rpm_opt_flags 1' \
--define 'use_check_files 0' \
--define 'install_shell_scripts 1' \
--define 'shell_scripts_basename mpivars' \
--define "configure_options $CFG_OPTS " \
openmpi-${OMPI_VER}.spec 2>&1 | tee rpmbuild.log
Pavel Shamis
2018-10-05 15:13:41 UTC
Permalink
Posting this on UCX list.
Gilles Gouaillardet
2018-10-05 15:31:39 UTC
Permalink
Charles,

are you saying that even if you run

mpirun --mca pml ob1 ...

(i.e. force the ob1 component of the pml framework), the memory leak is
still present?

As a side note, we strongly recommend avoiding

configure --with-FOO=/usr

and using

configure --with-FOO

instead (otherwise you end up with -I/usr/include and -L/usr/lib64 on the
command line, which can silently hide third-party libraries installed in a
non-standard directory). If --with-FOO fails for you, that is a bug we
would appreciate you reporting.
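
For example, a configure command line in the recommended form would look something like this (the prefix below is just a placeholder):

./configure --prefix=/opt/openmpi/3.1.2 \
            --with-verbs \
            --with-ucx \
            --with-pmix=/opt/pmix/2.1.1 \
            --with-slurm=/opt/slurm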

Cheers,

Gilles
Charles A Taylor
2018-10-06 09:49:41 UTC
Permalink
Post by Gilles Gouaillardet
are you saying that even if you run
mpirun --mca pml ob1 ...
(i.e. force the ob1 component of the pml framework), the memory leak is
still present?
No, I do not mean to say that - at least not in the current incarnation. Running with the following parameters avoids the leak…

export OMPI_MCA_pml="ob1"
export OMPI_MCA_btl_openib_eager_limit=1048576
export OMPI_MCA_btl_openib_max_send_size=1048576

as does building OpenMPI without UCX support (i.e. --without-ucx).
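
(The same workaround can be given on the mpirun command line instead of through the environment; a sketch, with the rank count and binary name purely illustrative:)

mpirun --mca pml ob1 \
       --mca btl_openib_eager_limit 1048576 \
       --mca btl_openib_max_send_size 1048576 \
       -np 64 ./arepo param.txt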

However, building _with_ UCX support (including the current GitHub source) and running with the following parameters produces
the leak (note that no PML was explicitly requested).

export OMPI_MCA_oob_tcp_listen_mode="listen_thread"
export OMPI_MCA_btl_openib_eager_limit=1048576
export OMPI_MCA_btl_openib_max_send_size=1048576
export OMPI_MCA_btl="self,vader,openib"

The eager_limit and send_size limits are needed with this app to prevent a deadlock that I’ve posted about previously.

Also, explicitly requesting the UCX PML with

export OMPI_MCA_pml="ucx"

produces the leak.

I’m continuing to try to find exactly what I’m doing wrong to produce this behavior, but so far the only workaround has been to
exclude UCX, which seems like a bad idea since Jeff (Squyres) pointed out that it is the Mellanox-recommended way to run on
Mellanox hardware. Interestingly, using the UCX PML avoids the deadlock that results when running with the default parameters
and not limiting the message sizes - another reason we’d like to be able to use it.

I can read your mind at this point - “Wow, these guys have really horked their cluster”. Could be. But we run
thousands of jobs every day, including many other OpenMPI jobs (VASP, GROMACS, RAxML, LAMMPS, NAMD, etc.).
Also, the users of the Arepo and Gadget codes are currently running with MVAPICH2 without issue; I installed
it specifically to get them past these OpenMPI problems. We don’t normally build anything with MPICH/MVAPICH/IMPI
since we have never had any real reason to - until now.

That may have to be the solution, but the memory leak is so readily reproducible that I thought I’d ask about it.
Since it appears that others are not seeing this issue, I’ll continue to try to figure it out, and if I do, I’ll be sure to post back.
Post by Gilles Gouaillardet
As a side note, we strongly recommend to avoid
configure --with-FOO=/usr
instead
configure --with-FOO
should be used (otherwise you will end up with -I/usr/include
-L/usr/lib64 and that could silently hide third party libraries
installed in a non standard directory). If --with-FOO fails for you,
then this is a bug we will appreciate you report.
Noted and logged. We’ve been using --with-FOO=/usr for a long time (since the 1.x days). There was a reason we started doing
it, but I’ve long since forgotten what it was; I think it was to _avoid_ what you describe, not cause it. Regardless,
I’ll heed your warning, remove it from future builds, and file a bug if there are any problems.

However, I did post previously about a similar problem when configuring against an external PMIx library. The configure
script produced (or did at the time) -L/usr/lib instead of -L/usr/lib64, resulting in unresolved PMIx symbols at link time.
That was with OpenMPI 2.1.2. We now include a lib -> lib64 symlink in our /opt/pmix/x.y.z directories, so I haven’t looked to
see whether that was fixed for 3.x.
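
(The workaround is just a relative symlink, e.g. for a 2.1.1 install:)

ln -s lib64 /opt/pmix/2.1.1/lib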

I should have also mentioned in my previous post that HPC_CUDA_DIR=NO, meaning that CUDA support has
been excluded from these builds (in case anyone was wondering).

Thanks for the feedback,

Charlie
g***@rist.or.jp
2018-10-06 10:06:34 UTC
Permalink
Charles,

The ucx PML has a higher priority than ob1, which is why it is used by default
when it is available.
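
A quick way to see which PML components are available in a build, and which one a run actually selects (the verbosity level and test binary below are illustrative), is:

ompi_info | grep -i " pml"                         # list the PML components compiled into this build
mpirun --mca pml_base_verbose 100 -np 2 ./a.out    # log the PML selection at startup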


If you can provide simple instructions on how to build and test one of
the apps that exhibits the memory leak, that would greatly help us and the UCX
folks reproduce, troubleshoot and diagnose this issue.


Cheers,

Gilles

Charles A Taylor
2018-10-06 10:57:23 UTC
Permalink
Post by Gilles Gouaillardet
Charles,
The ucx PML has a higher priority than ob1, which is why it is used by default
when it is available.
Good to know. Thanks.
Post by Gilles Gouaillardet
If you can provide simple instructions on how to build and test one of
the apps that exhibits the memory leak, that would greatly help us and the UCX
folks reproduce, troubleshoot and diagnose this issue.
I’ll be happy to do that. Is it better to post it here or on the OpenMPI GitHub site?

Regards,

Charlie
Charles A Taylor
2018-10-17 10:36:56 UTC
Permalink
Just to follow up…

This turned out to be a bug in OpenMPI+UCX:

https://github.com/openucx/ucx/issues/2921
https://github.com/open-mpi/ompi/pull/5878

I cherry-picked the patch from the GitHub master and applied it to 3.1.2. The gadget/gizmo
test case has been running since yesterday without the previously observed growth in RSS.
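
For anyone wanting to do the same against a release tarball, a rough sketch (the .patch suffix is GitHub’s way of fetching a PR as a plain patch; whether it applies cleanly to your tree may vary):

curl -L -o ompi-pr5878.patch https://github.com/open-mpi/ompi/pull/5878.patch
cd openmpi-3.1.2
patch -p1 < ../ompi-pr5878.patch   # then re-run configure/make as usual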

Thanks to Yossi Itigin (***@mellanox.com) for the fix.

Charlie Taylor
UF Research Computing