Discussion:
[OMPI users] Valgrind errors related to MPI_Win_allocate_shared
Joseph Schuchart
2016-11-14 12:49:48 UTC
All,

I am investigating an MPI application using Valgrind and see a load of
memory leaks reported in MPI-related code. Please find the full log
attached. Some observations/questions:

1) According to the information available at
https://www.open-mpi.org/faq/?category=debugging#valgrind_clean, the
suppression file should help get a clean run of an MPI application
despite several buffers not being freed by MPI_Finalize. Is this
assumption still valid? If so, maybe the suppression file needs an
update, as I still see reports of leaked memory allocated in MPI_Init.

2) There seem to be several invalid reads and writes in the
opal_shmem_segment_* functions. Are they significant or can we regard
them as false positives?

3) The code example attached allocates memory using
MPI_Win_allocate_shared and frees it using MPI_Win_free. However,
Valgrind reports some memory to be leaking, e.g.:

==4020== 16 bytes in 1 blocks are definitely lost in loss record 21 of 234
==4020== at 0x4C2DB8F: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4020== by 0xCFDCD47: component_select (osc_sm_component.c:277)
==4020== by 0x4F39FC3: ompi_osc_base_select (osc_base_init.c:73)
==4020== by 0x4E945DC: ompi_win_allocate_shared (win.c:272)
==4020== by 0x4EF6576: PMPI_Win_allocate_shared
(pwin_allocate_shared.c:80)
==4020== by 0x400E96: main (mpi_dynamic_win_free.c:48)

Can someone please confirm that the way the shared window memory is
freed is actually correct? I noticed that the amount of memory reported
as leaking scales with the number of windows that are allocated and
freed. In our case this happens in a set of unit tests that each
allocate their own shared memory windows, so the amount of leaked
memory piles up quite a bit.

I built the code with GCC 5.4.0 against Open MPI 2.0.1 and ran it on a
single node. How to reproduce:

$ mpicc -Wall -ggdb mpi_dynamic_win_free.c -o mpi_dynamic_win_free

$ mpirun -n 2 valgrind --leak-check=full \
    --suppressions=$HOME/opt/openmpi-2.0.1/share/openmpi/openmpi-valgrind.supp \
    ./mpi_dynamic_win_free
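
For reference, the core of the reproducer boils down to roughly the
following sketch (illustration only; the attached mpi_dynamic_win_free.c
may differ in details such as sizes and the values written):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win  win;
    int     *baseptr = NULL;
    MPI_Aint size    = 1024 * sizeof(int);

    /* every rank contributes a segment to the shared-memory window */
    MPI_Win_allocate_shared(size, (int)sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &baseptr, &win);

    /* touch the local segment inside an access epoch */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
    baseptr[0] = rank;
    MPI_Win_unlock(rank, win);

    /* collective call that releases the window and its memory */
    MPI_Win_free(&win);

    MPI_Finalize();
    return 0;
}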

Best regards,
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: ***@hlrs.de
D'Alessandro, Luke K
2016-11-14 16:07:23 UTC
Hi Joseph,

I don’t have a solution to your issue, but I’ve found that the Valgrind MPI wrapper is necessary to eliminate many of the false positives that the suppressions file can’t catch.

http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap.gettingstarted

You should LD_PRELOAD the libmpiwrap library from your Valgrind installation. If it’s not there, you can rebuild Valgrind with CC=mpicc to have it built.
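
For example (illustrative only; the platform suffix of libmpiwrap and the
install prefix depend on your Valgrind build, and for multi-node runs the
variable would need to be forwarded, e.g. via mpirun -x LD_PRELOAD):

$ LD_PRELOAD=/usr/lib/valgrind/libmpiwrap-amd64-linux.so \
    mpirun -n 2 valgrind --leak-check=full ./mpi_dynamic_win_free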

Hope this helps move you towards a solution.

Luke
Joseph Schuchart
2016-11-15 08:52:49 UTC
Hi Luke,

Thanks for your reply. From my understanding, the wrappers mainly help
catch errors at the MPI API level, whereas the errors I reported are
well below the API layer (please correct me if I'm wrong here).
However, I re-ran the code with the wrapper loaded via LD_PRELOAD and
without the suppression file, and the warnings issued by Valgrind for
the shmem segment handling code and the leaking memory from
MPI_Win_allocate_shared are basically the same. Nevertheless, I am
attaching the full log of that run as well.

Cheers
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: ***@hlrs.de
Gilles Gouaillardet
2016-11-15 15:39:33 UTC
Joseph,

Thanks for the report; this is a real memory leak.
I fixed it in master, and the fix is now being reviewed.
Meanwhile, you can manually apply the patch available at
https://github.com/open-mpi/ompi/pull/2418.patch
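
For example, from the Open MPI source tree that was used for the
installation, something along these lines should work (sketch only; the
patch was made against master, so it may need small adjustments for the
2.0.x tree):

$ cd openmpi-2.0.1
$ wget https://github.com/open-mpi/ompi/pull/2418.patch
$ patch -p1 < 2418.patch
$ make all install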

Cheers,

Gilles
Joseph Schuchart
2016-11-21 17:31:23 UTC
Gilles,

Thanks a lot for the fix. I just tested on master and can confirm that
both the leak and the invalid reads/writes are gone. However, the
suppression file still does not filter the memory that is allocated in
MPI_Init and not properly freed at the end. Any chance this can be
filtered using the suppression file?
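
For what it's worth, a local stopgap on our side would be to add entries
of roughly this shape to the suppression file, matched against the stack
traces Valgrind actually prints (the name is a placeholder, and one entry
per allocator/call chain may be needed):

{
   ompi_mpi_init_leftover
   Memcheck:Leak
   fun:malloc
   ...
   fun:ompi_mpi_init
}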

Cheers,
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: ***@hlrs.de
Gilles Gouaillardet
2016-11-21 23:54:09 UTC
Joseph,


The goal is to plug all the memory leaks (i.e., not simply filter them).

There is an ongoing effort at https://github.com/open-mpi/ompi/pull/2175,
but I must admit this is not a high-priority one, and it will take a bit
more time before all the fixes land in the stable branches.


Cheers,


Gilles