[OMPI users] False positives and even failure with Open MPI and memchecker
Yvan Fournier
2016-11-05 10:59:13 UTC
Hello,

I have observed what seem to be false positives when running under Valgrind with Open MPI built with --enable-memchecker
(at least with versions 1.10.4 and 2.0.1).

Attached is a simple test case (extracted from larger code) that sends one int to rank r+1, and receives from rank r-1
(using MPI_PROC_NULL to handle ranks below 0 or above comm size).
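
For readers without the attachment, the neighbor selection described above can be sketched roughly as follows. This is not the attached vg_mpi.c: the helper names and the NULL_PROC constant are hypothetical, and NULL_PROC is only a stand-in for MPI's null-process sentinel so that the snippet compiles without MPI.

```c
#include <assert.h>

/* Hypothetical stand-in for MPI's null-process sentinel (assumption),
 * used so this sketch compiles without an MPI installation. */
enum { NULL_PROC = -1 };

/* Rank that `rank` sends to: rank + 1, or NULL_PROC for the last rank. */
int ring_next(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1 < size) ? rank + 1 : NULL_PROC;
}

/* Rank that `rank` receives from: rank - 1, or NULL_PROC for rank 0. */
int ring_prev(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank > 0) ? rank - 1 : NULL_PROC;
}
```

With two processes, rank 0 has no previous neighbor and rank 1 has no next neighbor, so the only real message goes from rank 0 to rank 1, which is the rank that reports the errors below.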

Using:

~/opt/openmpi-2.0/bin/mpicc -DVARIANT_1 vg_mpi.c
~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out

I get the following Valgrind error for rank 1:

==8382== Invalid read of size 4
==8382==    at 0x400A00: main (in /home/yvan/test/a.out)
==8382==  Address 0xffefffe70 is on thread 1's stack
==8382==  in frame #0, created by main (???:)


Using:

~/opt/openmpi-2.0/bin/mpicc -DVARIANT_2 vg_mpi.c
~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out

I get the following Valgrind error for rank 1:

==8322== Invalid read of size 4
==8322==    at 0x400A6C: main (in /home/yvan/test/a.out)
==8322==  Address 0xcb6f9a0 is 0 bytes inside a block of size 4 alloc'd
==8322==    at 0x4C29BBE: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8322==    by 0x400998: main (in /home/yvan/test/a.out)

I get no error for the default variant (no -D_VARIANT...) with either Open MPI 2.0.1 or 1.10.4,
but I do get an error similar to variant 1 on the parent code from which the example was extracted...

is given below. Running under Valgrind's gdb server, for the parent code of variant 1,
it even seems the value received on rank 1 is uninitialized, then Valgrind complains
with the given message.

The code fails to work as intended when run under Valgrind when Open MPI is built with --enable-memchecker,
while it works fine when run with the same build but not under Valgrind,
or when run under Valgrind with Open MPI built without memchecker.

I'm running under Arch Linux (whose packaged Open MPI 1.10.4 is built with memchecker enabled,
rendering it unusable under Valgrind).

Did anybody else encounter this type of issue, or does my code contain an obvious mistake that I am missing?
I initially thought of possible alignment issues, but saw nothing in the standard that requires that,
and the "malloc"-based variant exhibits the same behavior, while I assume
alignment to 64 bits for allocated arrays is the default.
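
The alignment assumption can be checked directly. The sketch below is not part of the original test case and its function names are hypothetical; it tests one stack int and one malloc'd int, which is apparently what the two variants use, judging from the Valgrind output. C guarantees that malloc returns memory suitably aligned for any object with fundamental alignment, and an int only needs _Alignof(int).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p satisfies the alignment requirement of an int. */
bool is_int_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(int)) == 0;
}

/* Checks one stack int and one heap int, loosely mirroring the
 * stack-based and malloc-based variants (assumption). */
bool both_variants_aligned(void)
{
    int on_stack = 0;
    int *on_heap = malloc(sizeof *on_heap);
    assert(on_heap != NULL);

    bool ok = is_int_aligned(&on_stack) && is_int_aligned(on_heap);

    free(on_heap);
    return ok;
}
```

If this check passes on the affected machine, alignment is unlikely to explain the report.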

Best regards,

  Yvan Fournier
Gilles Gouaillardet
2016-11-05 12:48:36 UTC
Hi,

note your printf line is missing.
if you printf l_prev, then the valgrind error occurs in all variants

at first glance, it looks like a false positive, and i will investigate it


Cheers,

Gilles
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Gilles Gouaillardet
2016-11-05 13:08:32 UTC
that really looks like a bug

if you rewrite your program with

MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag, &l_prev, 1, MPI_INT,
rank_prev, tag, MPI_COMM_WORLD, &status);

or even

MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);

MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);

MPI_Wait(&req, &status);

then there is no more valgrind warning

iirc, Open MPI marks the receive buffer as invalid memory, so it can
check that only MPI subroutines update it. it looks like a step is missing
in the case of MPI_Recv()


Cheers,

Gilles

Gilles Gouaillardet
2016-11-05 14:12:54 UTC
so it seems we took some shortcuts in pml/ob1

the attached patch (for the v1.10 branch) should fix this issue


Cheers

Gilles


