Yvan Fournier
2016-11-05 19:22:41 UTC
Hello,
Yes, as I had hinted in my message, I observed the bug in an irregular
manner.
Glad to see it could be fixed so quickly (it affects 2.0 too). I had observed it
for some time, but only recently took the time to make a proper simplified case
and investigate. Guess I should have submitted the issue sooner...
Best regards,
Yvan Fournier
Message: 5
Date: Sat, 5 Nov 2016 22:08:32 +0900
Subject: Re: [OMPI users] False positives and even failure with Open
MPI and memchecker
Content-Type: text/plain; charset=UTF-8
that really looks like a bug
if you rewrite your program with
MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag, &l_prev, 1, MPI_INT,
rank_prev, tag, MPI_COMM_WORLD, &status);
or even
MPI_Irecv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &req);
MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
MPI_Wait(&req, &status);
then there is no more valgrind warning
iirc, Open MPI marks the receive buffer as invalid memory, so it can
check that only MPI subroutines update it. It looks like a step is missing
in the case of MPI_Recv().
Cheers,
Gilles
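To illustrate the mechanism described above, here is a minimal standalone sketch, not Open MPI's actual implementation: the buffer name recv_buf and the two helper functions are invented for the example. It shows how a library can use Valgrind client requests to mark a receive buffer off-limits while a transfer is pending and mark it defined again once the data has arrived; skipping that last step is what turns perfectly valid data into reported errors, the kind of missing step suspected for MPI_Recv().

/* Illustration only (not Open MPI internals): policing a receive buffer
   with Valgrind client requests. The macros compile to no-ops when the
   program is not running under Valgrind. */
#include <string.h>
#include <valgrind/memcheck.h>

static char recv_buf[64];

static void transfer_started(void)
{
    /* forbid any access to the buffer while the transfer is in flight */
    VALGRIND_MAKE_MEM_NOACCESS(recv_buf, sizeof(recv_buf));
}

static void transfer_completed(const char *data, size_t len)
{
    /* make the region addressable again before the library writes into it */
    VALGRIND_MAKE_MEM_UNDEFINED(recv_buf, sizeof(recv_buf));
    memcpy(recv_buf, data, len);
    /* mark the received bytes defined; with transports Valgrind cannot
       observe, skipping this step leaves valid data flagged as an error
       on every later application read */
    VALGRIND_MAKE_MEM_DEFINED(recv_buf, len);
}

int main(void)
{
    transfer_started();
    transfer_completed("hello", 6);
    return recv_buf[0] == 'h' ? 0 : 1;   /* a clean, defined read */
}

Under Valgrind, reading recv_buf between transfer_started() and transfer_completed() would be reported; after transfer_completed() the read is clean.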
On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
Message: 6
Date: Sat, 5 Nov 2016 23:12:54 +0900
Subject: Re: [OMPI users] False positives and even failure with Open
MPI and memchecker
Content-Type: text/plain; charset="utf-8"
so it seems we took some shortcuts in pml/ob1
the attached patch (for the v1.10 branch) should fix this issue
Cheers
Gilles
On Sat, Nov 5, 2016 at 10:08 PM, Gilles Gouaillardet
diff --git a/ompi/mca/pml/ob1/pml_ob1_irecv.c b/ompi/mca/pml/ob1/pml_ob1_irecv.c
index 56826a2..97a6a38 100644
--- a/ompi/mca/pml/ob1/pml_ob1_irecv.c
+++ b/ompi/mca/pml/ob1/pml_ob1_irecv.c
@@ -30,6 +30,7 @@
 #include "pml_ob1_recvfrag.h"
 #include "ompi/peruse/peruse-internal.h"
 #include "ompi/message/message.h"
+#include "ompi/memchecker.h"
 mca_pml_ob1_recv_request_t *mca_pml_ob1_recvreq = NULL;
@@ -128,6 +129,17 @@ int mca_pml_ob1_recv(void *addr,
     rc = recvreq->req_recv.req_base.req_ompi.req_status.MPI_ERROR;
+    if (recvreq->req_recv.req_base.req_pml_complete) {
+        /* make the buffer defined when the request is completed,
+           and before releasing the objects. */
+        MEMCHECKER(
+            memchecker_call(&opal_memchecker_base_mem_defined,
+                            recvreq->req_recv.req_base.req_addr,
+                            recvreq->req_recv.req_base.req_count,
+                            recvreq->req_recv.req_base.req_datatype);
+        );
+    }
+
 #if OMPI_ENABLE_THREAD_MULTIPLE
     MCA_PML_OB1_RECV_REQUEST_RETURN(recvreq);
 #else
------------------------------
On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet
Hi,
note your printf line is missing.
if you printf l_prev, then the Valgrind error occurs in all variants.
at first glance, it looks like a false positive, and I will investigate it
Cheers,
Gilles
Hello,
I have observed what seem to be false positives running under Valgrind
when Open MPI is built with --enable-memchecker
(at least with versions 1.10.4 and 2.0.1).
Attached is a simple test case (extracted from a larger code) that sends one
int to rank r+1, and receives one from rank r-1
(using MPI_COMM_NULL to handle ranks below 0 or above the communicator size).
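The vg_mpi.c attachment is not included in the digest. For context, here is a hypothetical reconstruction of what such a test might look like: the meaning of VARIANT_1 and VARIANT_2 (stack versus heap receive buffer) is guessed from the Valgrind output that follows, out-of-range neighbor ranks are mapped to MPI_PROC_NULL, and the printf lines are added for illustration, so details may well differ from the original.

/* Hypothetical reconstruction of the missing vg_mpi.c: each rank sends one
   int to rank r+1 and receives one from rank r-1 along a simple chain. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, tag = 1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* out-of-range neighbors become MPI_PROC_NULL (an assumption; the
       message above mentions MPI_COMM_NULL) */
    int rank_prev = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int rank_next = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    int l = rank;

#if defined(VARIANT_1)
    /* guessed variant 1: plain MPI_Recv into a stack variable */
    int l_prev = -1;
    MPI_Recv(&l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);
    MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
    printf("rank %d received %d\n", rank, l_prev);  /* the read Valgrind flags */
#elif defined(VARIANT_2)
    /* guessed variant 2: same exchange, heap-allocated receive buffer */
    int *l_prev = malloc(sizeof(int));
    *l_prev = -1;
    MPI_Recv(l_prev, 1, MPI_INT, rank_prev, tag, MPI_COMM_WORLD, &status);
    MPI_Send(&l, 1, MPI_INT, rank_next, tag, MPI_COMM_WORLD);
    printf("rank %d received %d\n", rank, *l_prev); /* the read Valgrind flags */
    free(l_prev);
#else
    /* default variant: combined MPI_Sendrecv (reported clean) */
    int l_prev = -1;
    MPI_Sendrecv(&l, 1, MPI_INT, rank_next, tag,
                 &l_prev, 1, MPI_INT, rank_prev, tag,
                 MPI_COMM_WORLD, &status);
    printf("rank %d received %d\n", rank, l_prev);
#endif

    MPI_Finalize();
    return 0;
}

Built with -DVARIANT_1 or -DVARIANT_2 this exercises the plain MPI_Recv path that the reports below point at, while the default build takes the MPI_Sendrecv path.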
~/opt/openmpi-2.0/bin/mpicc -DVARIANT_1 vg_mpi.c
~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
==8382== Invalid read of size 4
==8382== at 0x400A00: main (in /home/yvan/test/a.out)
==8382== Address 0xffefffe70 is on thread 1's stack
==8382== in frame #0, created by main (???:)
~/opt/openmpi-2.0/bin/mpicc -DVARIANT_2 vg_mpi.c
~/opt/openmpi-2.0/bin/mpiexec -output-filename vg_log -n 2 valgrind ./a.out
==8322== Invalid read of size 4
==8322== at 0x400A6C: main (in /home/yvan/test/a.out)
==8322== Address 0xcb6f9a0 is 0 bytes inside a block of size 4 alloc'd
==8322== at 0x4C29BBE: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8322== by 0x400998: main (in /home/yvan/test/a.out)
I get no error for the default variant (no -DVARIANT_... flag) with either Open
MPI 2.0.1 or 1.10.4,
but I do get an error similar to variant 1 with the parent code from which the
example was extracted, as described below.
Running under Valgrind's gdb server, for the parent code of variant 1,
it even seems the value received on rank 1 is uninitialized, and Valgrind then
complains with the given message.
The code fails to work as intended when run under Valgrind when Open MPI is
built with --enable-memchecker,
while it works fine when run with the same build but not under Valgrind,
or when run under Valgrind with Open MPI built without memchecker.
I'm running under Arch Linux (whose packaged Open MPI 1.10.4 is built
with memchecker enabled,
rendering it unusable under Valgrind).
Did anybody else encounter this type of issue, or does my code contain
an obvious mistake that I am missing?
I initially thought of possible alignment issues, but saw nothing in the
standard that requires that,
and the "malloc"-based variant exhibits the same behavior, while I assume
64-bit alignment for allocated arrays is the default.
Best regards,
Yvan Fournier
------------------------------
Subject: Digest Footer
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
------------------------------
End of users Digest, Vol 3645, Issue 1
**************************************