Joseph Schuchart
2017-03-01 15:03:47 UTC
Hi all,
We are seeing issues in one of our applications, in which processes in a
shared communicator allocate a shared MPI window and execute
MPI_Accumulate simultaneously on it to iteratively update each process'
values. The test boils down to the sample code attached. Sample output
is as follows:
```
$ mpirun -n 4 ./mpi_shared_accumulate
[1] baseptr[0]: 1010 (expected 1010)
[1] baseptr[1]: 1011 (expected 1011)
[1] baseptr[2]: 1012 (expected 1012)
[1] baseptr[3]: 1013 (expected 1013)
[1] baseptr[4]: 1014 (expected 1014)
[2] baseptr[0]: 1005 (expected 1010) [!!!]
[2] baseptr[1]: 1006 (expected 1011) [!!!]
[2] baseptr[2]: 1007 (expected 1012) [!!!]
[2] baseptr[3]: 1008 (expected 1013) [!!!]
[2] baseptr[4]: 1009 (expected 1014) [!!!]
[3] baseptr[0]: 1010 (expected 1010)
[0] baseptr[0]: 1010 (expected 1010)
[0] baseptr[1]: 1011 (expected 1011)
[0] baseptr[2]: 1012 (expected 1012)
[0] baseptr[3]: 1013 (expected 1013)
[0] baseptr[4]: 1014 (expected 1014)
[3] baseptr[1]: 1011 (expected 1011)
[3] baseptr[2]: 1012 (expected 1012)
[3] baseptr[3]: 1013 (expected 1013)
[3] baseptr[4]: 1014 (expected 1014)
```
Each process should hold the same values, but on some runs (not on all
executions) arbitrary processes diverge (marked with [!!!]).
I made the following observations:
1) The issue occurs with both OpenMPI 1.10.6 and 2.0.2 but not with
MPICH 3.2.
2) The issue occurs only if the window is allocated through
MPI_Win_allocate_shared; using MPI_Win_allocate works fine.
3) The code assumes that MPI_Accumulate atomically updates individual
elements (please correct me if that is not covered by the MPI standard).
Both OpenMPI and the example code were compiled using GCC 5.4.1 and run
on a Linux system (single node). OpenMPI was configured with
--enable-mpi-thread-multiple and --with-threads but the application is
not multi-threaded. Please let me know if you need any other information.
Cheers
Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart
Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: ***@hlrs.de