Discussion: [OMPI users] Remote progress in MPI_Win_flush_local
Joseph Schuchart
2017-06-23 14:20:16 UTC
All,

We employ the following pattern to send signals between processes:

```
int com_rank, root = 0;
int res = 0, val = 1;
MPI_Comm_rank(MPI_COMM_WORLD, &com_rank);
// allocate MPI window
MPI_Win win = allocate_win();
// do some computation
...
// Process 0 waits for a signal
if (com_rank == root) {
  do {
    MPI_Fetch_and_op(NULL, &res, MPI_INT, com_rank, 0, MPI_NO_OP, win);
    MPI_Win_flush_local(com_rank, win);
  } while (res == 0);
} else {
  // everyone else signals the root by atomically adding to its counter
  MPI_Accumulate(&val, 1, MPI_INT, root, 0, 1, MPI_INT, MPI_SUM, win);
  MPI_Win_flush(root, win);
}
[...]
```

We use MPI_Fetch_and_op to atomically query the local memory location
for a signal and MPI_Accumulate to send the signal; I have omitted the
reset and the window setup for simplicity (a sketch of those omitted
pieces follows below).
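For completeness, here is a minimal sketch of what the omitted pieces
look like in spirit. The helper name allocate_win() and the exact
choices here (a single zero-initialized MPI_INT, MPI_Win_allocate, a
passive-target epoch opened with MPI_Win_lock_all, which the flush
calls require, and a reset via MPI_Fetch_and_op with MPI_REPLACE) are
illustrative assumptions, not our exact code:

```
// Sketch only: one plausible shape of the omitted window setup.
MPI_Win allocate_win(void) {
  int *baseptr;
  MPI_Win win;
  MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &baseptr, &win);
  *baseptr = 0;                  // no signal pending initially
  MPI_Barrier(MPI_COMM_WORLD);   // make the initial value visible everywhere
  MPI_Win_lock_all(0, win);      // flush calls require a passive-target epoch
  return win;
}

// Sketch only: the root atomically swaps the counter back to zero
// after consuming a signal and gets the old value back.
int reset_signal(MPI_Win win, int root) {
  int zero = 0, old;
  MPI_Fetch_and_op(&zero, &old, MPI_INT, root, 0, MPI_REPLACE, win);
  MPI_Win_flush(root, win);
  return old;
}
```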

When run on a single node (my laptop), this code snippet reproducibly
hangs: the root process repeats the do-while loop indefinitely while
all other processes are stuck in MPI_Win_flush.

An interesting observation is that if I replace MPI_Win_flush_local
with MPI_Win_flush, the application does not hang. However, my
understanding is that a local flush should be sufficient for
MPI_Fetch_and_op with MPI_NO_OP: the fetched result only has to be
complete at the origin, and remote completion is not required.
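For reference, the variant that does not hang differs only in the
flush call inside the root's wait loop:

```
do {
  MPI_Fetch_and_op(NULL, &res, MPI_INT, com_rank, 0, MPI_NO_OP, win);
  MPI_Win_flush(com_rank, win);  // remote completion as well: no hang
} while (res == 0);
```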

I do not observe this hang with MPICH 3.2. I am aware that the
progress semantics of MPI are rather vague, but I am curious whether
this difference between the implementations is intended, and whether
repeatedly calling into (non-blocking) MPI communication functions
should provide progress for incoming RMA operations.
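As a data point, the usual (if inelegant) trick for progress problems
like this is to make some cheap, unrelated call into the MPI library
inside the wait loop. Whether that reliably drives the RMA progress
engine is exactly the kind of implementation detail in question here,
so the MPI_Iprobe below is a sketch of the idea rather than a
guaranteed fix:

```
int flag;
do {
  MPI_Fetch_and_op(NULL, &res, MPI_INT, com_rank, 0, MPI_NO_OP, win);
  MPI_Win_flush_local(com_rank, win);
  // Any call into the library may run the progress engine; the
  // standard does not require it to, but implementations often do.
  MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
             &flag, MPI_STATUS_IGNORE);
} while (res == 0);
```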

Any input is much appreciated.

Cheers
Joseph
Nathan Hjelm
2017-06-23 14:31:31 UTC
This is not the intended behavior. Please open a bug on GitHub.

-Nathan
