Discussion:
[OMPI users] Progress issue with dynamic windows
Joseph Schuchart
2017-11-02 03:49:49 UTC
All,

I came across what I consider another issue regarding progress in Open
MPI: consider one process (P1) polling locally on a regular window (W1)
for a local value to change (using MPI_Win_lock+MPI_Get+MPI_Win_unlock)
while a second process (P2) tries to read from a memory location in a
dynamic window (W2) on process P1 (using MPI_Rget+MPI_Wait, other
combinations affected as well). P2 will later update the memory location
waited on by P1. However, the read on the dynamic window stalls as the
(local) read on W1 on P1 does not trigger progress on the dynamic window
W2, causing the application to deadlock.

It is my understanding that process P1 should guarantee progress on any
communication it is involved in, regardless of the window or window
type, and thus the communication should succeed. Is this assumption
correct? Or is P1 required to access W2 as well to ensure progress? I
can trigger progress on W2 on P1 by adding a call to MPI_Iprobe, but
that seems like a hack to me. Also, if both W1 and W2 are regular
(allocated) windows, the communication succeeds.

I am attaching a small reproducer, tested with Open MPI release 3.0.0 on
a single GNU/Linux node (Linux Mint 18.2, gcc 5.4.1, Linux
4.10.0-38-generic).
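In outline, the pattern looks roughly like this (a sketch of the scenario described above, not the exact attached file; variable names and details are illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    /* W1: regular (allocated) window holding the flag P1 polls on. */
    int *flag;
    MPI_Win w1;
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &flag, &w1);
    *flag = 0;

    /* W2: dynamic window; P1 attaches a local buffer to it. */
    int value = 42;
    MPI_Aint disp = 0;
    MPI_Win w2;
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &w2);
    if (rank == 0) {
        MPI_Win_attach(w2, &value, sizeof(int));
        MPI_Get_address(&value, &disp);
    }
    /* Dynamic windows are targeted by absolute address, so P2 needs it. */
    MPI_Bcast(&disp, 1, MPI_AINT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* P1: poll locally on W1 until P2 sets the flag. */
        int seen = 0;
        while (!seen) {
            MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, w1);
            MPI_Get(&seen, 1, MPI_INT, 0, 0, 1, MPI_INT, w1);
            MPI_Win_unlock(0, w1);
            /* Workaround mentioned above: an MPI_Iprobe here drives
             * progress on W2 and avoids the hang, e.g.
             *   int f;
             *   MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG,
             *              MPI_COMM_WORLD, &f, MPI_STATUS_IGNORE); */
        }
    } else if (rank == 1) {
        /* P2: read from the dynamic window on P1... */
        int got;
        MPI_Request req;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, w2);
        MPI_Rget(&got, 1, MPI_INT, 0, disp, 1, MPI_INT, w2, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* stalls: P1 makes no progress on W2 */
        MPI_Win_unlock(0, w2);
        /* ...then release P1 by setting the flag in W1 on rank 0. */
        int one = 1;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, w1);
        MPI_Put(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, w1);
        MPI_Win_unlock(0, w1);
    }

    if (rank == 0) MPI_Win_detach(w2, &value);
    MPI_Win_free(&w2);
    MPI_Win_free(&w1);
    MPI_Finalize();
    return 0;
}
```

Note this program is expected to hang as described when run with two processes, so it should not be executed unattended.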

Many thanks in advance!

Joseph
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: ***@hlrs.de
Nathan Hjelm
2017-11-02 03:54:34 UTC
This is a known issue when using osc/pt2pt. The only way to get progress is to enable it at the network level (btl); it is not on by default. How this is done depends on the underlying transport.

-Nathan
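For example, if the TCP BTL is in use, it has an MCA parameter to enable a progress thread (a sketch, assuming btl/tcp; other transports have their own mechanisms, if any):

```shell
# Hedged example: enable the TCP BTL progress thread so RMA traffic is
# progressed even while the target is not inside the MPI library.
mpirun --mca btl_tcp_progress_thread 1 -n 2 ./ompi_dynamicwin_hang
```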
Nathan Hjelm
2017-11-02 04:04:33 UTC
Hmm, I thought we also made calls to opal_progress() in your case (calling MPI_Win_lock on self). Open a bug on GitHub and I will double-check.
Joseph Schuchart
2017-11-02 04:19:25 UTC
Nathan,

Thank you for your reply. I opened an issue:
https://github.com/open-mpi/ompi/issues/4434

Thanks,
Joseph