[OMPI users] Passive target sync. support

Discussion:

Sebastian Rinke

2017-04-03 14:34:16 UTC

Dear all,

I’m using passive target sync. in my code and would like to
know how well it is supported in Open MPI.

In particular, the code is some sort of particle tree code that uses a distributed tree and every rank
gets non-local tree nodes that are needed for its own computation from other ranks
on demand, i.e.:

Win_lock(target)

Get()
Get()
…
Get()

(up to 8 Gets)

Win_unlock(target)

After closing the access epoch with Win_unlock(target),
the rank looks at the nodes that it got and decides if it needs to get
more non-local nodes in the same fashion.

Unfortunately, this implementation blocks until the access epoch is completed for one particle.
As every rank needs to do the same for several particles, it would be better
to use Rget and start processing other particles in the meantime already.
From time to time the pending Rgets are then checked for completion and
the corresponding particle can progress.

My questions are:

1) Do Get and Rget use network hardware support on InfiniBand (IB) for contiguous data?

2) How is RMA progress achieved for IB? Is there a progress thread option available?

3) If there is no progress thread option, would it be useful to use MPI_THREAD_MULTIPLE
and have a pthread testing on a request that will not be satisfied?
Would this be a reasonable option to ensure progress in MPI?

E.g.:
while (1)
MPI_Test()

Thank you for your help,
Sebastian

Nathan Hjelm

2017-04-03 16:02:15 UTC

On Apr 03, 2017, at 08:36 AM, Sebastian Rinke <***@cs.tu-darmstadt.de> wrote:

1) Do Get and Rget use network hardware support on InfiniBand (IB) for contiguous data?

Only in Open MPI v2.0.0 and newer. Open MPI v1.10.x and older always use the two-sided implementation, which may or may not use the hardware put/get support.

2) How is RMA progress achieved for IB? Is there a progress thread option available?

Progress threads are generally not needed for progressing RMA with Open MPI v2.0.0+. The only exception is when we have to queue up the operation (which may be the case with get). You can get origin-side progress by making another RMA call or by waiting on an operation initiated with one of the request-based calls.

If you want to progress each get independently, you should use Rget.

3) If there is no progress thread option, would it be useful to use MPI_THREAD_MULTIPLE
and have a pthread testing on a request that will not be satisfied?
Would this be a reasonable option to ensure progress in MPI?

E.g.:
while (1)
MPI_Test()

This will get you progress, but it isn't possible with Open MPI v1.10.x and older. MPI_THREAD_MULTIPLE is only really supported from v2.0.0 onward.
-Nathan

Sebastian Rinke

2017-04-03 21:01:00 UTC

Thank you very much for the quick response!

Do I need to configure with certain flags to enable the
hardware put/get support?

Sebastian

_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Nathan Hjelm

2017-04-03 21:23:23 UTC

No, support is enabled by default. You can check whether it is working by running with --mca osc ^pt2pt. This disables the two-sided implementation.

-Nathan

Sebastian Rinke

2017-04-04 01:14:08 UTC

Thanks!
Sebastian
