Discussion:
[OMPI users] RoCE device performance with large message size
Brendan Myers
2017-10-10 19:29:00 UTC
Hello All,

I have a RoCE interoperability event starting next week, and I was wondering
if anyone had ideas that could help me get a new vendor's device ready.

I am using:

* Open MPI 2.1
* Intel MPI Benchmarks 2018
* OFED 3.18 (requirement from vendor)
* SLES 11 SP3 (requirement from vendor)



The problem seems to be that the device does not handle larger message sizes
well. I am sure the vendor will be working on this, but I am hoping there may
be a way to complete an IMB run with some Open MPI parameter tweaking.
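
For reference, the kind of tweaking I have in mind would be something along
these lines (untested guesses on my part; the node names are placeholders and
I am assuming the openib BTL is what gets selected for the RoCE ports):

    # Cap IMB at 1 MB messages so the run can at least complete
    mpirun -np 2 --host node01,node02 ./IMB-MPI1 -msglog 1:20 Sendrecv

    # Or fall back to the TCP BTL to rule the RoCE path in or out
    mpirun -np 2 --host node01,node02 --mca btl tcp,self,vader ./IMB-MPI1 Sendrecv

If the TCP run completes cleanly at all sizes, that would point at the RoCE
path rather than at IMB or Open MPI itself.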

Sample of IMB output from a Sendrecv benchmark:



       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
       262144          160       131.07       132.24       131.80     3964.56
       524288           80       277.42       284.57       281.57     3684.71
      1048576           40       461.16       474.83       470.02     4416.59
      2097152            3      1112.15   4294965.49   2147851.04        0.98
      4194304            2      2815.25   8589929.73   3222731.54        0.98



The last two rows (2 MB and 4 MB messages) are what look like the problematic
results: t_max and t_avg jump by several orders of magnitude and throughput
collapses. This happens on many of the benchmarks at larger message sizes and
causes either a major slowdown or an abort with the error:



The InfiniBand retry count between two MPI processes has been exceeded.
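
From what I have read, the retry behaviour is governed by two openib BTL MCA
parameters, so one thing I was planning to try is pushing them to their
documented maximums (untested here, and I realize it probably only masks the
underlying problem):

    mpirun -np 2 --host node01,node02 \
        --mca btl_openib_ib_retry_count 7 \
        --mca btl_openib_ib_timeout 31 \
        ./IMB-MPI1 Sendrecv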



If anyone has any thoughts on how I can complete the benchmarks without the
job aborting, I would appreciate it. If anyone has ideas as to why a RoCE
device might show this issue, I would take any information on offer. If more
data is required, please let me know what is relevant.





Thank you,

Brendan T. W. Myers
Jeff Squyres (jsquyres)
2017-10-10 20:32:59 UTC
You probably want to check that lossless Ethernet (PFC) is enabled everywhere; that's a common problem I've seen. Otherwise, you end up with timeouts and retransmissions.
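
If lldpad/DCB tooling happens to be installed on the hosts, something like
this will at least show whether PFC is enabled and being advertised on the
RoCE ports (the interface name is just an example; your vendor may well have
their own tool for this):

    dcbtool gc eth2 pfc          # current PFC configuration on the port
    lldptool -t -i eth2 -V PFC   # PFC TLV being advertised via DCBX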

Check with your vendor on how to do layer-0 diagnostics, etc.

Also, if this is a new vendor, they should probably try running this themselves -- IMB is fairly abusive to the network stack and turns up many bugs in lower layers (drivers, firmware), etc.
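
E.g., a minimal two-node Sendrecv run is usually enough to shake these
problems out (host names are placeholders; forcing the openib BTL just makes
sure the RoCE path is the one actually being exercised):

    mpirun -np 2 --host node01,node02 --mca btl openib,self,vader ./IMB-MPI1 Sendrecv
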
Post by Brendan Myers
Hello All,
I have a RoCE interoperability event starting next week and I was wondering if anyone had any ideas to help me with a new vendor I am trying to help get ready.
· Open MPI 2.1
· Intel MPI Benchmarks 2018
· OFED 3.18 (requirement from vendor)
· SLES 11 SP3 (requirement from vendor)
The problem seems to be that the device does not handle larger message sizes well and I am sure they will be working on this but I am hoping there may be a way to complete an IMB run with some Open MPI parameter tweaking.
262144 160 131.07 132.24 131.80 3964.56
524288 80 277.42 284.57 281.57 3684.71
1048576 40 461.16 474.83 470.02 4416.59
2097152 3 1112.15 4294965.49 2147851.04 0.98
4194304 2 2815.25 8589929.73 3222731.54 0.98
The InfiniBand retry count between two MPI processes has been exceeded.
If anyone has any thoughts on how I can complete the benchmarks without the job aborting I would appreciate it. If anyone has ideas as to why a RoCE device might show this issue I would take any information on offer. If more data is required please let me know what is relevant.
Thank you,
Brendan T. W. Myers
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users
--
Jeff Squyres
***@cisco.com
