[OMPI users] OpenMPI 2.1.3 hangs on Gather/Gatherv at large scale
Sasso, John (GE Digital, consultant)
2018-04-13 19:36:12 UTC
I built Intel MPI Benchmarks 2018 and OMB 5.4.1 with the Intel 18 compiler suite plus OpenMPI 2.1.3. I also built both suites purely with the MPI that ships with Intel 18.

What I found is that, with the OpenMPI 2.1.3 builds, both benchmark suites hang at 640 ranks or more, specifically in the Gather/Gatherv collectives at message sizes >= 8KB. With the Intel 18 MPI builds, both suites run fine up to 1024 ranks (the most I tested). Both suites may also have hung on other collectives, but I am wondering whether anyone has encountered something similar. All runs used the same InfiniBand fabric and the same compute hosts.
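For a sense of scale, here is a rough sketch of how much data the root rank must receive in an MPI_Gather where every rank contributes the same per-rank message. It assumes "8KB" means 8192 bytes, as in the power-of-two size sweeps these benchmarks use. The helper name `gather_root_recv_bytes` is hypothetical and is not part of either benchmark suite.

```python
# Hypothetical helper: total bytes the root receives in an MPI_Gather
# when each of nranks ranks (root included) contributes msg_bytes.
# Assumes a uniform per-rank message size (Gatherv allows per-rank counts).
def gather_root_recv_bytes(nranks: int, msg_bytes: int) -> int:
    return nranks * msg_bytes


KIB = 1024
MIB = 1024 * KIB

# Rank counts from the report; 8 KiB is the smallest size that hung.
for nranks in (640, 1024):
    total = gather_root_recv_bytes(nranks, 8 * KIB)
    print(f"{nranks} ranks x 8 KiB -> {total / MIB:.1f} MiB at the root")
```

At 640 ranks the root is already collecting 5 MiB per Gather call at the smallest failing size, and that volume grows linearly with both rank count and message size.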

--john
