Discussion:
[OMPI users] Performance issues: 1.10.x vs 2.x
marcin.krotkiewski
2017-05-04 10:27:53 UTC
Permalink
Hi, everyone,

I ran some bandwidth tests on two different systems with Mellanox IB
(FDR and EDR). I compiled the three supported versions of openmpi
(1.10.6, 2.0.2, 2.1.0) and measured the time it takes to send/receive
4MB arrays of doubles between two hosts connected to the same IB switch.
MPI_Send/MPI_Recv were performed 1000 times, and the table below gives
the average bandwidth obtained [MB/s]:

OpenMPI    FDR [MB/s]   EDR [MB/s]
1.10.6       6203.0      11271.1
2.0.2        5128.4      11948.0
2.1.0        5095.1      11947.2
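
For concreteness, the measurement loop was essentially the following (a
condensed sketch of what I ran, not the exact code; error checks are
trimmed):

/* Sketch: two ranks, a 4MB buffer of doubles, 1000 blocking
 * send/recv repetitions, average bandwidth reported in MB/s. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 4 * 1024 * 1024 / sizeof(double);  /* doubles in 4MB */
    const int reps = 1000;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        buf[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0)
            MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average bandwidth: %.1f MB/s\n",
               (double)reps * n * sizeof(double) / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}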

openib btl was used to transfer the data. The results are puzzling: it
seems that something changed starting from version 2.x, and the FDR
system performs much worse than with the prior 1.10.x release. On the
EDR system I see the opposite (v2.x are better), but the difference is
not so dramatic.
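
For reference, the runs were pinned to the openib btl along these lines
(illustrative only; host names and the benchmark binary are placeholders):

$ mpirun -np 2 -H node1,node2 --mca btl openib,self ./bandwidth_test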

Did anyone experience similar behavior? Is this due to OpenMPI, or
something else? The two systems run CentOS (FDR: 6.8, EDR: 7.3) and
Mellanox OFED with only a minor version difference.

I'd appreciate any thoughts.

Thanks a lot!

Marcin Krotkiewski
Paul Kapinos
2017-05-04 14:29:19 UTC
Permalink
Note that 2.x lost the memory hooks, cf. the thread
https://www.mail-archive.com/***@lists.open-mpi.org/msg00039.html

The numbers you have look like the ~20% loss we have also seen with 2.x vs.
1.10.x versions. Try the dirty 'memalign' hack: LD_PRELOAD this:

$ cat alignmalloc64.c
/* Dirk Schmidl (ds53448b), 01/2012 */
/* Override malloc() so that every allocation is 64-byte aligned. */
#include <malloc.h>

void *malloc(size_t size)
{
    return memalign(64, size);
}

$ gcc -c -fPIC alignmalloc64.c
$ gcc -shared -Wl,-soname,alignmalloc64.so -o alignmalloc64.so alignmalloc64.o
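
To spell out the usage (the command line below is only an illustration;
adjust the path, rank count and binary name to your setup), preload the
resulting library into the MPI ranks:

$ mpirun -np 2 -x LD_PRELOAD=$PWD/alignmalloc64.so ./your_mpi_benchmark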
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
marcin.krotkiewski
2017-05-05 10:10:14 UTC
Permalink
Thanks, Paul, that was useful! In my case, though, it was enough to
allocate my own arrays using posix_memalign. The internals of OpenMPI
did not play any role, which I guess is quite natural, assuming OpenMPI
doesn't reallocate the buffers.
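
For reference, the change on my side amounted to something like the
following (a sketch only; the 64-byte alignment mirrors the LD_PRELOAD
hack above, and n is the number of doubles in the 4MB buffer):

/* Allocate the benchmark buffer 64-byte aligned instead of using
 * plain malloc(). Returns NULL if the allocation fails. */
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

double *alloc_aligned_doubles(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 64, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}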

But since that worked, it seems that 1.10.6 somehow deals better with
unaligned data. Does anyone know the reason for this?

Marcin
Post by Paul Kapinos
Note that 2.x lost the memory hooks, cf. the thread
The numbers you have look like the ~20% loss we have also seen with 2.x
$ cat alignmalloc64.c
/* Dirk Schmidl (ds53448b), 01/2012 */
#include <malloc.h>
void* malloc(size_t size){
return memalign(64,size);
}
$ gcc -c -fPIC alignmalloc64.c
$ gcc -shared -Wl,-soname,$(LIBNAME64) -o $(LIBNAME64) alignmalloc64.o
Paul Kapinos
2017-05-05 10:19:25 UTC
Permalink
Post by marcin.krotkiewski
in my case it was enough to allocate my own arrays using posix_memalign.
Be happy. This did not work for Fortran codes...
Post by marcin.krotkiewski
But since that worked, it seems that 1.10.6 somehow deals better with
unaligned data. Does anyone know the reason for this?
In the 1.10.x series there were 'memory hooks': Open MPI took some care about
the alignment. These were removed in the 2.x series; cf. the whole thread behind
my link.
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915