[OMPI users] OpenMPI 2.x + PGI + Mellanox OFED
Aaron Knister
2017-05-23 01:50:10 UTC
Matt Thompson reported an issue over the summer that prevented OpenMPI
from working when built with the PGI compilers against Mellanox OFED (at
least the 3.x series). The thread is here:
https://www.mail-archive.com/***@lists.open-mpi.org/msg29698.html.
Information about how PGI interacts with libibverbs is in a thread
here (https://www.pgroup.com/userforum/viewtopic.php?t=5249).

For those googling for a solution, the error message you get at runtime
is this:

[borgr138][[16866,1],38][btl_openib_component.c:1648:init_one_device] error obtaining device attributes for mlx5_0 errno says Success
[borgr137][[16866,1],4][btl_openib_component.c:1648:init_one_device] error obtaining device attributes for mlx5_0 errno says Success
[borgr137][[16866,1],14][btl_openib_component.c:1648:init_one_device] error obtaining device attributes for mlx5_0 errno says Success

To get this combination to work I tried disabling experimental verbs
through various mechanisms, the details of which I can no longer recall,
but none of them worked.

The solution I came up with, in case anyone else runs into this problem,
is this:

1. Build OpenMPI with PGI (e.g. CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90
./configure --with-verbs=/usr && make)
2. cd into opal/mca/btl/openib in the source/build directory
3. make clean
4. make CC="gcc -std=gnu99"
5. make install
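
For anyone who wants to script steps 2-5, here is a rough sketch rather
than a tested recipe. It assumes step 1 (configure and make with PGI) has
already been run in the source/build directory. The names run,
rebuild_openib_with_gcc and DRY_RUN are made up for this example and are
not part of OpenMPI. It uses "make -C" instead of cd, and it runs
"make install" inside the openib directory as the steps read literally.
If the rest of the tree has not been installed yet, you would presumably
run "make install" from the top-level directory instead.

```shell
#!/bin/sh
# Sketch of steps 2-5 above. All names here are hypothetical helpers.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rebuild_openib_with_gcc <build_dir>
# Rebuilds only the openib BTL component with gcc, leaving the rest of
# the PGI-built tree untouched.
rebuild_openib_with_gcc() {
    if [ -z "$1" ]; then
        echo "usage: rebuild_openib_with_gcc <build_dir>" >&2
        return 2
    fi
    openib_dir="$1/opal/mca/btl/openib"

    # Step 3: throw away the objects compiled with PGI.
    run make -C "$openib_dir" clean &&
    # Step 4: recompile this one directory with gcc.
    run make -C "$openib_dir" CC="gcc -std=gnu99" &&
    # Step 5: install (assumption: run in the component directory).
    run make -C "$openib_dir" install
}
```

Setting DRY_RUN=1 before calling rebuild_openib_with_gcc with the path to
your build directory only prints the three make commands, which is a
cheap way to check the paths before touching the build.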

It's not elegant, but it appears to work.

-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776