Sergei Hrushev
2016-10-28 09:48:58 UTC
Hello, all!
We have a problem with Open MPI 1.10.2 on a cluster with newly
installed Mellanox InfiniBand adapters.
Open MPI was re-configured and re-compiled using: --with-verbs
--with-verbs-libdir=/usr/lib
Our test MPI job returns correct results, but Open MPI seems to keep
using the existing 1 Gbit Ethernet network instead of InfiniBand.
The output file contains these lines:
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: node1
Local device: mlx4_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
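To see why the openib BTL rejects the port, the job can be rerun with the openib BTL forced and BTL selection debugging turned on. This is only a sketch of the invocation; the hostfile name and `./a.out` are placeholders for the real job:

```shell
# Forcing the BTL list to self,sm,openib makes Open MPI abort with an
# error instead of silently falling back to TCP, and btl_base_verbose
# prints the per-port reasons each connection scheme (CPC) fails.
mpirun --mca btl self,sm,openib \
       --mca btl_base_verbose 100 \
       -np 2 -hostfile hosts ./a.out
```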
InfiniBand network itself seems to be working:
$ ibstat mlx4_0 shows:
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x7cfe900300bddec0
System image GUID: 0x7cfe900300bddec3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 3
Capability mask: 0x0251486a
Port GUID: 0x7cfe900300bddec1
Link layer: InfiniBand
ibping also works.
ibnetdiscover shows the correct topology of the IB network.
The cluster runs Ubuntu 16.04 and we use the drivers shipped with the
OS (Mellanox OFED is not installed).
Is RDMA support alone enough for Open MPI, or does IPoIB also need to
be configured?
What else can be checked?
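For reference, a few generic per-node checks that seem relevant here: the udcm CPC needs to register memory, so a low locked-memory limit can make it fail, and libibverbs needs both the uverbs device files and the mlx4 userspace provider. The paths below are assumptions for a stock Ubuntu 16.04 install:

```shell
# Locked-memory limit: the common default of 64 kB is far too low for
# RDMA memory registration; "unlimited" is the usual recommendation.
ulimit -l

# Userspace verbs device files (created by the ib_uverbs kernel module).
ls /dev/infiniband 2>/dev/null || echo "/dev/infiniband missing - is ib_uverbs loaded?"

# The mlx4 userspace provider must be where libibverbs looks for it
# (this path is an assumption for Ubuntu 16.04 on x86_64).
ls /usr/lib/x86_64-linux-gnu/libibverbs/ 2>/dev/null || echo "no libibverbs provider dir found"
```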
Thanks a lot for any help!