Discussion:
[OMPI users] OpenMPI + InfiniBand
Sergei Hrushev
2016-10-28 09:48:58 UTC
Permalink
Hello, All !

We have a problem with OpenMPI version 1.10.2 on a cluster with newly
installed Mellanox InfiniBand adapters.
OpenMPI was re-configured and re-compiled using: --with-verbs
--with-verbs-libdir=/usr/lib

Our test MPI task returns correct results, but it seems OpenMPI continues
to use the existing 1 Gbit Ethernet network instead of InfiniBand.
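One rough way to confirm where the traffic actually goes, assuming the
infiniband-diags package is installed, is to watch the HCA port counters
while the job runs; if the PortXmitData/PortRcvData counters stay flat,
nothing is going over InfiniBand:

$ watch -n1 'perfquery -x -C mlx4_0 -P 1 | grep -i data'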

The job's output file contains these lines:
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

Local host: node1
Local device: mlx4_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------

The InfiniBand network itself seems to be working:

$ ibstat mlx4_0 shows:

CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x7cfe900300bddec0
System image GUID: 0x7cfe900300bddec3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 3
LMC: 0
SM lid: 3
Capability mask: 0x0251486a
Port GUID: 0x7cfe900300bddec1
Link layer: InfiniBand

ibping also works.
ibnetdiscover shows the correct topology of the IB network.

The cluster runs Ubuntu 16.04 and we use the drivers that ship with the OS
(OFED is not installed).

Is it enough for OpenMPI to have RDMA only, or does IPoIB also need to be
installed?
What else can be checked?

Thanks a lot for any help!
John Hearns via users
2016-10-28 10:21:48 UTC
Permalink
Sergei, what does the command "ibv_devinfo" return please?

I had a recent case like this, but on Qlogic hardware.
Sorry if I am mixing things up.
Sergei Hrushev
2016-10-28 10:28:35 UTC
Permalink
Post by John Hearns via users
Sergei, what does the command "ibv_devinfo" return please?
I had a recent case like this, but on Qlogic hardware.
Sorry if I am mixing things up.
The output of ibv_devinfo from the cluster's first node is:

$ ibv_devinfo -d mlx4_0
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 7cfe:9003:00bd:dec0
sys_image_guid: 7cfe:9003:00bd:dec3
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x0
board_id: MT_1100120019
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 3
port_lid: 3
port_lmc: 0x00
link_layer: InfiniBand
John Hearns via users
2016-10-28 10:46:37 UTC
Permalink
Sorry - shoot down my idea. Over to someone else (me hides head in shame)
Gilles Gouaillardet
2016-10-28 11:11:04 UTC
Permalink
Sergei,

Is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
As far as I understand, --with-verbs should be enough, and neither /usr/lib
nor /usr/local/lib should ever be used on the configure command line.
(And by the way, are you running on a 32-bit system? Should the 64-bit
libs be in /usr/lib64?)

Make sure you run
ulimit -l unlimited
before you invoke mpirun, and that this value is correctly propagated to
the remote nodes.
/* the failure could be a side effect of a low ulimit -l */
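a quick way to verify the propagation (the hostnames are placeholders) is
something like

$ mpirun --host node1,node2 -np 2 sh -c 'ulimit -l'

every line should print "unlimited"; if it does not, the usual fix on Ubuntu
is the memlock limit in /etc/security/limits.conf (assuming PAM limits apply
to your ssh / Torque sessions):

* soft memlock unlimited
* hard memlock unlimited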

Cheers,

Gilles
Sergei Hrushev
2016-10-31 02:35:55 UTC
Permalink
Hi Gilles!
Post by Gilles Gouaillardet
Is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
As far as I understand, --with-verbs should be enough, and neither /usr/lib
nor /usr/local/lib should ever be used on the configure command line.
(And by the way, are you running on a 32-bit system? Should the 64-bit
libs be in /usr/lib64?)
I'm on Ubuntu 16.04 x86_64 and it has /usr/lib and /usr/lib32.
As I understand it, /usr/lib takes the place of /usr/lib64 here.
So the library path is correct.
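For what it's worth, one quick way to double-check where the verbs runtime
actually lives on this box (the package name is my assumption of what Ubuntu
calls it):

$ dpkg -L libibverbs1 | grep '\.so'
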
Post by Gilles Gouaillardet
Make sure you run
ulimit -l unlimited
before you invoke mpirun, and that this value is correctly propagated to
the remote nodes.
/* the failure could be a side effect of a low ulimit -l */
Yes, ulimit -l returns "unlimited".
So this is also correct.

Best regards,
Sergei.
Jeff Squyres (jsquyres)
2016-10-31 12:51:02 UTC
Permalink
What does "ompi_info | grep openib" show?

Additionally, Mellanox provides alternate support through their MXM libraries, if you want to try that.

If that shows that you have the openib BTL plugin loaded, try running with "mpirun --mca btl_base_verbose 100 ..." That will provide additional output about why / why not each point-to-point plugin is chosen.
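For example, something along these lines (the application name and process count are placeholders); leaving tcp out of the BTL list also keeps a silent Ethernet fallback from hiding the problem:

$ mpirun -np 2 --mca btl openib,sm,self --mca btl_base_verbose 100 ./your_mpi_app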
--
Jeff Squyres
***@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Sergei Hrushev
2016-11-01 06:40:24 UTC
Permalink
Hi Jeff !

What does "ompi_info | grep openib" show?
$ ompi_info | grep openib
MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.2)

Post by Jeff Squyres (jsquyres)
Additionally, Mellanox provides alternate support through their MXM
libraries, if you want to try that.
Yes, I know.
But we already have a hybrid cluster with OpenMPI, OpenMP, CUDA, Torque and
many other libraries installed, and because it works perfectly over the
Ethernet interconnect my idea was to add InfiniBand support with a minimum
of changes, mainly because we already have some custom-written software for
OpenMPI.
Post by Jeff Squyres (jsquyres)
If that shows that you have the openib BTL plugin loaded, try running with
"mpirun --mca btl_base_verbose 100 ..." That will provide additional
output about why / why not each point-to-point plugin is chosen.
Yes, I already tried to get this info.
And I saw in the log that rdmacm wants an IP address on the port.
That's why my question in the original post was:

Is it enough for OpenMPI to have RDMA only, or does IPoIB also need to be
installed?

The mpirun output is:

[node1:02674] mca: base: components_register: registering btl components
[node1:02674] mca: base: components_register: found loaded component openib
[node1:02674] mca: base: components_register: component openib register
function successful
[node1:02674] mca: base: components_register: found loaded component sm
[node1:02674] mca: base: components_register: component sm register
function successful
[node1:02674] mca: base: components_register: found loaded component self
[node1:02674] mca: base: components_register: component self register
function successful
[node1:02674] mca: base: components_open: opening btl components
[node1:02674] mca: base: components_open: found loaded component openib
[node1:02674] mca: base: components_open: component openib open function
successful
[node1:02674] mca: base: components_open: found loaded component sm
[node1:02674] mca: base: components_open: component sm open function
successful
[node1:02674] mca: base: components_open: found loaded component self
[node1:02674] mca: base: components_open: component self open function
successful
[node1:02674] select: initializing btl component openib
[node1:02674] openib BTL: rdmacm IP address not found on port
[node1:02674] openib BTL: rdmacm CPC unavailable for use on mlx4_0:1;
skipped
[node1:02674] select: init of component openib returned failure
[node1:02674] mca: base: close: component openib closed
[node1:02674] mca: base: close: unloading component openib
[node1:02674] select: initializing btl component sm
[node1:02674] select: init of component sm returned failure
[node1:02674] mca: base: close: component sm closed
[node1:02674] mca: base: close: unloading component sm
[node1:02674] select: initializing btl component self
[node1:02674] select: init of component self returned success
[node1:02674] mca: bml: Using self btl to [[16642,1],0] on node node1
[node1:02674] mca: base: close: component self closed
[node1:02674] mca: base: close: unloading component self

Best regards,
Sergei.
John Hearns via users
2016-11-01 12:06:20 UTC
Permalink
Sergei,
can you run:

ibhosts

ibstat

ibdiagnet


Lord help me for being so naive, but do you have a subnet manager running?
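A quick way to check, assuming the infiniband-diags and opensm packages are
what you have installed:

$ sminfo                  # should report the master SM's LID and a state of SMINFO_MASTER
$ service opensm status   # if OpenSM is run as a service on one of the nodes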
Sergei Hrushev
2016-11-01 12:55:02 UTC
Permalink
Hi John !

I'm experimenting now with the head node and a single compute node; all the
rest of the cluster is switched off.
Post by John Hearns via users
ibhosts
# ibhosts
Ca : 0x7cfe900300bddec0 ports 1 "MT25408 ConnectX Mellanox
Technologies"
Ca : 0xe41d2d030050caf0 ports 1 "MT25408 ConnectX Mellanox
Technologies"
Post by John Hearns via users
ibstat
# ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 1
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0xe41d2d030050caf0
System image GUID: 0xe41d2d030050caf3
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 1
LMC: 0
SM lid: 3
Capability mask: 0x0251486a
Port GUID: 0xe41d2d030050caf1
Link layer: InfiniBand
ibdiagnet
# ibdiagnet
# cat ibdiagnet.log
-W- Topology file is not specified.
Reports regarding cluster links will use direct routes.
-I- Using port 1 as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.


-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- General Device Info
-I---------------------------------------------------

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I- PKey:0x7fff Hosts:2 full:2 limited:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps
SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------

-I- Done. Run time was 2 seconds.
Post by John Hearns via users
Lord help me for being so naive, but do you have a subnet manager running?
It seems so, yes (I even have a standby):

# service --status-all | grep opensm
[ + ] opensm

# cat ibdiagnet.sm

ibdiagnet fabric SM report

SM - master
MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0

SM - standby
The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1
dev=4099 priority:0

Best regards,
Sergei.
Jeff Squyres (jsquyres)
2016-11-01 12:58:04 UTC
Permalink
Post by Sergei Hrushev
Yes, I already tried to get this info.
And I saw in the log that rdmacm wants an IP address on the port.
Is it enough for OpenMPI to have RDMA only, or does IPoIB also need to be
installed?
Sorry; I joined the thread late.

I haven't worked with InfiniBand for years, but I do believe that yes: you need IPoIB enabled on your IB devices to get the RDMA CM support to work.
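If you go that route, a minimal sketch on Ubuntu 16.04 with ifupdown would be roughly the following (interface name and addresses are placeholders; each node needs its own address, and the ib_ipoib kernel module must be loaded for ib0 to exist):

# /etc/network/interfaces fragment
auto ib0
iface ib0 inet static
    address 192.168.100.1
    netmask 255.255.255.0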
--
Jeff Squyres
***@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Sergei Hrushev
2016-11-01 13:49:08 UTC
Permalink
Post by Jeff Squyres (jsquyres)
I haven't worked with InfiniBand for years, but I do believe that yes: you
need IPoIB enabled on your IB devices to get the RDMA CM support to work.
Yes, I also saw that RDMA CM requires IP, but in my case OpenMPI reports
that UD CM can't be used either.
Does it also require IPoIB?

Is it possible to read more about UD CM somewhere?
Jeff Squyres (jsquyres)
2016-11-01 13:57:37 UTC
Permalink
I actually just filed a GitHub issue to ask this exact question:

https://github.com/open-mpi/ompi/issues/2326
--
Jeff Squyres
***@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Sergei Hrushev
2016-11-01 14:07:28 UTC
Permalink
Post by Jeff Squyres (jsquyres)
https://github.com/open-mpi/ompi/issues/2326
Good idea, thanks!
Sergei Hrushev
2016-10-31 02:26:35 UTC
Permalink
Post by John Hearns via users
Sorry - shoot down my idea. Over to someone else (me hides head in shame)
No problem, thanks for trying!
Nathan Hjelm
2016-11-01 20:37:22 UTC
Permalink
UDCM does not require IPoIB. It should be working for you. Can you build Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and create a gist with the output?
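Something like this should do it (the install prefix and the test program are placeholders):

$ ./configure --prefix=$HOME/ompi-debug --with-verbs --enable-debug
$ make -j 8
$ make install
$ $HOME/ompi-debug/bin/mpirun -np 2 --mca btl_base_verbose 100 ./mpi_test 2>&1 | tee verbose.log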

-Nathan

Sergei Hrushev
2016-11-02 07:46:38 UTC
Permalink
Hi Nathan!

Post by Nathan Hjelm
UDCM does not require IPoIB. It should be working for you. Can you build
Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and
create a gist with the output?
Ok, done:

https://gist.github.com/hsa-online/30bb27a90bb7b225b233cc2af11b3942


Best regards,
Sergei.
