Discussion:
[OMPI users] Q: Basic invoking of InfiniBand with OpenMPI
Boris M. Vulovic
2017-07-13 21:43:37 UTC
Permalink
I would like to know how to invoke InfiniBand hardware on a CentOS 6.x cluster
with Open MPI (static libraries) for running my C++ code. This is how I compile
and run:

/usr/local/open-mpi/1.10.7/bin/mpic++ -L/usr/local/open-mpi/1.10.7/lib
-Bstatic main.cpp -o DoWork

/usr/local/open-mpi/1.10.7/bin/mpiexec -mca btl tcp,self --hostfile
hostfile5 -host node01,node02,node03,node04,node05 -n 200 DoWork

Here, "*-mca btl tcp,self*" means that *TCP* is used, even though the cluster
has InfiniBand.

What should be changed in the compile and run commands for InfiniBand to
be used? If I just replace "*-mca btl tcp,self*" with "*-mca btl
openib,self*" then I get plenty of errors, the most relevant one saying:

*At least one pair of MPI processes are unable to reach each other for MPI
communications. This means that no Open MPI device has indicated that it
can be used to communicate between these processes. This is an error; Open
MPI requires that all MPI processes be able to reach each other. This error
can sometimes be the result of forgetting to specify the "self" BTL.*

Thanks very much!!!

*Boris *
Gus Correa
2017-07-13 21:55:06 UTC
Permalink
Have you tried:

-mca btl vader,openib,self

or

-mca btl sm,openib,self

by chance?

That adds a btl for intra-node communication (vader or sm).
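For example, applied to the invocation from the original post (a sketch; the install path, hostfile name, and task count are taken from that post, and the command is only echoed here since running it needs the actual cluster):

```shell
# Sketch: the earlier tcp,self run, with a shared-memory BTL (vader) added
# for intra-node traffic and openib for inter-node traffic.
CMD='/usr/local/open-mpi/1.10.7/bin/mpiexec --mca btl vader,openib,self --hostfile hostfile5 -n 200 DoWork'
echo "$CMD"
```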
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Gilles Gouaillardet
2017-07-14 01:58:32 UTC
Permalink
Boris,


Open MPI should automatically detect the InfiniBand hardware, and use
openib (and *not* tcp) for inter-node communications,

and a shared-memory optimized btl (e.g. sm or vader) for intra-node
communications.


Note that with "-mca btl openib,self" you tell Open MPI to use the openib
btl between all tasks,

including tasks running on the same node (which is less efficient than
using sm or vader).


First, I suggest you make sure InfiniBand is up and running on all
your nodes.

(Just run ibstat: at least one port should be listed, the state should be
Active, and all nodes should have the same SM lid.)


Then try to run two tasks on two nodes.


If this does not work, you can run

mpirun --mca btl_base_verbose 100 ...

and post the logs so we can investigate from there.
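Putting the two suggestions together, a minimal two-node debug run might look like the following (a sketch; the node names follow the earlier post, and the command is only echoed here since it needs a live MPI cluster):

```shell
# Sketch: one task on each of two nodes, with verbose BTL selection
# logging so the chosen transports show up in the output.
CMD='mpirun --mca btl_base_verbose 100 -host node01,node02 -n 2 DoWork'
echo "$CMD"
```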


Cheers,


Gilles
John Hearns via users
2017-07-14 07:37:01 UTC
Permalink
Boris, as Gilles says - first do some lower-level checks of your
InfiniBand network.
I suggest running:
ibdiagnet
ibhosts
and then as Gilles says 'ibstat' on each node
Boris M. Vulovic
2017-07-14 16:34:05 UTC
Permalink
Gus, Gilles and John,

Thanks for the help. Let me first post (below) the output from checks of
the IB network:
ibdiagnet
ibhosts
ibstat (for login node, for now)

What do you think?
Thanks
--Boris


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-bash-4.1$ *ibdiagnet*
----------
Load Plugins from:
/usr/share/ibdiagnet2.1.1/plugins/
(You can specify more paths to be looked in with "IBDIAGNET_PLUGINS_PATH"
env variable)

Plugin Name Result Comment
libibdiagnet_cable_diag_plugin-2.1.1 Succeeded Plugin loaded
libibdiagnet_phy_diag_plugin-2.1.1 Succeeded Plugin loaded

---------------------------------------------
Discovery
-E- Failed to initialize

-E- Fabric Discover failed, err=IBDiag initialize wasn't done
-E- Fabric Discover failed, MAD err=Failed to register SMI class

---------------------------------------------
Summary
-I- Stage Warnings Errors Comment
-I- Discovery NA
-I- Lids Check NA
-I- Links Check NA
-I- Subnet Manager NA
-I- Port Counters NA
-I- Nodes Information NA
-I- Speed / Width checks NA
-I- Partition Keys NA
-I- Alias GUIDs NA
-I- Temperature Sensing NA

-I- You can find detailed errors/warnings in:
/var/tmp/ibdiagnet2/ibdiagnet2.log

-E- A fatal error occurred, exiting...
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

-bash-4.1$ *ibhosts*
ibwarn: [168221] mad_rpc_open_port: client_register for mgmt 1 failed
src/ibnetdisc.c:766; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
-bash-4.1$

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-bash-4.1$ *ibstat*
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb1c
System image GUID: 0x248a0703005abb1c
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb1c
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb1d
System image GUID: 0x248a0703005abb1c
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x0000000000000000
Link layer: Ethernet
CA 'mlx5_2'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb30
System image GUID: 0x248a0703005abb30
Port 1:
State: Down
Physical state: Disabled
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb30
Link layer: Ethernet
CA 'mlx5_3'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb31
System image GUID: 0x248a0703005abb30
Port 1:
State: Down
Physical state: Disabled
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb31
Link layer: Ethernet
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

--
*Boris M. Vulovic*
John Hearns via users
2017-07-17 10:10:06 UTC
Permalink
Boris,
do you have a Subnet Manager running on your fabric?

I am sorry if there have been other replies to this over the weekend.
Gilles Gouaillardet
2017-07-17 11:30:36 UTC
Permalink
Boris,

These logs seem a bit odd to me.
As far as I remember, the state is POLLING when there is no subnet manager,
and when there is one, the state is ACTIVE *but* both the Base and SM lids
are non-zero.

By the way, is IPoIB configured?
If yes, can your hosts ping each other over that interface?

I noted your host has 4 IB ports, but only 2 are active.
You might want to try using one port at first; for example, you can
mpirun --mca btl_openib_if_include mlx5_0 ...

Cheers,

Gilles

Russell Dekema
2017-07-17 13:31:29 UTC
Permalink
It looks like you have two dual-port Mellanox VPI cards in this
machine. These cards can be set to run InfiniBand or Ethernet on a
port-by-port basis, and all four of your ports are set to Ethernet
mode. Two of your ports have active 100 gigabit Ethernet links, and
the other two have no link up at all.

With no InfiniBand links on the machine, you will, of course, not be
able to run your OpenMPI job over InfiniBand.
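The diagnosis can be read straight off the ibstat dump: every "Link layer" line says Ethernet. A quick way to tally this on a node (a sketch; it inlines a two-entry sample instead of live ibstat output, so the hypothetical `ibstat_sample` variable stands in for `ibstat`'s real output):

```shell
# Count Ethernet-mode ports in ibstat output. On the node shown above,
# this tally would be 4, with zero ports reporting InfiniBand.
ibstat_sample='Link layer: Ethernet
Link layer: Ethernet'
n_eth=$(printf '%s\n' "$ibstat_sample" | grep -c 'Ethernet')
echo "$n_eth"
```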

If your machines and network are set up for it, you might be able to
run your job over RoCE (RDMA Over Converged Ethernet) using one or
both of those 100 GbE links. I have never used RoCE myself, but one
starting point for gathering more information on it might be the
following section of the OpenMPI FAQ:

https://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce
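For reference, that FAQ entry describes running RoCE over the same openib BTL with the RDMA CM connection manager selected; a sketch of the invocation (echoed only, not executed, and the task layout is illustrative):

```shell
# Sketch per the Open MPI FAQ: RoCE uses the openib BTL with the rdmacm
# connection manager (there is no InfiniBand subnet manager on Ethernet).
CMD='mpirun --mca btl openib,self --mca btl_openib_cpc_include rdmacm -n 2 DoWork'
echo "$CMD"
```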

Sincerely,
Rusty Dekema
University of Michigan
Advanced Research Computing - Technology Services


Boris M. Vulovic
2017-07-17 16:43:37 UTC
Permalink
Gus, Gilles, Russell, John:

Thanks very much for the replies and the help.
I got confirmation from the "root" that it is indeed RoCE with 100G.

I'll go over the info in the link Russell provided, but have a quick
question: if I run the "*mpiexec*" with "*-mca btl tcp,self*" do I get the
benefit of *RoCE *(the fastest speed)?

I'll go over the details of all the replies and post useful feedback.

Thanks very much all!

Best,

--Boris
Post by Russell Dekema
It looks like you have two dual-port Mellanox VPI cards in this
machine. These cards can be set to run InfiniBand or Ethernet on a
port-by-port basis, and all four of your ports are set to Ethernet
mode. Two of your ports have active 100 gigabit Ethernet links, and
the other two have no link up at all.
With no InfiniBand links on the machine, you will, of course, not be
able to run your OpenMPI job over InfiniBand.
If your machines and network are set up for it, you might be able to
run your job over RoCE (RDMA Over Converged Ethernet) using one or
both of those 100 GbE links. I have never used RoCE myself, but one
starting point for gathering more information on it might be the
https://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce
Sincerely,
Rusty Dekema
University of Michigan
Advanced Research Computing - Technology Services
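For reference, the FAQ entry Russell links suggests selecting the openib BTL together with the RDMA connection manager when running over RoCE. A minimal sketch, reusing the paths, hostfile, and DoWork binary from the earlier commands (the exact MCA settings should be checked against that FAQ for your Open MPI version):

```shell
# Sketch: run over RoCE with the openib BTL (Open MPI 1.10.x).
# RoCE needs the RDMA-CM connection manager instead of the default
# IB-only connection method; sm handles intra-node traffic.
/usr/local/open-mpi/1.10.7/bin/mpiexec \
    --mca btl openib,sm,self \
    --mca btl_openib_cpc_include rdmacm \
    --hostfile hostfile5 -n 200 DoWork
```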
On Fri, Jul 14, 2017 at 12:34 PM, Boris M. Vulovic
Post by Boris M. Vulovic
Gus, Gilles and John,
Thanks for the help. Let me first post (below) the output from checkouts of:
ibdiagnet
ibhosts
ibstat (for login node, for now)
What do you think?
Thanks
--Boris
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-bash-4.1$ ibdiagnet
----------
/usr/share/ibdiagnet2.1.1/plugins/
(You can specify more paths to be looked in with "IBDIAGNET_PLUGINS_PATH"
env variable)
Plugin Name Result Comment
libibdiagnet_cable_diag_plugin-2.1.1 Succeeded Plugin loaded
libibdiagnet_phy_diag_plugin-2.1.1 Succeeded Plugin loaded
---------------------------------------------
Discovery
-E- Failed to initialize
-E- Fabric Discover failed, err=IBDiag initialize wasn't done
-E- Fabric Discover failed, MAD err=Failed to register SMI class
---------------------------------------------
Summary
-I- Stage Warnings Errors Comment
-I- Discovery NA
-I- Lids Check NA
-I- Links Check NA
-I- Subnet Manager NA
-I- Port Counters NA
-I- Nodes Information NA
-I- Speed / Width checks NA
-I- Partition Keys NA
-I- Alias GUIDs NA
-I- Temperature Sensing NA
/var/tmp/ibdiagnet2/ibdiagnet2.log
-E- A fatal error occurred, exiting...
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-bash-4.1$ ibhosts
ibwarn: [168221] mad_rpc_open_port: client_register for mgmt 1 failed
src/ibnetdisc.c:766; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-bash-4.1$ ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb1c
System image GUID: 0x248a0703005abb1c
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb1c
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb1d
System image GUID: 0x248a0703005abb1c
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x0000000000000000
Link layer: Ethernet
CA 'mlx5_2'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb30
System image GUID: 0x248a0703005abb30
State: Down
Physical state: Disabled
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb30
Link layer: Ethernet
CA 'mlx5_3'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb31
System image GUID: 0x248a0703005abb30
State: Down
Physical state: Disabled
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb31
Link layer: Ethernet
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
On Fri, Jul 14, 2017 at 12:37 AM, John Hearns via users
Post by John Hearns via users
Boris, as Gilles says - first do some lower-level checkouts of your
Infiniband network.
ibdiagnet
ibhosts
and then as Gilles says 'ibstat' on each node
Post by Gilles Gouaillardet
Boris,
Open MPI should automatically detect the infiniband hardware, and use
openib (and *not* tcp) for inter node communications
and a shared memory optimized btl (e.g. sm or vader) for intra node
communications.
note if you "-mca btl openib,self", you tell Open MPI to use the openib
btl between any tasks,
including tasks running on the same node (which is less efficient than
using sm or vader)
at first, i suggest you make sure infiniband is up and running on all
your nodes.
(just run ibstat, at least one port should be listed, state should be
Active, and all nodes should have the same SM lid)
then try to run two tasks on two nodes.
if this does not work, you can
mpirun --mca btl_base_verbose 100 ...
and post the logs so we can investigate from there.
Cheers,
Gilles
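A sketch of the two-task sanity check Gilles describes, with verbose BTL selection logging (node names and paths reused from the earlier commands):

```shell
# Quick sanity check: 2 tasks on 2 nodes, let Open MPI pick the BTLs
# itself and log the selection so a failure can be diagnosed.
/usr/local/open-mpi/1.10.7/bin/mpiexec \
    --mca btl_base_verbose 100 \
    -host node01,node02 -n 2 DoWork 2>&1 | tee btl_selection.log
```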
--
*Boris M. Vulovic*
Gus Correa
2017-07-17 17:06:18 UTC
Permalink
Hi Boris

The nodes may have standard Gigabit Ethernet interfaces,
besides the Infiniband (RoCE).
You may want to direct OpenMPI to use the Infiniband interfaces,
not Gigabit Ethernet,
by adding something like this to "--mca btl self,vader,self":

"--mca btl_tcp_if_include ib0,ib1"

(Where the interface names ib0,ib1 are just my guess for
what your nodes may have. Check with your "root" system administrator!)

That syntax can also take an IP address or a subnet mask,
whichever is simpler for you.
It is better explained in this FAQ:

https://www.open-mpi.org/faq/?category=all#tcp-selection

BTW, some of your questions (and others that you may hit later)
are covered in the OpenMPI FAQ:

https://www.open-mpi.org/faq/?category=all

I hope this helps,
Gus Correa
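Putting Gus's two suggestions together, the full invocation would look something like the sketch below. The interface names ib0,ib1 are, as Gus says, only a guess and must be checked against the actual nodes; note also his follow-up correcting the btl list to self,vader,tcp:

```shell
# Sketch: force the tcp BTL onto the fast (RoCE-capable) interfaces
# instead of any slower Gigabit Ethernet ports on the same nodes.
# ib0,ib1 are placeholder interface names -- check with "ip addr".
/usr/local/open-mpi/1.10.7/bin/mpiexec \
    --mca btl tcp,vader,self \
    --mca btl_tcp_if_include ib0,ib1 \
    --hostfile hostfile5 -n 200 DoWork
```

Per the linked FAQ, btl_tcp_if_include also accepts CIDR subnet notation (e.g. 192.168.1.0/24) in place of interface names.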
Gus Correa
2017-07-17 17:09:15 UTC
Permalink
Post by Gus Correa
Hi Boris
The nodes may have standard Gigabit Ethernet interfaces,
besides the Infiniband (RoCE).
You may want to direct OpenMPI to use the Infiniband interfaces,
not Gigabit Ethernet,
Oops! Typo:
"--mca btl self,vader,tcp"
Post by Gus Correa
"--mca btl_tcp_if_include ib0,ib1"
(Where the interface names ib0,ib1 are just my guess for
what your nodes may have. Check with your "root" system administrator!)
That syntax may also use IP address, or a subnet mask,
whichever it is simpler for you.
https://www.open-mpi.org/faq/?category=all#tcp-selection
BTW, some of your questions (and others that you may hit later)
https://www.open-mpi.org/faq/?category=all
I hope this helps,
Gus Correa
Boris M. Vulovic
2017-07-17 17:51:00 UTC
Permalink
Thanks Gus. I'll try and post results.
I am a newbie at this and appreciate any advice very much.

Cheers
--Boris
Post by Gus Correa
Post by Gus Correa
Hi Boris
The nodes may have standard Gigabit Ethernet interfaces,
besides the Infiniband (RoCE).
You may want to direct OpenMPI to use the Infiniband interfaces,
not Gigabit Ethernet,
"--mca btl self,vader,tcp"
Post by Gus Correa
"--mca btl_tcp_if_include ib0,ib1"
(Where the interface names ib0,ib1 are just my guess for
what your nodes may have. Check with your "root" system administrator!)
That syntax may also use IP address, or a subnet mask,
whichever it is simpler for you.
https://www.open-mpi.org/faq/?category=all#tcp-selection
BTW, some of your questions (and others that you may hit later)
https://www.open-mpi.org/faq/?category=all
I hope this helps,
Gus Correa
Post by Boris M. Vulovic
Thanks very much for the replies and the help.
I got confirmation from the "root" that it is indeed RoCE with 100G.
I'll go over the info in the link Russell provided, but have a quick
question: if I run the "*mpiexec*" with "*-mca btl tcp,self*" do I get the
benefit of *RoCE *(the fastest speed)?
I'll go over the details of all reply and post useful feedback.
Thanks very much all!
Best,
--Boris
It looks like you have two dual-port Mellanox VPI cards in this
machine. These cards can be set to run InfiniBand or Ethernet on a
port-by-port basis, and all four of your ports are set to Ethernet
mode. Two of your ports have active 100 gigabit Ethernet links, and
the other two have no link up at all.
With no InfiniBand links on the machine, you will, of course, not be
able to run your OpenMPI job over InfiniBand.
If your machines and network are set up for it, you might be able to
run your job over RoCE (RDMA Over Converged Ethernet) using one or
both of those 100 GbE links. I have never used RoCE myself, but one
starting point for gathering more information on it might be the
https://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce
<https://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce>
Sincerely,
Rusty Dekema
University of Michigan
Advanced Research Computing - Technology Services
On Fri, Jul 14, 2017 at 12:34 PM, Boris M. Vulovic
Post by Boris M. Vulovic
Gus, Gilles and John,
Thanks for the help. Let me first post (below) the output from
checkouts of
Post by Boris M. Vulovic
ibdiagnet
ibhosts
ibstat (for login node, for now)
What do you think?
Thanks
--Boris
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Post by Boris M. Vulovic
-bash-4.1$ ibdiagnet
----------
/usr/share/ibdiagnet2.1.1/plugins/
(You can specify more paths to be looked in with
"IBDIAGNET_PLUGINS_PATH"
Post by Boris M. Vulovic
env variable)
Plugin Name Result Comment
libibdiagnet_cable_diag_plugin-2.1.1 Succeeded Plugin
loaded
Post by Boris M. Vulovic
libibdiagnet_phy_diag_plugin-2.1.1 Succeeded Plugin
loaded
Post by Boris M. Vulovic
---------------------------------------------
Discovery
-E- Failed to initialize
-E- Fabric Discover failed, err=IBDiag initialize wasn't done
-E- Fabric Discover failed, MAD err=Failed to register SMI class
---------------------------------------------
Summary
-I- Stage Warnings Errors Comment
-I- Discovery NA
-I- Lids Check NA
-I- Links Check NA
-I- Subnet Manager NA
-I- Port Counters NA
-I- Nodes Information NA
-I- Speed / Width checks NA
-I- Partition Keys NA
-I- Alias GUIDs NA
-I- Temperature Sensing NA
/var/tmp/ibdiagnet2/ibdiagnet2.log
-E- A fatal error occurred, exiting...
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-bash-4.1$ ibhosts
ibwarn: [168221] mad_rpc_open_port: client_register for mgmt 1 failed
src/ibnetdisc.c:766; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-bash-4.1$ ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb1c
System image GUID: 0x248a0703005abb1c
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb1c
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb1d
System image GUID: 0x248a0703005abb1c
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x0000000000000000
Link layer: Ethernet
CA 'mlx5_2'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb30
System image GUID: 0x248a0703005abb30
State: Down
Physical state: Disabled
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb30
Link layer: Ethernet
CA 'mlx5_3'
CA type: MT4115
Number of ports: 1
Firmware version: 12.17.2020
Hardware version: 0
Node GUID: 0x248a0703005abb31
System image GUID: 0x248a0703005abb30
State: Down
Physical state: Disabled
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x3c010000
Port GUID: 0x268a07fffe5abb31
Link layer: Ethernet
-bash-4.1$
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
On Fri, Jul 14, 2017 at 12:37 AM, John Hearns via users
Post by John Hearns via users
Boris, as Gilles says - first do some lower-level checkouts of your InfiniBand network:
ibdiagnet
ibhosts
and then as Gilles says 'ibstat' on each node
Post by Gilles Gouaillardet
Boris,
Open MPI should automatically detect the InfiniBand hardware, and use openib (and *not* tcp) for inter-node communications, and a shared-memory-optimized btl (e.g. sm or vader) for intra-node communications.
Note that if you "-mca btl openib,self", you tell Open MPI to use the openib btl between any tasks, including tasks running on the same node (which is less efficient than using sm or vader).
At first, I suggest you make sure InfiniBand is up and running on all your nodes. (Just run ibstat; at least one port should be listed, state should be Active, and all nodes should have the same SM lid.)
Then try to run two tasks on two nodes. If this does not work, you can
mpirun --mca btl_base_verbose 100 ...
and post the logs so we can investigate from there.
Cheers,
Gilles
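(One quick way to run that ibstat check from the login node across all compute nodes; the hostnames are assumed from the mpiexec command in the original post:

    for n in node01 node02 node03 node04 node05; do
        echo "== $n =="
        ssh "$n" ibstat | grep -E 'State:|SM lid:'
    done

All nodes should report State: Active and the same nonzero SM lid.)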
Post by Boris M. Vulovic
I would like to know how to invoke InfiniBand hardware on CentOS 6x cluster with OpenMPI (static libs.) for running my C++ code. This is how I compile and run:
/usr/local/open-mpi/1.10.7/bin/mpic++ -L/usr/local/open-mpi/1.10.7/lib -Bstatic main.cpp -o DoWork
usr/local/open-mpi/1.10.7/bin/mpiexec -mca btl tcp,self --hostfile hostfile5 -host node01,node02,node03,node04,node05 -n 200 DoWork
Here, "*-mca btl tcp,self*" reveals that *TCP* is used, and the cluster has InfiniBand.
What should be changed in compiling and running commands for InfiniBand to be invoked? If I just replace "*-mca btl tcp,self*" with "*-mca btl openib,self*" then I get plenty of errors with the relevant one saying:
/At least one pair of MPI processes are unable to reach each other for MPI communications. This means that no Open MPI device has indicated that it can be used to communicate between these processes. This is an error; Open MPI requires that all MPI processes be able to reach each other. This error can sometimes be the result of forgetting to specify the "self" BTL./
Thanks very much!!!
*Boris *
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
--
*Boris M. Vulovic*
Russell Dekema
2017-07-17 18:00:04 UTC
Permalink
Since these ports are running in actual Ethernet mode (as opposed to
IPoIB), I do not think the interface names will be of the ibN (ib0,
ib1, etc) format. It is more likely that the interface names will be
of the form ethN or enPApBsCfD.

It would be best to check with your system administrator, but if the
'ethtool' tool is installed, you can check the line speed of an
interface by running 'ethtool <interface_name>' and looking for the
'Speed:' value.
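For example (the interface name eth0 is a guess; list candidates first with "ip link show"):

    ethtool eth0 | grep -i 'Speed:'

A 100 GbE link would show up as "Speed: 100000Mb/s".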

Cheers,
Rusty D.
Post by Gus Correa
Hi Boris
The nodes may have standard Gigabit Ethernet interfaces,
besides the Infiniband (RoCE).
You may want to direct Open MPI to use the InfiniBand interfaces, not Gigabit Ethernet, with e.g.:
"--mca btl_tcp_if_include ib0,ib1"
(Where the interface names ib0,ib1 are just my guess for
what your nodes may have. Check with your "root" system administrator!)
That syntax may also use an IP address or a subnet mask, whichever is simpler for you.
https://www.open-mpi.org/faq/?category=all#tcp-selection
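(As a sketch only: combined with the original mpiexec line, that would look something like

    mpiexec -mca btl tcp,self \
            -mca btl_tcp_if_include eth2,eth3 \
            --hostfile hostfile5 -n 200 DoWork

where eth2,eth3 are placeholder names for the fast interfaces; check the real names on your nodes with "ip link show" or ifconfig.)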
BTW, some of your questions (and others that you may hit later) are answered in the Open MPI FAQ:
https://www.open-mpi.org/faq/?category=all
I hope this helps,
Gus Correa
Post by Boris M. Vulovic
Thanks very much for the replies and the help.
I got confirmation from the "root" that it is indeed RoCE with 100G.
I'll go over the info in the link Russell provided, but have a quick
question: if I run the "*mpiexec*" with "*-mca btl tcp,self*" do I get the
benefit of *RoCE *(the fastest speed)?
I'll go over the details of all replies and post useful feedback.
Thanks very much all!
Best,
--Boris