Discussion:
[OMPI users] How to verify RDMA traffic (RoCE) is being sent over a fabric when running Open MPI
Brendan Myers
2016-11-08 22:15:17 UTC
Hello,

I am trying to figure out how I can verify that Open MPI traffic is
actually being transmitted over the RoCE fabric connecting my cluster. My
MPI job runs quickly and error-free, but I cannot seem to verify that any
significant amount of data is being transferred to the other endpoint in my
RoCE fabric. When I remove the OOB exclusion from my command, I am able to
see what I believe to be the out-of-band (OOB) data on my RoCE interface
using the tools listed below.

Software:

* CentOS 7.2

* Open MPI 2.0.1

Command:

* mpirun --mca btl openib,self,sm --mca oob_tcp_if_exclude eth3
--mca btl_openib_receive_queues P,65536,120,64,32 --mca
btl_openib_cpc_include rdmacm -np 4 -hostfile mpi-hosts-ce
/usr/local/bin/IMB-MPI1

o eth3 is my RoCE interface

o The RoCE interface addresses of the two nodes involved are listed in my mpi-hosts-ce file
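
(A minimal sanity check on BTL selection, assuming the same hostfile as
above: raising the BTL verbosity makes Open MPI log which BTL components it
opens for each process, so any problem bringing up openib would show up in
the output.)

   # Same job with BTL selection logging enabled; btl_base_verbose is a
   # standard Open MPI MCA verbosity parameter.
   mpirun --mca btl openib,self,sm --mca btl_base_verbose 100 \
          --mca btl_openib_cpc_include rdmacm \
          -np 4 -hostfile mpi-hosts-ce /usr/local/bin/IMB-MPI1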

Ways I have tried to verify that data is being transferred (a host-side
counter check is sketched after this list):

* Through the port counters on my RoCE switch

o Data is visible when running ib_write_bw but not when running Open MPI

* Through ibdump

o Data is visible when running ib_write_bw but not when running Open MPI

* Through Wireshark

o Data is visible when running ib_write_bw but not when running Open MPI
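
(The sketch mentioned above: the per-port RDMA counters in sysfs should
grow while the job runs, which sidesteps the capture tools entirely. The
device name mlx4_0 and port number 1 below are assumptions; substitute
whatever ibv_devinfo reports for your adapter.)

   # Read the transmit counter before and after the MPI job; a large delta
   # means the traffic really left through the RDMA device.
   # Note: port_xmit_data is reported in units of 4 octets.
   cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data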



I do not have much experience with Open MPI and apologize if I have left out
any necessary information. I will respond with any data requested, and I
appreciate the time taken to read and respond to this.

Thank you,

Brendan T. W. Myers

***@soft-forge.com

Software Forge Inc
Howard Pritchard
2016-11-08 23:08:34 UTC
Hi Brendan,

What type of Ethernet device (is this a Mellanox HCA?) and Ethernet switch
are you using? The mpirun options look correct to me. Is it possible that
you have all the MPI processes on a single node? It should be pretty
obvious from the IMB SendRecv test whether you're using RoCE: the
large-message bandwidth will be much higher than if you are going through
the TCP BTL.
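
A minimal way to run that comparison, assuming the same hostfile as in your
command (IMB-MPI1 accepts a benchmark name as an argument, and mpirun's
--display-map option answers the single-node question by printing where
each rank lands):

   # Where do the ranks land? --display-map prints the process map.
   mpirun --display-map -np 4 -hostfile mpi-hosts-ce \
          /usr/local/bin/IMB-MPI1 SendRecv

   # Same benchmark forced over the TCP BTL for a bandwidth baseline.
   mpirun --mca btl tcp,self,sm -np 4 -hostfile mpi-hosts-ce \
          /usr/local/bin/IMB-MPI1 SendRecv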

If you're using Mellanox cards, you might want to do a sanity check using
the MXM libraries. You'd want to set the MXM_TLS environment variable to
"self,shm,rc". We got close to 90 Gb/sec bandwidth using ConnectX-4 + the
MXM MTL on a cluster earlier this year.
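
A sketch of that sanity check, assuming your Open MPI build includes the
mxm MTL (it is selected through the cm PML; if the component is missing,
mpirun will say so):

   export MXM_TLS=self,shm,rc
   mpirun --mca pml cm --mca mtl mxm -np 4 -hostfile mpi-hosts-ce \
          /usr/local/bin/IMB-MPI1 SendRecv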

Howard
Howard Pritchard
2016-11-08 23:10:48 UTC
Hi Brendan,

I should clarify, as my response may confuse folks: we had configured the
ConnectX-4 cards to use Ethernet/RoCE rather than the IB transport for
these measurements.
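
For reference, a sketch of that port-type change, assuming the Mellanox
firmware tools are installed; the device path below is an assumption (run
"mst start" and "mst status" to find yours), and the card must be rebooted
for the change to take effect:

   # LINK_TYPE_P1: 1 = InfiniBand, 2 = Ethernet (enables RoCE on port 1).
   mlxconfig -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=2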

Howard