Brendan Myers
2016-11-08 22:15:17 UTC
Hello,
I am trying to figure out how I can verify that Open MPI traffic is
actually being transmitted over the RoCE fabric connecting my cluster. My
MPI job runs quickly and error free, but I cannot verify that any
significant amount of data is being transferred to the other endpoint in my
RoCE fabric. When I remove the OOB exclusion from my command and analyze my
RoCE interface with the tools listed below, I can see what I believe to be
the out-of-band (OOB) data.
Software:
* CentOS 7.2
* Open MPI 2.0.1
Command:
* mpirun --mca btl openib,self,sm --mca oob_tcp_if_exclude eth3 \
    --mca btl_openib_receive_queues P,65536,120,64,32 \
    --mca btl_openib_cpc_include rdmacm -np 4 -hostfile mpi-hosts-ce \
    /usr/local/bin/IMB-MPI1
o eth3 is my RoCE interface
o The RoCE interfaces of the 2 nodes involved are defined in my mpi-hosts-ce file
Ways I have tried to verify the data transfer:
* Through the port counters on my RoCE switch
o Sees data being sent when using ib_write_bw but not when using Open MPI
* Through ibdump
o Sees data being sent when using ib_write_bw but not when using Open MPI
* Through Wireshark
o Sees data being sent when using ib_write_bw but not when using Open MPI
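One further host-side check that could complement the three above is to compare the adapter's own transmit counter before and after a run. The following is only a sketch under assumptions: it presumes the RDMA driver exposes the standard InfiniBand counters in sysfs for the RoCE port, and the device name and port number passed to it (for example mlx4_0 and 1) are placeholders, not values taken from my setup. The helper names are hypothetical. The port_xmit_data counter is reported in units of 4 octets, so the delta is multiplied by 4 to get bytes.

```shell
#!/bin/sh
# Sketch: measure bytes sent by the local RDMA port while a command runs.
# ASSUMPTION: the device name and port number are supplied by the caller;
# nothing here is specific to a particular adapter.

# port_xmit_data counts 4-octet units, so multiply the delta by 4 for bytes.
counter_delta_bytes() {
    echo $(( ($2 - $1) * 4 ))
}

# Read the transmit counter for a given device and port from sysfs.
read_xmit_data() {
    cat "/sys/class/infiniband/$1/ports/$2/counters/port_xmit_data"
}

# Usage: measure_xmit DEVICE PORT COMMAND [ARGS...]
measure_xmit() {
    dev=$1; port=$2; shift 2
    before=$(read_xmit_data "$dev" "$port") || return 1
    "$@"
    after=$(read_xmit_data "$dev" "$port") || return 1
    echo "bytes sent on $dev port $port: $(counter_delta_bytes "$before" "$after")"
}
```

It would be run on one of the nodes as, for example, measure_xmit mlx4_0 1 followed by the full mpirun command. It only covers the local node's port, and the counter may wrap or saturate on some hardware, so a small or zero delta is a hint rather than proof.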
I do not have much experience with Open MPI and apologize if I have left out
necessary information. I will respond with any data requested. I
appreciate the time spent to read and respond to this.
Thank you,
Brendan T. W. Myers
***@soft-forge.com
Software Forge Inc