[OMPI users] Support for 50G/100G HCA in openmpi
Devesh Sharma via users
2017-04-13 09:22:50 UTC
Hello list,

I am trying to run IMB using openmpi-2.0.1/2.1.0 on a 50G 2-node
cluster in my lab, but the test does not start. It fails with the
following error:

Starting for 0 th iteration. Using openmpi
LOGPATH: /MPI/Logs/openmpi/imb/runlog-openmpi-np6-n2-0
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

Local host: calypso-rhel73GA
Local device: bnxt_re0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Process 1 ([[25467,1],0]) is on host: calypso-rhel73GA
Process 2 ([[25467,1],1]) is on host: pandora-rhel73GA
BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[calypso-rhel73GA:12532] *** An error occurred in MPI_Bcast
[calypso-rhel73GA:12532] *** reported by process [140683322785793,0]
[calypso-rhel73GA:12532] *** on communicator MPI_COMM_WORLD
[calypso-rhel73GA:12532] *** MPI_ERR_INTERN: internal error
[calypso-rhel73GA:12532] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[calypso-rhel73GA:12532] *** and potentially your MPI job)
*** Error in `/usr/local/imb/openmpi/dcheck/IMB-MPI1': free(): invalid pointer: 0x00007ff37b2f34d8 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c503)[0x7ff37a9ac503]
/usr/local/mpi/openmpi/lib/libmpi.so.20(+0x58d17)[0x7ff37af65d17]
/usr/local/mpi/openmpi/lib/libmpi.so.20(ompi_mpi_errors_are_fatal_comm_handler+0x105)[0x7ff37af66485]
/usr/local/mpi/openmpi/lib/libmpi.so.20(ompi_errhandler_invoke+0x115)[0x7ff37af659c5]
/usr/local/mpi/openmpi/lib/libmpi.so.20(MPI_Bcast+0x1a3)[0x7ff37af86743]
/usr/local/imb/openmpi/dcheck/IMB-MPI1[0x402dd7]
/usr/local/imb/openmpi/dcheck/IMB-MPI1[0x401e0b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff37a951b35]
/usr/local/imb/openmpi/dcheck/IMB-MPI1[0x402744]
======= Memory map: ========
00400000-00415000 r-xp 00000000 fd:00 33970734 /usr/local/imb/openmpi/dcheck/IMB-MPI1
00614000-00615000 r--p 00014000 fd:00 33970734 /usr/local/imb/openmpi/dcheck/IMB-MPI1
00615000-00616000 rw-p 00015000 fd:00 33970734 /usr/local/imb/openmpi/dcheck/IMB-MPI1
00616000-0061a000 rw-p 00000000 00:00 0
00f2c000-01071000 rw-p 00000000 00:00 0 [heap]
7ff35ffff000-7ff368000000 rw-s 00000000 fd:00 17524899 /tmp/openmpi-sessions-***@calypso-rhel73GA_0/25467/1/shared_mem_pool.calypso-rhel73GA (deleted)
7ff368000000-7ff368021000 rw-p 00000000 00:00 0
7ff368021000-7ff36c000000 ---p 00000000 00:00 0
7ff36c000000-7ff36c021000 rw-p 00000000 00:00 0
7ff36c021000-7ff370000000 ---p 00000000 00:00 0
7ff370000000-7ff370021000 rw-p 00000000 00:00 0
7ff370021000-7ff374000000 ---p 00000000 00:00 0
7ff374698000-7ff37469e000 r-xp 00000000 fd:00 51165705 /usr/local/lib/libbnxtre-rdmav2.so
7ff37469e000-7ff37489d000 ---p 00006000 fd:00 51165705 /usr/local/lib/libbnxtre-rdmav2.so
7ff37489d000-7ff37489e000 r--p 00005000 fd:00 51165705 /usr/local/lib/libbnxtre-rdmav2.so
7ff37489e000-7ff37489f000 rw-p 00006000 fd:00 51165705 /usr/local/lib/libbnxtre-rdmav2.so
7ff37489f000-7ff3748a4000 r-xp 00000000 fd:00 252310528 /usr/lib64/libibverbs/libcxgb3-rdmav2.so
7ff3748a4000-7ff374aa3000 ---p 00005000 fd:00 252310528 /usr/lib64/libibverbs/libcxgb3-rdmav2.so
7ff374aa3000-7ff374aa4000 r--p 00004000 fd:00 252310528 /usr/lib64/libibverbs/libcxgb3-rdmav2.so
7ff374aa4000-7ff374aa5000 rw-p 00005000 fd:00 252310528 /usr/lib64/libibverbs/libcxgb3-rdmav2.so
7ff374aa5000-7ff374aac000 r-xp 00000000 fd:00 252310529 /usr/lib64/libibverbs/libcxgb4-rdmav2.so
7ff374aac000-7ff374cab000 ---p 00007000 fd:00 252310529 /usr/lib64/libibverbs/libcxgb4-rdmav2.so
7ff374cab000-7ff374cac000 r--p 00006000 fd:00 252310529 /usr/lib64/libibverbs/libcxgb4-rdmav2.so
7ff374cac000-7ff374cad000 rw-p 00007000 fd:00 252310529 /usr/lib64/libibverbs/libcxgb4-rdmav2.so
7ff374cad000-7ff374cb1000 r-xp 00000000 fd:00 252310530 /usr/lib64/libibverbs/libhfi1verbs-rdmav2.so
7ff374cb1000-7ff374eb0000 ---p 00004000 fd:00 252310530 /usr/lib64/libibverbs/libhfi1verbs-rdmav2.so
7ff374eb0000-7ff374eb1000 r--p 00003000 fd:00 252310530 /usr/lib64/libibverbs/libhfi1verbs-rdmav2.so
7ff374eb1000-7ff374eb2000 rw-p 00004000 fd:00 252310530 /usr/lib64/libibverbs/libhfi1verbs-rdmav2.so
7ff374eb2000-7ff374eb7000 r-xp 00000000 fd:00 252310531 /usr/lib64/libibverbs/libhns-rdmav2.so
7ff374eb7000-7ff3750b6000 ---p 00005000 fd:00 252310531 /usr/lib64/libibverbs/libhns-rdmav2.so
7ff3750b6000-7ff3750b7000 r--p 00004000 fd:00 252310531 /usr/lib64/libibverbs/libhns-rdmav2.so
7ff3750b7000-7ff3750b8000 rw-p 00005000 fd:00 252310531 /usr/lib64/libibverbs/libhns-rdmav2.so
7ff3750b8000-7ff3750be000 r-xp 00000000 fd:00 252310532 /usr/lib64/libibverbs/libi40iw-rdmav2.so
7ff3750be000-7ff3752be000 ---p 00006000 fd:00 252310532 /usr/lib64/libibverbs/libi40iw-rdmav2.so
7ff3752be000-7ff3752bf000 r--p 00006000 fd:00 252310532 /usr/lib64/libibverbs/libi40iw-rdmav2.so
7ff3752bf000-7ff3752c0000 rw-p 00007000 fd:00 252310532 /usr/lib64/libibverbs/libi40iw-rdmav2.so
7ff3752c0000-7ff3752c4000 r-xp 00000000 fd:00 252310533 /usr/lib64/libibverbs/libipathverbs-rdmav2.so
7ff3752c4000-7ff3754c3000 ---p 00004000 fd:00 252310533 /usr/lib64/libibverbs/libipathverbs-rdmav2.so
7ff3754c3000-7ff3754c4000 r--p 00003000 fd:00 252310533 /usr/lib64/libibverbs/libipathverbs-rdmav2.so
7ff3754c4000-7ff3754c5000 rw-p 00004000 fd:00 252310533 /usr/lib64/libibverbs/libipathverbs-rdmav2.so
7ff3754c5000-7ff3754cd000 r-xp 00000000 fd:00 252310534 /usr/lib64/libibverbs/libmlx4-rdmav2.so
7ff3754cd000-7ff3756cc000 ---p 00008000 fd:00 252310534 /usr/lib64/libibverbs/libmlx4-rdmav2.so
7ff3756ce000-7ff3756e5000 r-xp 00000000 fd:00 252310535 /usr/lib64/libibverbs/libmlx5-rdmav2.so
7ff3756e5000-7ff3758e4000 ---p 00017000 fd:00 252310535 /usr/lib64/libibverbs/libmlx5-rdmav2.so
7ff3758e4000-7ff3758e5000 r--p 00016000 fd:00 252310535 /usr/lib64/libibverbs/libmlx5-rdmav2.so
7ff3758e5000-7ff3758e6000 rw-p 00017000 fd:00 252310535 /usr/lib64/libibverbs/libmlx5-rdmav2.so
7ff3758e6000-7ff3758ee000 r-xp 00000000 fd:00 252310536 /usr/lib64/libibverbs/libmthca-rdmav2.so
7ff3758ee000-7ff375aed000 ---p 00008000 fd:00 252310536 /usr/lib64/libibverbs/libmthca-rdmav2.so
7ff375aed000-7ff375aee000 r--p 00007000 fd:00 252310536 /usr/lib64/libibverbs/libmthca-rdmav2.so
7ff375aee000-7ff375aef000 rw-p 00008000 fd:00 252310536 /usr/lib64/libibverbs/libmthca-rdmav2.so
7ff375aef000-7ff375af4000 r-xp 00000000 fd:00 252310537 /usr/lib64/libibverbs/libnes-rdmav2.so
7ff375af4000-7ff375cf3000 ---p 00005000 fd:00 252310537 /usr/lib64/libibverbs/libnes-rdmav2.so
7ff375cf3000-7ff375cf4000 r--p 00004000 fd:00 252310537 /usr/lib64/libibverbs/libnes-rdmav2.so
7ff375cf4000-7ff375cf5000 rw-p 00005000 fd:00 252310537 /usr/lib64/libibverbs/libnes-rdmav2.so
7ff375cf5000-7ff375cfb000 r-xp 00000000 fd:00 252310538 /usr/lib64/libibverbs/libocrdma-rdmav2.so
7ff375cfb000-7ff375efa000 ---p 00006000 fd:00 252310538 /usr/lib64/libibverbs/libocrdma-rdmav2.so
7ff375efa000-7ff375efb000 r--p 00005000 fd:00 252310538 /usr/lib64/libibverbs/libocrdma-rdmav2.so
[calypso-rhel73GA:12532] *** Process received signal ***
[calypso-rhel73GA:12532] Signal: Aborted (6)
[calypso-rhel73GA:12532] Signal code: (-6)
[calypso-rhel73GA:12532] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7ff37ad00370]
[calypso-rhel73GA:12532] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7ff37a9651d7]
[calypso-rhel73GA:12532] [ 2] /lib64/libc.so.6(abort+0x148)[0x7ff37a9668c8]
[calypso-rhel73GA:12532] [ 3] /lib64/libc.so.6(+0x74f07)[0x7ff37a9a4f07]
[calypso-rhel73GA:12532] [ 4] /lib64/libc.so.6(+0x7c503)[0x7ff37a9ac503]
[calypso-rhel73GA:12532] [ 5] /usr/local/mpi/openmpi/lib/libmpi.so.20(+0x58d17)[0x7ff37af65d17]
[calypso-rhel73GA:12532] [ 6] /usr/local/mpi/openmpi/lib/libmpi.so.20(ompi_mpi_errors_are_fatal_comm_handler+0x105)[0x7ff37af66485]
[calypso-rhel73GA:12532] [ 7] /usr/local/mpi/openmpi/lib/libmpi.so.20(ompi_errhandler_invoke+0x115)[0x7ff37af659c5]
[calypso-rhel73GA:12532] [ 8] /usr/local/mpi/openmpi/lib/libmpi.so.20(MPI_Bcast+0x1a3)[0x7ff37af86743]
[calypso-rhel73GA:12532] [ 9] /usr/local/imb/openmpi/dcheck/IMB-MPI1[0x402dd7]
[calypso-rhel73GA:12532] [10] /usr/local/imb/openmpi/dcheck/IMB-MPI1[0x401e0b]
[calypso-rhel73GA:12532] [11] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff37a951b35]
[calypso-rhel73GA:12532] [12] /usr/local/imb/openmpi/dcheck/IMB-MPI1[0x402744]
[calypso-rhel73GA:12532] *** End of error message ***

These are the run-time parameters I used:

mpirun -np 6 -hostfile ./hostfile --mca btl openib,self,sm \
    --mca btl_openib_receive_queues P,65536,256,192,128 \
    -mca btl_openib_cpc_include rdmacm -mca pml ob1 --allow-run-as-root \
    --bind-to none --map-by node /usr/local/imb/openmpi/IMB-MPI1
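The btl_openib_receive_queues value used here, P,65536,256,192,128, describes a single per-peer ("P") receive queue. My reading of the Open MPI openib FAQ is that the fields are buffer size in bytes, number of buffers, low watermark for reposting, and credit window size; the C sketch below splits such a spec under that assumption. The struct and function names are hypothetical, and this is not Open MPI's own parser (it ignores the optional fifth field and the defaults for omitted fields).

```c
#include <stdio.h>

/* Hypothetical view of one per-peer ("P") queue spec.
 * Field order assumed from the Open MPI openib FAQ:
 * P,<buffer size>,<num buffers>,<low watermark>,<credit window> */
struct pp_queue {
    unsigned long size;        /* buffer size in bytes */
    unsigned long num_buffers; /* receive buffers posted per peer */
    unsigned long low_wm;      /* repost threshold */
    unsigned long window;      /* credit window size */
};

/* Parse exactly four numeric fields after "P,".
 * Returns 0 on success, -1 on any mismatch or trailing text. */
int parse_pp_queue(const char *spec, struct pp_queue *q)
{
    unsigned long v[4];
    char type = 0;
    int consumed = 0;
    /* %n records how many characters were read; it is not counted
     * in sscanf's return value. */
    int n = sscanf(spec, "%c,%lu,%lu,%lu,%lu%n",
                   &type, &v[0], &v[1], &v[2], &v[3], &consumed);
    if (n != 5 || type != 'P' || spec[consumed] != '\0') {
        return -1;
    }
    q->size = v[0];
    q->num_buffers = v[1];
    q->low_wm = v[2];
    q->window = v[3];
    return 0;
}
```

The %n check makes the sketch reject extra fields (such as the optional fifth value) instead of silently ignoring them.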


After digging a little into the Open MPI source code, I found that
Open MPI fails because the port speed (active_speed) reported by my
device is 64 (a 50G link speed), which opal_common_verbs_port_bw()
does not recognize.

It worked only after I applied this patch to the source:
diff --git a/opal/mca/common/verbs/common_verbs_port.c b/opal/mca/common/verbs/common_verbs_port.c
index 831ba3f..e1d5834 100644
--- a/opal/mca/common/verbs/common_verbs_port.c
+++ b/opal/mca/common/verbs/common_verbs_port.c
@@ -68,6 +68,10 @@ int opal_common_verbs_port_bw(struct ibv_port_attr *port_attr,
         /* EDR: 25.78125 Gbps * 64/66, in megabits */
         *bandwidth = 25000;
         break;
+    case 64:
+        /* 50G per lane (active_speed 64), in megabits */
+        *bandwidth = 50000;
+        break;
     default:
         /* Who knows? */
         return OPAL_ERR_NOT_FOUND;
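To show why the missing case matters, here is a hypothetical, self-contained restatement of the speed and width mapping with the proposed case 64 added. Only the EDR (32) and 64 values come from the patch; the values for codes 1 to 16 and the width codes (1 = 1x, 2 = 4x, 4 = 8x, 8 = 12x) are my recollection of the surrounding Open MPI source and the ibv_query_port(3) documentation, so verify them against common_verbs_port.c. port_bw_mbps is an illustrative name, not an Open MPI function.

```c
#include <stdint.h>

/* Hypothetical stand-in for the mapping in opal_common_verbs_port_bw().
 * Writes per-port bandwidth in Mb/s to *bw; returns 0 on success and
 * -1 for an unrecognized speed or width code. */
int port_bw_mbps(uint8_t active_speed, uint8_t active_width, uint32_t *bw)
{
    uint32_t lane;

    /* Per-lane data rate, in megabits */
    switch (active_speed) {
    case 1:  lane = 2000;  break; /* SDR */
    case 2:  lane = 4000;  break; /* DDR */
    case 4:  lane = 8000;  break; /* QDR */
    case 8:  lane = 10000; break; /* FDR10 */
    case 16: lane = 13636; break; /* FDR */
    case 32: lane = 25000; break; /* EDR */
    case 64: lane = 50000; break; /* proposed 50G case */
    default: return -1;           /* the pre-patch failure for 64 */
    }

    /* Multiply by the number of lanes */
    switch (active_width) {
    case 1: *bw = lane;      break; /* 1x */
    case 2: *bw = lane * 4;  break; /* 4x */
    case 4: *bw = lane * 8;  break; /* 8x */
    case 8: *bw = lane * 12; break; /* 12x */
    default: return -1;
    }
    return 0;
}
```

With active_speed 64 on a 1x port this gives 50000 Mb/s; without the case 64 branch it returns -1, the counterpart of the OPAL_ERR_NOT_FOUND path described above.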

I think this change should be included in Open MPI to support 50G
RoCE devices.

The free() crash shown above (glibc reports "free(): invalid pointer")
still needs someone's attention.

For reference, these are the entries in the .ini file:
vendor_id = 0x14e4
vendor_part_id = 0x16d7
use_eager_rdma = 1
mtu = 1024
receive_queues = P,65536,256,192,128
max_inline_data = 96
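As far as I know, the openib BTL picks the .ini entries for a device by matching its vendor ID and part ID (here 0x14e4 / 0x16d7). Below is a rough C sketch of that matching step, assuming one key = value pair per line. dev_params, parse_ini_line and params_match are hypothetical names; Open MPI's real parser also handles section headers, comments, comma-separated ID lists and string values such as receive_queues, none of which this sketch does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the per-device parameters listed above. */
struct dev_params {
    unsigned long vendor_id;
    unsigned long vendor_part_id;
    int use_eager_rdma;
    int mtu;
    int max_inline_data;
};

/* Apply one "key = value" line to p; returns 0 if the key is handled. */
int parse_ini_line(const char *line, struct dev_params *p)
{
    char key[64];
    char val[128];
    if (sscanf(line, " %63[^= ] = %127s", key, val) != 2) {
        return -1;
    }
    /* Base 0 lets strtoul accept both "0x14e4" and "1024". */
    unsigned long v = strtoul(val, NULL, 0);
    if (strcmp(key, "vendor_id") == 0)            p->vendor_id = v;
    else if (strcmp(key, "vendor_part_id") == 0)  p->vendor_part_id = v;
    else if (strcmp(key, "use_eager_rdma") == 0)  p->use_eager_rdma = (int)v;
    else if (strcmp(key, "mtu") == 0)             p->mtu = (int)v;
    else if (strcmp(key, "max_inline_data") == 0) p->max_inline_data = (int)v;
    else return -1; /* e.g. receive_queues (a string) is not modeled */
    return 0;
}

/* Nonzero if these parameters apply to the device with the given IDs. */
int params_match(const struct dev_params *p, unsigned long vid,
                 unsigned long pid)
{
    return p->vendor_id == vid && p->vendor_part_id == pid;
}
```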


-Regards
Devesh
