[OMPI users] A couple of general questions
Charles A Taylor
2018-06-14 11:33:26 UTC
Because of the issues we are having with OpenMPI and the openib BTL (questions previously asked), I’ve been looking into what other transports are available. I was particularly interested in OFI/libfabric support but cannot find any information on it more recent than a reference to the usNIC BTL from 2015 (Jeff Squyres, Cisco). Unfortunately, the open-mpi.org website FAQs covering OpenFabrics support don’t mention anything beyond OpenMPI 1.8. Given that 3.1 is the current stable version, that seems odd.

That being the case, I thought I’d ask here. After laying down the libfabric-devel RPM and building (3.1.0) with --with-libfabric=/usr, I end up with an “ofi” MTL but nothing else. I can run with OMPI_MCA_mtl=ofi and OMPI_MCA_btl="self,vader,openib" but it eventually crashes in libopen-pal.so (mpi_waitall() is higher up the stack).

GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0. Backtrace:

Questions: Am I using the OFI MTL as intended? Should there be an “ofi” BTL? Does anyone use this?


Charlie Taylor
UF Research Computing

PS - If you could use some help updating the FAQs, I’d be willing to put in some time. I’d probably learn a lot.
Gilles Gouaillardet
2018-06-14 11:46:14 UTC

If you are using InfiniBand hardware, the recommended way is to use UCX.


users mailing list
Howard Pritchard
2018-06-14 11:48:00 UTC
Hello Charles

You are heading in the right direction.

First you might want to run the libfabric fi_info command to see what
capabilities you picked up from the libfabric RPMs.
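
A minimal sketch of that first check, in case it helps: the snippet below lists each provider name once. It assumes fi_info prints lines that begin with `provider:`; `list_providers` is a made-up helper name, not something shipped with libfabric.

```shell
# list_providers: read fi_info-style output on stdin and print each
# distinct provider name once. Hypothetical helper for illustration.
list_providers() {
    awk '/^[ \t]*provider:/ {
             sub(/^[ \t]*provider:[ \t]*/, "")   # keep only the name
             print
         }' | sort -u
}

# Only call fi_info if it is actually installed on this machine.
if command -v fi_info >/dev/null 2>&1; then
    fi_info | list_providers
fi
```

If the provider you expect is missing from that list, the problem is on the libfabric side rather than in Open MPI.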

Next, you may well not actually be using the OFI MTL.

Could you run your app with

export OMPI_MCA_mtl_base_verbose=100

and post the output?

It would also help if you described the system you are using: OS, interconnect, CPU type, etc.

Charles A Taylor
2018-06-14 12:37:11 UTC
I see what you mean. Below is the output (filtered for a single host). Our setup is very generic.

Dell SOS6320 hosts (haswell)
Mellanox ConnectX-3 HCAs (mlx4 drivers: native RHEL, not MOFED).
FDR/EDR switches (stand-alone opensm)
slurm 16.05.11
pmix (pmix-1.1.5-1.el7.x86_64)
openmpi (3.0.0, 3.1.0)

Apps include the well-known LAMMPS, VASP, GROMACS, amber, raxml, espresso, namd2 (i.e., the usual list of research university apps).
gadget/gizmo/arepo are really the only ones giving us trouble, but I know they run fine under both openmpi and impi/mpich/mvapich at other sites. I’m trying to figure out why we can’t seem to run them reliably, but I’d also like to get up to date with our transport APIs. It seems we’ve fallen behind and are just doing the things we’ve always done (openib BTL).

I’ll try running with a modified “provider_include” list and see what happens. The fi_info output shows the verbs, udp, and sockets providers.



[***@login4 mufasa]$ grep 'c29a-s2.ufhpc' mz0.e
[c29a-s2.ufhpc:01463] mca: base: components_register: registering framework mtl components
[c29a-s2.ufhpc:01463] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01463] mca: base: components_register: component ofi register function successful
[c29a-s2.ufhpc:01463] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01463] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01463] mca: base: components_open: component ofi open function successful
[c29a-s2.ufhpc:01464] mca: base: components_register: registering framework mtl components
[c29a-s2.ufhpc:01464] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01464] mca: base: components_register: component ofi register function successful
[c29a-s2.ufhpc:01464] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01464] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01464] mca: base: components_open: component ofi open function successful
[c29a-s2.ufhpc:01465] mca: base: components_register: registering framework mtl components
[c29a-s2.ufhpc:01465] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01465] mca: base: components_register: component ofi register function successful
[c29a-s2.ufhpc:01465] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01465] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01465] mca: base: components_open: component ofi open function successful
[c29a-s2.ufhpc:01466] mca: base: components_register: registering framework mtl components
[c29a-s2.ufhpc:01466] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01466] mca: base: components_register: component ofi register function successful
[c29a-s2.ufhpc:01466] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01466] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01466] mca: base: components_open: component ofi open function successful
[c29a-s2.ufhpc:01463] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01463] mca:base:select:( mtl) Querying component [ofi]
[c29a-s2.ufhpc:01463] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[c29a-s2.ufhpc:01463] mca:base:select:( mtl) Selected component [ofi]
[c29a-s2.ufhpc:01463] select: initializing mtl component ofi
[c29a-s2.ufhpc:01464] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01464] mca:base:select:( mtl) Querying component [ofi]
[c29a-s2.ufhpc:01464] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[c29a-s2.ufhpc:01464] mca:base:select:( mtl) Selected component [ofi]
[c29a-s2.ufhpc:01464] select: initializing mtl component ofi
[c29a-s2.ufhpc:01465] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01465] mca:base:select:( mtl) Querying component [ofi]
[c29a-s2.ufhpc:01465] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[c29a-s2.ufhpc:01465] mca:base:select:( mtl) Selected component [ofi]
[c29a-s2.ufhpc:01465] select: initializing mtl component ofi
[c29a-s2.ufhpc:01466] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01466] mca:base:select:( mtl) Querying component [ofi]
[c29a-s2.ufhpc:01466] mca:base:select:( mtl) Query of component [ofi] set priority to 25
[c29a-s2.ufhpc:01466] mca:base:select:( mtl) Selected component [ofi]
[c29a-s2.ufhpc:01466] select: initializing mtl component ofi
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:269: mtl:ofi:provider_include = "psm,psm2,gni"
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:272: mtl:ofi:provider_exclude = "(null)"
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "verbs" not in include list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:301: mtl:ofi:prov: none
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:410: select_ofi_provider: no provider found
[c29a-s2.ufhpc:01464] select: init returned failure for component ofi
[c29a-s2.ufhpc:01464] select: no component selected
[c29a-s2.ufhpc:01464] mca: base: close: component ofi closed
[c29a-s2.ufhpc:01464] mca: base: close: unloading component ofi
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:269: mtl:ofi:provider_include = "psm,psm2,gni"
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:272: mtl:ofi:provider_exclude = "(null)"
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:280: mtl:ofi: "verbs" not in include list
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:301: mtl:ofi:prov: none
[c29a-s2.ufhpc:01465] mtl_ofi_component.c:410: select_ofi_provider: no provider found
[c29a-s2.ufhpc:01465] select: init returned failure for component ofi
[c29a-s2.ufhpc:01465] select: no component selected
[c29a-s2.ufhpc:01465] mca: base: close: component ofi closed
[c29a-s2.ufhpc:01465] mca: base: close: unloading component ofi
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:269: mtl:ofi:provider_include = "psm,psm2,gni"
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:272: mtl:ofi:provider_exclude = "(null)"
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:269: mtl:ofi:provider_include = "psm,psm2,gni"
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:272: mtl:ofi:provider_exclude = "(null)"
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:280: mtl:ofi: "verbs" not in include list
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:280: mtl:ofi: "verbs" not in include list
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in include list
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:301: mtl:ofi:prov: none
[c29a-s2.ufhpc:01466] mtl_ofi_component.c:410: select_ofi_provider: no provider found
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:301: mtl:ofi:prov: none
[c29a-s2.ufhpc:01463] mtl_ofi_component.c:410: select_ofi_provider: no provider found
[c29a-s2.ufhpc:01463] select: init returned failure for component ofi
[c29a-s2.ufhpc:01463] select: no component selected
[c29a-s2.ufhpc:01466] select: init returned failure for component ofi
[c29a-s2.ufhpc:01466] select: no component selected
[c29a-s2.ufhpc:01466] mca: base: close: component ofi closed
[c29a-s2.ufhpc:01466] mca: base: close: unloading component ofi
[c29a-s2.ufhpc:01463] mca: base: close: component ofi closed
[c29a-s2.ufhpc:01463] mca: base: close: unloading component ofi
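
The log above shows why the OFI MTL was not actually in use: its include list was "psm,psm2,gni", so the verbs and sockets providers were skipped and the component failed to initialize. A small filter along the following lines can pull those facts out of a long verbose log. It is only a sketch: it relies on the exact message wording shown above, and the function name `summarize_mtl_log` is invented here.

```shell
# summarize_mtl_log: read OMPI_MCA_mtl_base_verbose output on stdin and
# print the include list plus every provider that was skipped.
# Hypothetical helper; it depends on the message text seen in this log.
summarize_mtl_log() {
    awk '
        /mtl:ofi:provider_include = / {
            sub(/.*provider_include = /, "")   # keep only the value
            include[$0] = 1
        }
        /not in include list/ {
            split($0, parts, "\"")             # name is the quoted field
            skipped[parts[2]] = 1
        }
        END {
            for (i in include) print "include: " i
            for (s in skipped) print "skipped: " s
        }
    ' | sort
}
```

For the file used in this message, that would be run as `grep 'c29a-s2.ufhpc' mz0.e | summarize_mtl_log`.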
Charles A Taylor
2018-06-14 13:08:35 UTC

GIZMO: prov/verbs/src/ep_rdm/verbs_tagged_ep_rdm.c:443: fi_ibv_rdm_tagged_release_remote_sbuff: Assertion `0' failed.

GIZMO:10405 terminated with signal 6 at PC=2add5835c1f7 SP=7fff8071b008. Backtrace:
Cabral, Matias A
2018-06-14 16:49:08 UTC
Hi Charles,

What version of libfabric do you have installed? To run OMPI using the verbs provider you need to pair it with the ofi_rxm provider. fi_info should list it like:

provider: verbs;ofi_rxm

So in your command line you have to specify:
mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include "verbs;ofi_rxm" ...

(don’t skip the quotes)
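
The reason the quotes matter: an unquoted `;` ends the command as far as the shell is concerned, so mpirun would never see `ofi_rxm`. One way to avoid retyping the whole line is a small wrapper like the sketch below. The function name `run_with_ofi` and the `DRY_RUN` switch are made up for illustration, and `./my_app` in the usage line is a placeholder.

```shell
# run_with_ofi: launch "$@" under mpirun with the cm PML, the OFI MTL and
# the verbs;ofi_rxm provider pair, quoted so it stays a single argument.
# Set DRY_RUN=1 to print the arguments (one per line) instead of running.
# Hypothetical wrapper, not part of Open MPI.
run_with_ofi() {
    set -- mpirun -mca pml cm -mca mtl ofi \
           -mca mtl_ofi_provider_include "verbs;ofi_rxm" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 run_with_ofi -np 4 ./my_app` prints each argument on its own line, which makes it easy to confirm that `verbs;ofi_rxm` was not split.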
Post by Charles A Taylor
Unfortunately, the open-mpi.org website FAQs covering OpenFabrics support don’t mention anything beyond OpenMPI 1.8.
Good feedback, I’ll look to see how this could be improved.



Charles A Taylor
2018-06-14 17:01:12 UTC
Hi Matias,

Thanks for the response.

As of a couple of hours ago we are running:


As for the provider, I saw that one but just listed “verbs”. I’ll go with the “verbs;ofi_rxm” going forward.


Jeff Squyres (jsquyres) via users
2018-06-14 17:18:06 UTC
Charles --

It may have gotten lost in the middle of this thread, but the vendor-recommended way of running on InfiniBand these days is with UCX. I.e., install OpenUCX and use one of the UCX transports in Open MPI. Unless you have special requirements, you should likely give this a try and see if it works for you.

The libfabric / verbs combo *may* work, but I don't know how robust the verbs libfabric support was in the v1.5 release series.
Charles A Taylor
2018-06-14 18:08:01 UTC
Thank you, Jeff.

The ofi MTL with the verbs provider seems to be working well at the moment. I’ll need to let it run a day or so before I know whether we can avoid the deadlocks experienced with the straight openib BTL.

I’ve also built in UCX support, so I’ll be trying that next.

Again, thanks for the response.

Oh, before I forget, and I hope this doesn’t sound snarky, but how does the community find out that things like UCX and libfabric exist, as well as how to use them, when the FAQs on open-mpi.org don’t have much information beyond the now ancient 1.8 series? After all, this is hardly your typical “mpiexec” command line:

mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include "verbs;ofi_rxm" ...

if you get my drift. Even Google doesn’t seem to know all that much about these things. I’m feeling more than a little ignorant these days. :)

Thanks to all for the responses. It has been a huge help.

Jeff Squyres (jsquyres) via users
2018-06-14 19:49:49 UTC
Yeah, keeping the documentation / FAQ up to date is... difficult. :-(

We could definitely use some help with that.

Does anyone have some cycles to help update our FAQ, perchance?
Cabral, Matias A
2018-06-14 20:14:50 UTC
Hey Jeff,

I will help with the OFI part.


Jeff Squyres (jsquyres) via users
2018-06-15 13:40:18 UTC
Matias --

Sweet! PRs against ompi-www would be greatly appreciated.

I wrote this wiki page a long time ago on how to write a good FAQ entry:

Charles A Taylor
2018-06-14 20:23:38 UTC
Hmmm. ompi_info only shows the ucx pml. I don’t see any “transports”. Will they show up somewhere, or are they documented? Right now it looks like the only UCX-related thing I can do with openmpi 3.1.0 is

export OMPI_MCA_pml=ucx
mpiexec ….

From ompi_info…

$ ompi_info --param all all | more | grep ucx
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v3.1.0)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)

I’m assuming there is more to it than that.
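
For anyone repeating this check, a slightly more targeted test than grepping for "ucx" is to look for a specific framework/component pair. The sketch below assumes the `MCA <framework>: <component> (...)` line format shown in the output above; `has_component` is a made-up helper name.

```shell
# has_component FRAMEWORK NAME: succeed if ompi_info-style output on stdin
# lists component NAME in FRAMEWORK, e.g. "MCA pml: ucx (...)".
# Hypothetical helper for illustration.
has_component() {
    grep -Eq "MCA[[:space:]]+$1:[[:space:]]+$2([[:space:]]|\$)"
}

# Only query ompi_info if Open MPI is installed on this machine.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_component pml ucx; then
        echo "UCX PML is available"
    else
        echo "UCX PML not found"
    fi
fi
```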


Pavel Shamis
2018-06-14 20:38:05 UTC
You just have to switch the PML to UCX.
You can find some example command lines here: https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX
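
To spell that out with what is already in this thread: the PML can be selected either through the environment (the `OMPI_MCA_pml=ucx` form used above) or on the mpirun command line. The sketch below wraps the environment form in a small helper; `set_mca` is a made-up name, and `./my_app` in the comment is a placeholder.

```shell
# set_mca NAME VALUE: export an Open MPI MCA parameter through the
# environment, using the OMPI_MCA_ prefix seen earlier in the thread.
# Hypothetical convenience wrapper, not part of Open MPI.
set_mca() {
    export "OMPI_MCA_$1=$2"
}

set_mca pml ucx            # same effect as: export OMPI_MCA_pml=ucx
echo "OMPI_MCA_pml=$OMPI_MCA_pml"

# Command-line spelling of the same selection (placeholder app name):
#   mpirun -mca pml ucx ./my_app
```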
Charles A Taylor
2018-06-14 22:01:30 UTC
Aw, sheesh. Thanks. Somehow I missed that despite being on the page - lack of focus, I guess.


Post by Pavel Shamis
You just have to switch PML to UCX.
You have some examples of the command line here: https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX
Hmmm. ompi_info only shows the ucx pml. I don’t see any “transports”. Will they show up somewhere, or are they documented? Right now it looks like the only UCX-related thing I can do with openmpi 3.1.0 is
export OMPI_MCA_pml=ucx
From ompi_info

$ ompi_info --param all all | more | grep ucx
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v3.1.0)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)
I’m assuming there is more to it than that.
Post by Jeff Squyres (jsquyres) via users
Charles --
It may have gotten lost in the middle of this thread, but the vendor-recommended way of running on InfiniBand these days is with UCX. I.e., install OpenUCX and use one of the UCX transports in Open MPI. Unless you have special requirements, you should likely give this a try and see if it works for you.
The libfabric / verbs combo *may* work, but I don't know how robust the verbs libfabric support was in the v1.5 release series.
Post by Charles A Taylor
Hi Matias,
Thanks for the response.
As for the provider, I saw that one but just listed “verbs”. I’ll go with the “verbs;ofi_rxm” going forward.
Post by Cabral, Matias A
Hi Charles,

provider: verbs;ofi_rxm

mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include "verbs;ofi_rxm"
(don’t skip the quotes)
Jeff Squyres (jsquyres) via users
2018-06-15 13:42:00 UTC
Post by Charles A Taylor
Hmmm. ompi_info only shows the ucx pml. I don’t see any “transports”. Will they show up somewhere, or are they documented? Right now it looks like the only UCX-related thing I can do with openmpi 3.1.0 is
Actually, I know that Pasha already mentioned this in a reply, but it might be a point worth clarifying somewhere in the FAQ:

PML OB1: uses BTLs.
PML CM: uses MTLs.
PML UCX: is its own, self-contained thing.
PML Yalla: is its own, self-contained thing.
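
To make that mapping concrete, here is a minimal shell sketch (not taken from the thread itself) that prints the mpirun MCA arguments for each PML. The helper name pml_args and the application name ./my_mpi_app are hypothetical. The BTL list for ob1 and the ofi MTL for cm are simply the ones mentioned earlier in this thread, so treat them as assumptions and adjust them to whatever ompi_info reports for your build.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun MCA arguments that select a PML.
# ob1 needs a BTL list, cm needs an MTL; ucx and yalla take no
# separate transport component on the Open MPI side.
pml_args() {
    case "$1" in
        # BTL list as used earlier in this thread (assumption for your site)
        ob1)   echo "--mca pml ob1 --mca btl self,vader,openib" ;;
        # MTL as used earlier in this thread (assumption for your site)
        cm)    echo "--mca pml cm --mca mtl ofi" ;;
        ucx)   echo "--mca pml ucx" ;;
        yalla) echo "--mca pml yalla" ;;
        *)     echo "unknown PML: $1" >&2; return 1 ;;
    esac
}

# Show the resulting command line without actually launching anything.
# ./my_mpi_app is a placeholder for your own binary.
echo "mpirun $(pml_args ucx) ./my_mpi_app"
```

Each of these settings can also be given through the environment instead of the command line, e.g. export OMPI_MCA_pml=ucx as shown earlier in the thread.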

Jeff Squyres
