Noam Bernstein
2018-10-09 18:26:47 UTC
Hi - I'm trying to get OpenMPI working on a newly configured CentOS 7 system, and I'm not even sure what information would be useful to provide. I'm using the CentOS built-in libibverbs and/or libfabric, and I configure openmpi with just
--with-verbs --with-ofi --prefix=$DEST
also tried --without-ofi, no change. Basically, I can run with "--mca btl self,vader", but if I try "--mca btl,openib" I get an error from each process:
[compute-0-0][[24658,1],5][connect/btl_openib_connect_udcm.c:1245:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
If I don't specify the btl it appears to try to set up openib with the same errors, then crashes on some free()-related segfault, presumably when it tries to actually use vader.
The machine seems to be able to see its IB interface, as reported by things like ibstatus or ibv_devinfo. I'm not sure what else to look for. I also confirmed that "ulimit -l" reports unlimited.
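For reference, the locked-memory check can also be repeated from inside a process rather than from the login shell. This is only a minimal sketch, assuming Python 3 on Linux; the helper name memlock_limit is hypothetical and not part of OpenMPI. Note that getrlimit reports bytes, while "ulimit -l" reports kilobytes.

```python
import resource


def memlock_limit():
    """Return the (soft, hard) locked-memory limits of this process.

    Each value is the string "unlimited" or a byte count as a string.
    memlock_limit is a hypothetical helper name used for illustration.
    """
    def fmt(value):
        # RLIM_INFINITY is how the OS reports "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print(f"RLIMIT_MEMLOCK soft={soft} hard={hard}")
```

Saved under an assumed file name such as check_memlock.py and started through mpirun (for example "mpirun -np 2 python3 check_memlock.py"), it would show the limit that the launched processes actually see, which may differ from the interactive shell.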
Does anyone have any suggestions as to how to diagnose this issue?
thanks,
Noam