Post by Daniele Tartarini
Anyway, /dev/hfi1_0 doesn't exist.
Make sure you have the hfi1 module/driver loaded.
In addition, please confirm that the links are in the active state on all the nodes, for example with `opainfo`.
_MAC
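The checks suggested above could be scripted roughly as follows. This is only a sketch: the `DEV` variable and the `module_loaded` helper are illustrative names, not part of any Omni-Path tool, and it assumes a Linux node where loaded modules are listed in /proc/modules. `opainfo` is called without arguments and is skipped if it is not installed.

```shell
#!/bin/sh
# Sketch: basic checks for the hfi1 device node, driver, and link state.
# DEV and module_loaded are illustrative names, not part of any tool.
DEV=${DEV:-/dev/hfi1_0}

# Succeeds if module $1 is listed in the modules file $2
# (defaults to /proc/modules, where each line starts with the module name).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if [ -e "$DEV" ]; then
    echo "$DEV exists"
else
    echo "$DEV is missing"
fi

if module_loaded hfi1; then
    echo "hfi1 module is loaded"
else
    echo "hfi1 module is not loaded (as root, try: modprobe hfi1)"
fi

# opainfo ships with the Omni-Path host tools; skip it if it is absent.
if command -v opainfo >/dev/null 2>&1; then
    opainfo
else
    echo "opainfo not found; cannot report link state"
fi
```

Running this on every node (for instance through the cluster's parallel shell, if one is available) gives a quick picture of which nodes lack the device or the driver.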
From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of Howard Pritchard
Sent: Thursday, December 08, 2016 9:23 AM
To: Open MPI Users <***@lists.open-mpi.org>
Subject: Re: [OMPI users] device failed to appear .. Connection timed out
Hello Daniele,
Could you post the output from the ompi_info command? I'm noticing that the RPMs that came with the RHEL 7.2 distro on
one of our systems were built to support psm2/hfi1.
Two things. First, could you try running your applications with
mpirun --mca pml ob1 (all the rest of your args)
and see if that works?
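As a rough illustration of the suggestion above, the sketch below first lists any PSM-related components that ompi_info reports, then selects the ob1 PML through the environment instead of the command line. Setting `OMPI_MCA_pml=ob1` is equivalent to passing `--mca pml ob1` to mpirun. The `-np 2 hostname` invocation is only a placeholder for the real application and its arguments.

```shell
#!/bin/sh
# Sketch: check for PSM components and force the ob1 PML.

# Show PSM-related components in this Open MPI build, if any.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i psm || echo "no PSM components reported"
else
    echo "ompi_info not found on PATH"
fi

# Same effect as "--mca pml ob1" on the mpirun command line.
export OMPI_MCA_pml=ob1

# Placeholder run; replace "-np 2 hostname" with the real job.
if command -v mpirun >/dev/null 2>&1; then
    mpirun -np 2 hostname
else
    echo "mpirun not found on PATH"
fi
```

If the job runs cleanly with ob1 selected, that points at the PSM2 path rather than at the application or the rest of the installation.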
Second, what sort of system are you using? Is this a cluster? If it is, you may want to check whether
you have a situation where it's an Omni-Path interconnect and you have the psm2/hfi1 packages installed
but for some reason the Omni-Path HFIs themselves are not active.
On one of our Omni-Path systems the following hfi1-related RPMs are installed:
hfidiags-0.8-13.x86_64
hfi1-psm-devel-0.7-244.x86_64
libhfi1verbs-0.5-16.el7.x86_64
hfi1-psm-0.7-244.x86_64
hfi1-firmware-0.9-36.noarch
hfi1-psm-compat-0.7-244.x86_64
libhfi1verbs-devel-0.5-16.el7.x86_64
hfi1-0.11.3.10.0_327.el7.x86_64-245.x86_64
hfi1-firmware_debug-0.9-36.noarc
hfi1-diagtools-sw-0.8-13.x86_64
Howard
2016-12-08 8:45 GMT-07:00 ***@open-mpi.org <***@open-mpi.org>:
Sounds like something didn't quite get configured right, or maybe you have a library installed that isn't quite set up correctly, or...
Regardless, we generally advise building from source to avoid such problems. Is there some reason not to just do so?
On Dec 8, 2016, at 6:16 AM, Daniele Tartarini <***@sheffield.ac.uk> wrote:
Hi,
On a Red Hat 7.2 machine I've installed the Open MPI package distributed via yum:
openmpi-devel.x86_64 1.10.3-3.el7
With any code I try to run (including the mpitests-* programs) I get the following message, with slight variations:
my_machine.171619hfi_wait_for_device: The /dev/hfi1_0 device failed to appear after 15.0 seconds: Connection timed out
Is anyone able to help me in identifying the source of the problem?
Anyway, /dev/hfi1_0 doesn't exist.
If I use an Open MPI version compiled from source (gcc 4.8.5), I have no issue.
Many thanks in advance.
cheers
Daniele
_______________________________________________
users mailing list
***@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users