[OMPI users] Problem building OpenMPI with CUDA 8.0

Discussion:

Justin Luitjens

2016-10-18 15:53:01 UTC

I have the release version of CUDA 8.0 installed and am trying to build OpenMPI.

Here is my configure and build line:

./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= --with-openib= && make && sudo make install

Where CUDA_HOME points to the cuda install path.

When I run the above command it builds for quite a while but eventually errors out with this:

make[2]: Entering directory `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
CCLD opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetCount_v2'

Any idea what I might need to change to get around this error?

Thanks,
Justin

-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

Justin Luitjens

2016-10-18 18:26:46 UTC

After looking into this a bit more, it appears that the issue is that I am building on a head node which does not have the driver installed. Building on a back-end node resolves this issue. In CUDA 8.0 the NVML stubs can be found in the toolkit at the following path: ${CUDA_HOME}/lib64/stubs

For 8.0 I'd suggest updating the configure/make scripts to look for NVML there and link in the stubs. This way the build depends only on the toolkit, not on the driver being installed.
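
Until the configure scripts do this themselves, one way to try it by hand is to pass the stub directory to the linker. The following is a minimal sketch and has not been confirmed in this thread: the helper name nvml_stub_ldflags is made up for illustration, and it assumes the stub is named libnvidia-ml.so and sits in ${CUDA_HOME}/lib64/stubs as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or CUDA): print a -L flag for the
# NVML stub directory if the toolkit ships a stub there, otherwise print nothing.
nvml_stub_ldflags() {
    stub_dir="$1/lib64/stubs"
    # Assumption: the stub library is named libnvidia-ml.so.
    if [ -e "$stub_dir/libnvidia-ml.so" ]; then
        printf '%s\n' "-L$stub_dir"
    fi
}

# Possible use (left commented out so the sketch has no side effects):
#   ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME \
#       LDFLAGS="$(nvml_stub_ldflags "$CUDA_HOME")" && make && sudo make install
```

The stubs are only meant to satisfy the linker at build time; the nodes that actually run the binaries are still expected to provide the real library from the driver.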

Thanks,
Justin


Jeff Squyres (jsquyres)

2016-10-19 15:49:19 UTC

Justin --

Fair point. Can you work with Sylvain Jeaugey (at Nvidia) to submit a pull request for this functionality?

Thanks.

_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

--
Jeff Squyres
***@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

Gilles Gouaillardet

2016-10-24 00:12:04 UTC

Justin,

IIRC, NVML is only used by hwloc (i.e. not by CUDA), and there is no real
benefit to having it.

as a workaround, you can

export enable_nvml=no

and then configure && make install
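
Spelled out, the workaround could look like the sketch below. It rests on an assumption about autoconf-generated scripts: an --enable-nvml/--disable-nvml option is stored in the shell variable enable_nvml, so presetting that variable in the environment should reach the hwloc checks embedded in Open MPI. The build step is left commented out and reuses the configure line from the original post.

```shell
#!/bin/sh
# Preset the variable that the embedded hwloc configure logic is assumed to read.
export enable_nvml=no

# Show what a child process such as ./configure would see in its environment.
env | grep '^enable_nvml='

# Then configure and build as before (commented out in this sketch):
#   ./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME && make && sudo make install
```

If this works as assumed, the configure output should report that NVML support is disabled, and the undefined nvml* references should no longer appear when opal_wrapper is linked.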

Cheers,

Gilles


Brice Goglin

2016-10-24 07:45:37 UTC

FWIW, I am still open to implementing something to work around this in hwloc.
It could be a shell variable such as HWLOC_DISABLE_NVML=yes, for all our major
configured dependencies.

Brice


Gilles Gouaillardet

2016-10-24 08:12:00 UTC

Brice,

Unless you want to enable/disable NVML at runtime, and assuming we do
not need NVML in Open MPI, IMHO the easiest workaround is to update

https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/hwloc1113/configure.m4

and add the one-liner

enable_nvml=no

a better option could be to update
https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/configure.m4

and pass the --enable-nvml option from Open MPI down to hwloc.

Cheers,

Gilles
