[OMPI users] CentOS-7/openmpi-3.0.0/cuda-9.0.176_384.81: libopen-pal.so: undefined reference to `nvmlDeviceGetPciInfo

Tru Huynh

2017-10-11 07:03:51 UTC

Hi,

I have successfully built openmpi-3.0.0 from source with cuda 8.0.61.2 and
7.5.18 on CentOS-7 x86_64 (default system GNU compilers).
When I try to build openmpi-3.0.0 with cuda9 on CentOS-7, the build fails
with this error:

make[2]: Leaving directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal/mca/shmem/sysv'
Making all in tools/wrappers
make[2]: Entering directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal/tools/wrappers'
CCLD opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetPciInfo_v3'
collect2: error: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal'
make: *** [all-recursive] Error 1

<Additionnal informations (failing builder)>
[***@manolito build-cuda-9.0.176_384.81]$ grep -r nvmlDeviceGetPciInfo_v3 $CUDA_INSTALL_PATH
Binary file /c7/shared/cuda/9.0.176_384.81/lib64/stubs/libnvidia-ml.so matches
/c7/shared/cuda/9.0.176_384.81/include/nvml.h:#define nvmlDeviceGetPciInfo nvmlDeviceGetPciInfo_v3

The desktop has a legacy card, and its driver does not support cuda9,
but I would not expect that to cause such a link error. Or maybe it does?
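
One way to narrow this down (a sketch only, not something that was run for
this report) is to check which copies of libnvidia-ml.so actually define the
versioned symbol. The helper name has_symbol is made up for illustration, and
it assumes GNU nm and awk are available.

```shell
#!/bin/sh
# has_symbol LIB SYMBOL   (hypothetical helper, not part of CUDA or Open MPI)
# Returns 0 if the shared library LIB defines SYMBOL in its dynamic symbol
# table, 1 if it does not, and 2 if LIB cannot be read.
has_symbol() {
    lib=$1
    sym=$2
    [ -r "$lib" ] || return 2
    # nm -D lists dynamic symbols; --defined-only skips undefined references.
    # The last field is the symbol name; drop any "@VERSION" suffix.
    nm -D --defined-only "$lib" 2>/dev/null | awk -v s="$sym" '
        { n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
        END { exit(found ? 0 : 1) }'
}
```

For instance, has_symbol "$CUDA_INSTALL_PATH/lib64/stubs/libnvidia-ml.so" nvmlDeviceGetPciInfo_v3
should return 0 given the grep result above. Running the same check on the
library shipped with the installed driver (its location varies;
/usr/lib64/libnvidia-ml.so.1 is only a guess) would tell whether the 340.102
driver provides the _v3 entry point at all.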

[***@manolito build-cuda-9.0.176_384.81]$ nvidia-smi
Wed Oct 11 08:42:33 2017
+------------------------------------------------------+
| NVIDIA-SMI 340.102    Driver Version: 340.102        |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 8600 GT     Off  | 0000:01:00.0     N/A |                  N/A |
|  0%   72C    P0    N/A /  N/A |      3MiB /   511MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

[***@manolito build-cuda-9.0.176_384.81]$ deviceQuery
deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
[***@manolito build-cuda-9.0.176_384.81]$ deviceQueryDrv
deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "GeForce 8600 GT"
CUDA Driver Version: 6.5
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 511 MBytes (536150016 bytes)
MapSMtoCores for SM 1.1 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 1.1 is undefined. Default to use 64 Cores/SM
( 4) Multiprocessors, ( 64) CUDA Cores/MP: 256 CUDA Cores
GPU Max Clock rate: 1188 MHz (1.19 GHz)
Memory Clock rate: 700 Mhz
Memory Bus Width: 128-bit
Max Texture Dimension Sizes 1D=(8192) 2D=(65536, 32768) 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Texture alignment: 256 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
cuDeviceGetAttribute returned 1
-> CUDA_ERROR_INVALID_VALUE

The nvidia driver (340.102) only supports CUDA 6.5, yet I had no issue building
for cuda 7.5 and 8.
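
The driver/toolkit mismatch can be made explicit with a small version
comparison. This is only a sketch: version_ge is a hypothetical helper, it
relies on sort -V from GNU coreutils, and the two version strings are copied
by hand from the deviceQueryDrv output and the toolkit directory name rather
than queried from the system.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A is greater than or equal to B.
version_ge() {
    # sort -V orders version strings; if B comes out first (or A equals B),
    # then A >= B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver_cuda=6.5     # "CUDA Driver Version" reported by deviceQueryDrv
toolkit_cuda=9.0    # taken from the toolkit directory name 9.0.176_384.81

if version_ge "$driver_cuda" "$toolkit_cuda"; then
    echo "driver supports CUDA $toolkit_cuda"
else
    echo "driver (CUDA $driver_cuda) is too old for CUDA $toolkit_cuda"
fi
```

With the values above the else branch is taken, which matches the runtime
deviceQuery failure (error 35). It does not by itself explain the link error,
since the same driver was equally too old for cuda 7.5 and 8 and those builds
went through.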
</Additionnal informations (failing builder)>

If I switch to a newer machine (same OS, just a different card and Nvidia driver),
the build goes through and the checks pass!

Bottom line: for cuda9 (only?) one might need to build on the target machine,
not on a legacy one. Of course, ymmv.

Cheers

Tru

<Additionnal info (successfull builder)>
[***@borma build-cuda-9.0.176_384.81]$ deviceQueryDrv
deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
CUDA Driver Version: 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11172 MBytes (11714691072 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1582 MHz (1.58 GHz)
Memory Clock rate: 5505 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 2883584 bytes
Max Texture Dimension Sizes 1D=(131072) 2D=(131072, 65536) 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 6 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS

</Additionnal info (successfull builder)>

--
Dr Tru Huynh | mailto:***@pasteur.fr | tel/fax +33 1 45 68 87 37/19
https://research.pasteur.fr/en/team/structural-bioinformatics/
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France