Discussion:
[OMPI users] Unable to compile OpenMPI 1.10.3 with CUDA
Craig tierney
2016-10-27 22:23:19 UTC
Permalink
Hello,

I am trying to build OpenMPI 1.10.3 with CUDA but I am unable to build the
library that will allow me to use IPC on a node or GDR between nodes. I
have tried with 1.10.4 and 2.0.1 and have the same problems. Here is
my build script:

---------------------------
#!/bin/bash

export OPENMPI_VERSION=1.10.3
export BASEDIR=/tmp/mpi_testing/
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin/:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export MPI_HOME=$BASEDIR/openmpi-$OPENMPI_VERSION

which nvcc
nvcc --version

tar -zxf openmpi-$OPENMPI_VERSION.tar.gz
cd openmpi-$OPENMPI_VERSION

./configure --prefix=$MPI_HOME --with-cuda=$CUDA_HOME/include > config.out 2>&1

make -j > build.out 2>&1
make install >> build.out 2>&1
-----------------------

From the docs, it appears that I should not have to set anything beyond
--with-cuda, since my CUDA installation is in /usr/local/cuda. However, I
appended /usr/local/cuda/include just in case, after the first approach
didn't work.

From the output in config.log, I see that cuda.h is not found. When the
tests are run, no extra include flag is added to specify the
/usr/local/cuda/include path.

With the resulting build, I test for CUDA and GDR with ompi_info. Results
are:

***@dgx-1:~/temp$ /tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info | grep cuda
MCA btl: smcuda (MCA v2.0.0, API v2.0.0, Component v1.10.3)
MCA coll: cuda (MCA v2.0.0, API v2.0.0, Component v1.10.3)
***@dgx-1:~/temp$ /tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info | grep gdr
***@dgx-1:~/temp$
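
A more direct check than grepping component names, assuming the build exposes
the usual mpi_built_with_cuda_support MCA parameter, would be:

---------------------------
# Should print a line ending in ":value:true" if CUDA support was compiled in
/tmp/mpi_testing/openmpi-1.10.3/bin/ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
---------------------------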

Configure and build logs are attached.


Thanks,
Craig
Sylvain Jeaugey
2016-10-27 22:47:58 UTC
Permalink
I guess --with-cuda is disabling the default CUDA path, which is
/usr/local/cuda. So you should either not set --with-cuda, or set
--with-cuda=$CUDA_HOME (without the /include).
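
A minimal sketch of the suggested invocation, reusing the variables from the
build script above:

---------------------------
# Point --with-cuda at the CUDA root rather than its include directory
./configure --prefix=$MPI_HOME --with-cuda=$CUDA_HOME > config.out 2>&1
make -j > build.out 2>&1
make install >> build.out 2>&1
---------------------------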

Sylvain
Craig tierney
2016-10-28 17:33:28 UTC
Permalink
Sylvain,

If I do not set --with-cuda, I get:

configure:9964: result: no
configure:10023: checking whether CU_POINTER_ATTRIBUTE_SYNC_MEMOPS is
declared
configure:10023: gcc -c -DNDEBUG conftest.c >&5
conftest.c:83:19: fatal error: /cuda.h: No such file or directory
#include </cuda.h>
^

If I specify the path to CUDA, I get the same results as before. In the
configure process, the first time cuda.h is tested, it works:

configure:9843: checking if --with-cuda is set
configure:9897: result: found (/usr/local/cuda/include/cuda.h)
configure:9964: checking for struct CUipcMemHandle_st.reserved

But the next time, the compile command doesn't add an include flag to the
compile line and the compile fails:

configure:74312: checking for CL/cl_ext.h
configure:74312: result: no
configure:74425: checking cuda.h usability
configure:74425: gcc -std=gnu99 -c -O3 -DNDEBUG conftest.c >&5
conftest.c:648:18: fatal error: cuda.h: No such file or directory
#include <cuda.h>
^
compilation terminated.
configure:74425: $? = 1

Craig
Post by Sylvain Jeaugey
I guess --with-cuda is disabling the default CUDA path, which is
/usr/local/cuda. So you should either not set --with-cuda, or set
--with-cuda=$CUDA_HOME (without the /include).
Sylvain
Sylvain Jeaugey
2016-10-28 17:47:43 UTC
Permalink
Post by Craig tierney
Sylvain,
configure:9964: result: no
configure:10023: checking whether CU_POINTER_ATTRIBUTE_SYNC_MEMOPS is
declared
configure:10023: gcc -c -DNDEBUG conftest.c >&5
conftest.c:83:19: fatal error: /cuda.h: No such file or directory
#include </cuda.h>
^
It looks like your environment has variables that configure tries to
use. You should look at the output of:
env | grep CUDA
and unset them. Or you can specify --with-cuda=/usr/local/cuda to be sure.
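
A quick sketch of that check followed by a clean reconfigure (the unset line
is only illustrative; unset whatever CUDA-related variables grep actually
reports):

---------------------------
env | grep CUDA                     # look for stray CUDA-related variables
# unset SOME_STRAY_CUDA_VAR         # hypothetical name; replace as needed
./configure --prefix=$MPI_HOME --with-cuda=/usr/local/cuda > config.out 2>&1
---------------------------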
Post by Craig tierney
If I specify the path to cuda, the same results as before. In the
configure process, the first time cuda.h is tested it works.
configure:9843: checking if --with-cuda is set
configure:9897: result: found (/usr/local/cuda/include/cuda.h)
configure:9964: checking for struct CUipcMemHandle_st.reserved
Good.
Post by Craig tierney
But the next time, the compile command doesn't add an include flag to the
compile line and the compile fails:
configure:74312: checking for CL/cl_ext.h
configure:74312: result: no
configure:74425: checking cuda.h usability
configure:74425: gcc -std=gnu99 -c -O3 -DNDEBUG conftest.c >&5
conftest.c:648:18: fatal error: cuda.h: No such file or directory
#include <cuda.h>
^
compilation terminated.
configure:74425: $? = 1
Is the Open MPI configure explicitly failing? If not, is the Open MPI
compilation failing? If it works, you should see that CUDA support has been
compiled in (in ompi_info).

It seems you are being fooled by the hwloc configure here: the hwloc
configure includes checks for CUDA, but we don't need them in Open MPI, so
they fail; you still get CUDA support anyway.

In the latest version of Open MPI, there should be a report at the end
of configure explicitly stating whether CUDA support has been enabled or not.
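
A rough way to find that report without reading the whole log, given that the
exact wording of the summary differs between versions:

---------------------------
grep -i cuda config.out | tail -20   # the summary appears near the end of the configure output
---------------------------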

Sylvain
