Pharthiphan Asokan
2018-03-13 15:30:24 UTC
Hi,
I'm using Mellanox 56G FDR with SR-IOV on KVM virtualization, and I want to use RDMA to communicate between VMs over the FDR Virtual Function.
* Operating system/version: CentOS 7.3
* Computer hardware: KVM Virtualization
* Network type: 56G FDR -- Virtual Function
* Open MPI version: 3.0.0 (built from source, see below)
Build Openmpi
wget https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.gz
tar -zxf openmpi-3.0.0.tar.gz
mv openmpi-3.0.0 openmpi-3.0.0-src
mkdir openmpi-3.0.0
./configure --prefix=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0
make all install
Exporting OpenMPI Variables
# export PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin:$PATH
# export LD_LIBRARY_PATH=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/lib:$LD_LIBRARY_PATH
# export INCLUDE=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/include:$INCLUDE
# which mpirun
/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin/mpirun
# which mpicc
/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0/bin/mpicc
# cd /mnt/lustre_client/pasokan/
# mpicc /mnt/lustre_client/pasokan/mpi_hello_world.c
# ./a.out
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: vcn03
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4114
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[34710,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
Hello world from processor vcn03, rank 0 out of 1 processors
#
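Note on the warning above: it names the MCA parameter btl_openib_warn_no_device_params_found. A minimal sketch of setting it through the environment, which also applies when the binary is started directly without mpirun, might look like the following. It relies on Open MPI reading MCA parameters from environment variables prefixed with OMPI_MCA_. It should only silence the warning and is not expected to change the "error modifing QP to RTR" line.

```shell
# Set the MCA parameter named in the warning via the environment.
# Open MPI reads MCA parameters from variables prefixed with OMPI_MCA_.
export OMPI_MCA_btl_openib_warn_no_device_params_found=0

# Show what a subsequently launched MPI program would inherit.
env | grep '^OMPI_MCA_'

# Then rerun the test program as before, for example: ./a.out
```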
Compiling IOR using Openmpi on SR-IOV
[***@vcn03 IOR-July12]# cd src/C
[***@vcn03 C]# gmake posix mpiio
mpicc -o IOR IOR.o utilities.o parse_options.o \
aiori-POSIX.o aiori-noMPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
-lm
mpicc -o IOR IOR.o utilities.o parse_options.o \
aiori-POSIX.o aiori-MPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
-lm
[***@vcn03 C]# ./IOR
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: vcn03
Device name: mlx5_0
Device vendor ID: 0x02c9
Device vendor part ID: 4114
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
[vcn03][[34753,1],0][connect/btl_openib_connect_udcm.c:1235:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument
Segmentation fault
[***@vcn03 C]#
[***@vcn03 C]# mpirun --allow-run-as-root -np 2 -host vcn03,vcn04 hostname
bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
Passwordless SSH between the two systems is configured:
[***@vcn03 C]# ssh vcn04
Last login: Mon Mar 12 12:03:42 2018 from vcn03
[***@vcn04 ~]# ssh vcn03
Last login: Tue Mar 13 01:56:46 2018 from pime6-01.ime.md.ddn.com
[***@vcn03 ~]#
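Since SSH itself works, the first cause in the ORTE list (PATH and LD_LIBRARY_PATH on the remote node) looks like the closest match for "bash: orted: command not found". The exports shown earlier were made in the interactive shell on vcn03 and may not apply to the non-interactive shell that mpirun starts on vcn04. A rough sketch of one way to test this is to pass the install prefix to mpirun with --prefix. It assumes the same prefix is visible on both nodes, which is not confirmed here. The variable names OMPI_PREFIX and launch_cmd are made up for illustration.

```shell
# Install prefix taken from the configure line earlier in this post.
# Assumption: this path is mounted and readable on vcn04 as well.
OMPI_PREFIX=/mnt/lustre_client/pasokan/openmpi-3.0.0/openmpi-3.0.0

# Build the launch command as a string so it can be inspected first.
# --prefix asks mpirun to set PATH and LD_LIBRARY_PATH on the remote
# nodes before it starts orted there.
launch_cmd="mpirun --prefix $OMPI_PREFIX --allow-run-as-root -np 2 -host vcn03,vcn04 hostname"
echo "$launch_cmd"

# Run it only where this Open MPI install actually exists.
if [ -x "$OMPI_PREFIX/bin/mpirun" ]; then
    $launch_cmd
fi
```

Reconfiguring with --enable-orterun-prefix-by-default, as the ORTE message itself suggests, should have a similar effect without the extra option on every run.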
Please help. I need the procedure to build Open MPI so that it supports FDR over SR-IOV on KVM.