Siegmar Gross
2017-03-21 07:38:17 UTC
Hi,
I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise Server
12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Once again, I sometimes
get a warning about a missing item for one of my small programs (it
doesn't matter whether I use my cc or gcc version). My gcc version also
displays the message "NVIDIA: no NVIDIA devices found" for the server
without NVIDIA devices (I don't get the message for my cc version).
I used the following commands to build the package (${SYSTEM_ENV}
is Linux and ${MACHINE_ENV} is x86_64).
mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
../openmpi-2.1.0rc4/configure \
--prefix=/usr/local/openmpi-2.1.0_64_cc \
--libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
--with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
--with-jdk-headers=/usr/local/jdk1.8.0_66/include \
JAVA_HOME=/usr/local/jdk1.8.0_66 \
LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/lib64" \
CC="cc" CXX="CC" FC="f95" \
CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
FCFLAGS="-m64" \
CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
--enable-mpi-cxx \
--enable-cxx-exceptions \
--enable-mpi-java \
--with-cuda=/usr/local/cuda \
--with-valgrind=/usr/local/valgrind \
--enable-mpi-thread-multiple \
--with-hwloc=internal \
--without-verbs \
--with-wrapper-cflags="-m64 -mt" \
--with-wrapper-cxxflags="-m64" \
--with-wrapper-fcflags="-m64" \
--with-wrapper-ldflags="-mt" \
--enable-debug \
|& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
rm -r /usr/local/openmpi-2.1.0_64_cc.old
mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
Sometimes everything works as expected.
loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
Parent process 0: I create 2 slave processes
Parent process 0 running on loki
MPI_COMM_WORLD ntasks: 1
COMM_CHILD_PROCESSES ntasks_local: 1
COMM_CHILD_PROCESSES ntasks_remote: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 0
Child process 0 running on nfs1
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 1
Child process 1 running on nfs2
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 2
More often I get a warning.
loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
Parent process 0: I create 2 slave processes
Parent process 0 running on loki
MPI_COMM_WORLD ntasks: 1
COMM_CHILD_PROCESSES ntasks_local: 1
COMM_CHILD_PROCESSES ntasks_remote: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 0
Child process 0 running on nfs1
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
Child process 1 running on nfs2
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 2
mytid in COMM_ALL_PROCESSES: 1
Warning :: opal_list_remove_item - the item 0x25a76f0 is not on the list 0x7f96db515998
loki spawn 144
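Since the warning only shows up on some runs, it may help to measure how often it occurs. The following is a minimal sketch in POSIX shell, not something from the original runs: the function names count_warnings and warning_rate are hypothetical helpers, and the match string is simply taken from the warning text shown above.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin that contain the
# opal_list_remove_item warning. grep -c prints 0 and exits with
# status 1 when nothing matches, so "|| true" keeps the status clean.
count_warnings() {
    grep -c 'opal_list_remove_item - the item' || true
}

# Hypothetical helper: run a command N times and report how many
# runs emitted the warning.
# Usage: warning_rate N command [args...]
warning_rate() {
    runs=$1
    shift
    hits=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Merge stderr into stdout, since the warning may be
        # written to either stream.
        n=$("$@" 2>&1 | count_warnings)
        if [ "$n" -gt 0 ]; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits of $runs runs showed the warning"
}
```

With these functions defined, a call such as "warning_rate 20 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm" would run the program 20 times and print a one-line summary.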
I would be grateful if somebody could fix the problem. Do you need anything
else? Thank you very much in advance for any help.
Kind regards
Siegmar