Bennet Fauber
2017-11-16 15:34:27 UTC
I think that Open MPI is supposed to support SLURM integration, such
that

    srun ./hello-mpi

should just work; is that right? I built OMPI 2.1.2 with
export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
CMD="./configure \
--prefix=${PREFIX} \
--mandir=${PREFIX}/share/man \
--with-slurm \
--with-pmi \
--with-lustre \
--with-verbs \
$CONFIGURE_FLAGS \
$COMPILERS"
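One thing I am not sure about is whether a bare --with-pmi finds
SLURM's PMI at all. Open MPI's configure also accepts an explicit
prefix, so a variant I could try, assuming the stock RPM layout with
pmi.h under /usr/include/slurm and libpmi.so under /usr/lib64, would
be something like

    ./configure --prefix=${PREFIX} \
        --with-slurm \
        --with-pmi=/usr \
        --disable-dlopen --enable-shared

(the /usr prefix there is just my guess at where slurm-devel put
things).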
I have a simple hello-mpi.c (source included below), which compiles
and runs fine with mpirun, both on the login node and inside a job.
However, when I use srun in place of mpirun, I instead get a hung
job, which upon cancellation produces this output:
[bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
The SLURM web page suggests that OMPI 2.x and later support PMIx and
that one should use `srun --mpi=pmix`; however, that no longer seems
to be an available option, and the `openmpi` type isn't working
(neither is `pmi2`):
[***@beta-build hello]$ srun --mpi=list
srun: MPI types are...
srun: mpi/pmi2
srun: mpi/lam
srun: mpi/openmpi
srun: mpi/mpich1_shmem
srun: mpi/none
srun: mpi/mvapich
srun: mpi/mpich1_p4
srun: mpi/mpichgm
srun: mpi/mpichmx
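For concreteness, the invocations I have been trying are of this form
(the task count is arbitrary):

    srun --mpi=openmpi -n 2 ./hello-mpi    # hangs as above
    srun --mpi=pmi2 -n 2 ./hello-mpi       # also not working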
To get the Intel PMI to work with srun, I have to set
I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
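in the job script, i.e., something like this (the binary name is
just a placeholder for an executable built with Intel MPI):

    export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
    srun ./hello-intel-mpi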
Is there a comparable environment variable that must be set to enable
`srun` to work with Open MPI?
Am I missing a build option or misspecifying one?
-- bennet
Source of hello-mpi.c
==========================================
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv){
    int rank;                 /* rank of this process */
    int numprocs;             /* size of COMM_WORLD */
    int namelen;
    int tag = 10;             /* expected tag */
    int message;              /* received message */
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status status;        /* status of recv */

    /* call Init, size, and rank */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

    if(rank != 0){
        MPI_Recv(&message,        /* buffer for message */
                 1,               /* max count to recv */
                 MPI_INT,         /* type to recv */
                 0,               /* recv from 0 only */
                 tag,             /* tag of message */
                 MPI_COMM_WORLD,  /* communicator to use */
                 &status);        /* status object */
        printf("Hello from process %d!\n", rank);
    }
    else{
        /* rank 0 ONLY executes this */
        printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
        int x;
        for(x = 1; x < numprocs; x++){
            MPI_Send(&x,              /* send x to process x */
                     1,               /* number to send */
                     MPI_INT,         /* type to send */
                     x,               /* rank to send to */
                     tag,             /* tag for message */
                     MPI_COMM_WORLD); /* communicator to use */
        }
    } /* end else */

    /* always call at end */
    MPI_Finalize();
    return 0;
}
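For completeness, this is how I build and launch it (the task count
is arbitrary):

    mpicc hello-mpi.c -o hello-mpi
    mpirun -n 4 ./hello-mpi    # works, on the login node and in a job
    srun -n 4 ./hello-mpi      # hangs as described above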