[OMPI users] OMPI 2.1.2 and SLURM compatibility
Bennet Fauber
2017-11-16 15:34:27 UTC
I think that OpenMPI is supposed to support SLURM integration such that

srun ./hello-mpi

should work? I built OMPI 2.1.2 with

export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'

CMD="./configure \
--prefix=${PREFIX} \
--mandir=${PREFIX}/share/man \
--with-slurm \
--with-pmi \
--with-lustre \
--with-verbs \

I have a simple hello-mpi.c (source included below), which compiles
and runs with mpirun, both on the login node and in a job. However,
when I try to use srun in place of mpirun, I get instead a hung job,
which upon cancellation produces this output.

[bn2.stage.arc-ts.umich.edu:116377] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[bn1.stage.arc-ts.umich.edu:36866] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***

The SLURM web page suggests that OMPI 2.x and later support PMIx, and
to use `srun --mpi=pmix`; however, that does not seem to be an
option here, and using the `openmpi` type isn't working (neither is pmi2).

[***@beta-build hello]$ srun --mpi=list
srun: MPI types are...
srun: mpi/pmi2
srun: mpi/lam
srun: mpi/openmpi
srun: mpi/mpich1_shmem
srun: mpi/none
srun: mpi/mvapich
srun: mpi/mpich1_p4
srun: mpi/mpichgm
srun: mpi/mpichmx

To get the Intel PMI to work with srun, I have to set


Is there a comparable environment variable that must be set to enable
`srun` to work?

Am I missing a build option or misspecifying one?

-- bennet

Source of hello-mpi.c
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv){

    int rank; /* rank of process */
    int numprocs; /* size of COMM_WORLD */
    int namelen; /* length of processor name */
    int tag=10; /* expected tag */
    int message; /* Recv'd message */
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status status; /* status of recv */

    /* call Init, size, and rank */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

    if(rank != 0){
        MPI_Recv(&message, /*buffer for message */
                 1, /*MAX count to recv */
                 MPI_INT, /*type to recv */
                 0, /*recv from 0 only */
                 tag, /*tag of message */
                 MPI_COMM_WORLD, /*communicator to use */
                 &status); /*status object */
        printf("Hello from process %d!\n",rank);
    }
    else{
        /* rank 0 ONLY executes this */
        printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
        int x;
        for(x=1; x<numprocs; x++){
            MPI_Send(&x, /*send x to process x */
                     1, /*number to send */
                     MPI_INT, /*type to send */
                     x, /*rank to send to */
                     tag, /*tag for message */
                     MPI_COMM_WORLD); /*communicator to use */
        }
    } /* end else */

    /* always call at end */
    MPI_Finalize();

    return 0;
}
Charles A Taylor
2017-11-16 15:54:08 UTC
Hi Bennet,

Three things...

1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.

2. You will need slurm 16.05 or greater built with --with-pmix

2a. You will need pmix 1.1.5, which you can get from GitHub (https://github.com/pmix/tarballs).

3. Then, to launch your MPI tasks on the allocated resources,

srun --mpi=pmix ./hello-mpi
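
Before attempting the launch in step 3, it can help to confirm that the local Slurm actually offers a pmix plugin. The following is only a sketch: `has_mpi_type` is a hypothetical helper name, and it assumes the listing format seen in this thread (lines such as `srun: pmix` or `srun: mpi/pmi2`). It reads the listing from stdin so it can be tried without a Slurm installation.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed only if the named plugin appears
# in a listing shaped like the output of `srun --mpi=list`.
has_mpi_type() {
    wanted=$1
    # Strip the "srun: " prefix, an optional "mpi/" prefix used by older
    # Slurm versions, and trailing whitespace; then require an exact match.
    sed -e 's/^srun: *//' -e 's|^mpi/||' -e 's/[[:space:]]*$//' |
        grep -qxF -- "$wanted"
}

# Intended use on a cluster (commented out so the sketch runs anywhere):
#   if srun --mpi=list 2>&1 | has_mpi_type pmix; then
#       srun --mpi=pmix ./hello-mpi
#   else
#       echo "this Slurm build does not list a pmix plugin" >&2
#   fi
```

If the check fails, the Slurm installation was most likely built without PMIx support, which is the situation described in point 2.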

I’m replying to the list because,

a) this information is harder to find than you might think.
b) someone/anyone can correct me if I’m giving a bum steer.

Hope this helps,

Charlie Taylor
University of Florida
users mailing list
Bennet Fauber
2017-11-16 16:11:27 UTC

Thanks a ton! Yes, we are missing two of the three steps.

Will report back after we get pmix installed and after we rebuild
Slurm. We do have a new enough version of it, at least, so we might
have missed the target, but we did at least hit the barn. ;-)
r***@open-mpi.org
2017-11-16 16:30:42 UTC
What Charles said was true but not quite complete. We still support the older PMI libraries but you likely have to point us to wherever slurm put them.

However, we definitely recommend using PMIx, as you will get a faster launch.
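
For the older-PMI route, "pointing us to wherever slurm put them" usually means giving Open MPI's configure a directory via `--with-pmi=DIR`. The sketch below looks for Slurm's PMI headers under a list of candidate prefixes and prints the matching flag. The helper name `find_pmi_flag` and the example prefixes are hypothetical, and the header locations checked (`include/pmi.h`, `include/slurm/pmi.h`, and the `pmi2.h` equivalents) are an assumption about a typical Slurm install, not something confirmed for any particular site.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print "--with-pmi=<prefix>" for the first
# candidate prefix that appears to contain Slurm's PMI headers.
find_pmi_flag() {
    for prefix in "$@"; do
        for header in include/pmi.h include/pmi2.h \
                      include/slurm/pmi.h include/slurm/pmi2.h; do
            if [ -f "$prefix/$header" ]; then
                printf '%s\n' "--with-pmi=$prefix"
                return 0
            fi
        done
    done
    # No candidate prefix looked like a PMI install.
    return 1
}

# Intended use (prefixes are examples only; commented out on purpose):
#   ./configure --with-slurm "$(find_pmi_flag /usr /opt/slurm)" ...
```

If the libraries live somewhere other than under the same prefix as the headers, a separate library directory may also have to be given to configure; check `./configure --help` for the options your Open MPI version accepts.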

Howard Pritchard
2017-11-18 04:45:33 UTC
Hello Bennet,

What you are trying to do, using srun as the job launcher, should work.
Could you post the contents of /etc/slurm/slurm.conf for your system?

Could you also post the output of the following command to the mailing list:

ompi_info --all | grep pmix

The config.log from your build would also be useful.

Bennet Fauber
2017-11-18 17:03:00 UTC

Thanks for the reply.

I think, based on a previous reply, that we may not have the right
combinations of pmi and slurm lined up. I will have to coordinate with our
admin who compiles and installs slurm, and once we think we have slurm with
pmix, I'll try again and post the files/information you suggest.

Thanks for telling me which files and output are most useful here.

-- bennet
2017-11-19 06:20:09 UTC
Bennet Fauber
2017-11-29 13:44:09 UTC

Thanks very much for the help identifying what information I should provide.

This is some information about our SLURM version

$ srun --mpi list
srun: MPI types are...
srun: pmi2
srun: pmix_v1
srun: openmpi
srun: pmix
srun: none

$ srun --version
slurm 17.11.0-0rc3

This is the output from my build script, which should show all the
configure options I used.

Checking compilers and things
OMPI is ompi
COMP_NAME is gcc_4_8_5
SRC_ROOT is /sw/src/arcts
PREFIX_ROOT is /sw/arcts/centos7/apps
PREFIX is /sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2
CONFIGURE_FLAGS are --disable-dlopen --enable-shared
COMPILERS are CC=gcc CXX=g++ FC=gfortran F77=gfortran
No modules loaded
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO

--disable-dlopen --enable-shared
CC=gcc CXX=g++ FC=gfortran F77=gfortran

I remove the build directory and re-expand from the source tarball for
each build, so there should not be lingering configuration files from
prior trials.

Here is the output of

ompi_info | grep pmix

MCA pmix: s2 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix base: ---------------------------------------------------
MCA pmix base: parameter "pmix" (current value: "", data
source: default, level: 2 user/detail, type: string)
Default selection set of components for the
pmix framework (<none> means use all components that can be found)
MCA pmix base: ---------------------------------------------------
MCA pmix base: parameter "pmix_base_verbose" (current
value: "error", data source: default, level: 8 dev/detail, type: int)
Verbosity level for the pmix framework (default: 0)
MCA pmix base: parameter "pmix_base_async_modex" (current
value: "false", data source: default, level: 9 dev/all, type: bool)
MCA pmix base: parameter "pmix_base_collect_data" (current
value: "true", data source: default, level: 9 dev/all, type: bool)
MCA pmix s2: ---------------------------------------------------
MCA pmix s2: parameter "pmix_s2_priority" (current value:
"20", data source: default, level: 9 dev/all, type: int)
Priority of the pmix s2 component (default: 20)
MCA pmix s1: ---------------------------------------------------
MCA pmix s1: parameter "pmix_s1_priority" (current value:
"10", data source: default, level: 9 dev/all, type: int)
Priority of the pmix s1 component (default: 10)
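
To compare builds quickly, the component lines in a listing like the one above can be reduced to bare component names. This is a sketch only; `list_pmix_components` is a hypothetical helper name, and it assumes component lines of the form `MCA pmix: NAME (MCA v..., API v..., Component v...)` as printed here. It reads from stdin so that saved output can be fed to it.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the names of the pmix framework
# components found in `ompi_info` output supplied on stdin.
list_pmix_components() {
    # Match only "MCA pmix: <name> (MCA ..." lines; parameter lines such as
    # "MCA pmix base: ..." or "MCA pmix s1: ..." do not match and are skipped.
    sed -n 's/^ *MCA pmix: \([A-Za-z0-9_]*\) (MCA .*/\1/p'
}

# Intended use (commented out so the sketch runs without Open MPI installed):
#   ompi_info | list_pmix_components
```

For the listing above this would give s2, s1, and pmix112. As far as I know, s1 and s2 are the wrappers around Slurm's PMI-1 and PMI-2 libraries, and pmix112 is the PMIx copy bundled with this Open MPI release.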

I also attach the hello-mpi.c file I am using as a test. I compiled it using

$ mpicc -o hello-mpi hello-mpi.c

and this is the information about the actual compile command

$ mpicc --showme -o hello-mpi hello-mpi.c
gcc -o hello-mpi hello-mpi.c
-I/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2/include -pthread
-L/usr/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
-L/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2/lib -lmpi

I use some variation on the following submit script

$ cat test.slurm
#SBATCH --mail-user=***@umich.edu
#SBATCH --mail-type=NONE

#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1g
#SBATCH --cpus-per-task=1
#SBATCH -A hpcstaff
#SBATCH -p standard

#Your code here

cd /home/bennet/hello
srun ./hello-mpi

The results are attached as slurm-114.out, where it looks to me like
it is trying to invoke pmi2 instead of pmix.

If I use `srun --mpi pmix ./hello-mpi` in the file submitted to SLURM,
I get a core dump.

[bn1.stage.arc-ts.umich.edu:34722] PMIX ERROR: BAD-PARAM in file
src/dstore/pmix_esh.c at line 996
[bn2.stage.arc-ts.umich.edu:04597] PMIX ERROR: BAD-PARAM in file
src/dstore/pmix_esh.c at line 996
[bn1:34722] *** Process received signal ***
[bn1:34722] Signal: Segmentation fault (11)
[bn1:34722] Signal code: Invalid permissions (2)
[bn1:34722] Failing at address: 0xcf73a0
[bn1:34722] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b2420b1d370]
[bn1:34722] [ 1] [0xcf73a0]
[bn1:34722] *** End of error message ***
[bn2:04597] *** Process received signal ***
[bn2:04597] Signal: Segmentation fault (11)
[bn2:04597] Signal code: (128)
[bn2:04597] Failing at address: (nil)
[bn2:04597] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab526447370]
[bn2:04597] [ 1]
[bn2:04597] [ 2]
[bn2:04597] [ 3]
[bn2:04597] [ 4]
[bn2:04597] [ 5]
[bn2:04597] [ 6]
[bn2:04597] [ 7]
[bn2:04597] [ 8] /home/bennet/hello/./hello-mpi[0x4009d5]
[bn2:04597] [ 9] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab526675b35]
[bn2:04597] [10] /home/bennet/hello/./hello-mpi[0x4008d9]
[bn2:04597] *** End of error message ***
srun: error: bn1: task 0: Segmentation fault (core dumped)
srun: error: bn2: task 1: Segmentation fault (core dumped)

If I use `srun --mpi openmpi` in the submit script, the job hangs, and
when I cancel it, I get

[bn2.stage.arc-ts.umich.edu:04855] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[bn1.stage.arc-ts.umich.edu:35000] PMI_Init [pmix_s1.c:162:s1_init]:
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 116.0 ON bn1 CANCELLED AT 2017-11-29T08:42:54 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 116 ON bn1 CANCELLED AT 2017-11-29T08:42:54 ***

Any thoughts you might have on this would be very much appreciated.

Thanks, -- bennet
Post by Howard Pritchard
Hello Bennet,
What you are trying to do using srun as the job launcher should work. Could
you post the contents
of /etc/slurm/slurm.conf for your system?
ompi_info --all | grep pmix
to the mail list.
the config.log from your build would also be useful.
Post by r***@open-mpi.org
What Charles said was true but not quite complete. We still support the
older PMI libraries but you likely have to point us to wherever slurm put
However,we definitely recommend using PMIx as you will get a faster launch
Sent from my iPad
Post by Bennet Fauber
Thanks a ton! Yes, we are missing two of the three steps.
Will report back after we get pmix installed and after we rebuild
Slurm. We do have a new enough version of it, at least, so we might
have missed the target, but we did at least hit the barn. ;-)
Post by Charles A Taylor
Hi Bennet,
Three things...
1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
2. You will need slurm 16.05 or greater built with —with-pmix
2a. You will need pmix 1.1.5 which you can get from github.
3. then, to launch your mpi tasks on the allocated resources,
srun —mpi=pmix ./hello-mpi
I’m replying to the list because,
a) this information is harder to find than you might think.
b) someone/anyone can correct me if I’’m giving a bum steer.
Hope this helps,
Charlie Taylor
University of Florida
2017-11-29 14:53:20 UTC
Hi Bennet

I suspect the problem here lies in the slurm PMIx plugin. Slurm 17.11 supports PMIx v2.0 as well as (I believe) PMIx v1.2. I’m not sure if slurm is somehow finding one of those on your system and building the plugin or not, but it looks like OMPI is picking up signs of PMIx being active and trying to use it - and hitting an incompatibility.

You can test this by adding --mpi=pmi2 to your srun command line and seeing whether that solves the problem (you may also need to set OMPI_MCA_pmix=s2 in your environment, as slurm has a tendency to publish environment variables even when they aren’t being used).
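
Before trying this, it can help to confirm that the plugin name is actually offered by the local Slurm build. A minimal shell sketch, assuming the listing format that appears in this thread (`srun: pmi2`, or the older `srun: mpi/pmi2`); `has_mpi_plugin` is a hypothetical helper name, not part of Slurm or Open MPI:

```shell
# has_mpi_plugin NAME: succeed if NAME appears in `srun --mpi=list` output on stdin.
# Hypothetical helper; accepts both "srun: pmi2" and "srun: mpi/pmi2" styles.
has_mpi_plugin() {
    sed -e 's/^srun:[[:space:]]*//' -e 's|^mpi/||' | grep -qx -- "$1"
}

# Intended use on a cluster (left commented out so the sketch runs anywhere):
#   srun --mpi=list 2>&1 | has_mpi_plugin pmi2 &&
#       OMPI_MCA_pmix=s2 srun --mpi=pmi2 ./hello-mpi
```

The `2>&1` is there on the assumption that srun writes the listing to stderr; drop it if your version prints to stdout. The match is on the whole name, so `pmix_v1` does not count as `pmix`.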
Post by Bennet Fauber
Thanks very much for the help identifying what information I should provide.
This is some information about our SLURM version
$ srun --mpi list
srun: MPI types are...
srun: pmi2
srun: pmix_v1
srun: openmpi
srun: pmix
srun: none
$ srun --version
slurm 17.11.0-0rc3
This is the output from my build script, which should show all the
configure options I used.
Checking compilers and things
OMPI is ompi
COMP_NAME is gcc_4_8_5
SRC_ROOT is /sw/src/arcts
PREFIX_ROOT is /sw/arcts/centos7/apps
PREFIX is /sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2
CONFIGURE_FLAGS are --disable-dlopen --enable-shared
COMPILERS are CC=gcc CXX=g++ FC=gfortran F77=gfortran
No modules loaded
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
--disable-dlopen --enable-shared
CC=gcc CXX=g++ FC=gfortran F77=gfortran
I remove the build directory and re-expand from the source tarball for
each build, so there should not be lingering configuration files from
prior trials.
Here is the output of
ompi_info | grep pmix
MCA pmix: s2 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix base: ---------------------------------------------------
MCA pmix base: parameter "pmix" (current value: "", data
source: default, level: 2 user/detail, type: string)
Default selection set of components for the
pmix framework (<none> means use all components that can be found)
MCA pmix base: ---------------------------------------------------
MCA pmix base: parameter "pmix_base_verbose" (current
value: "error", data source: default, level: 8 dev/detail, type: int)
Verbosity level for the pmix framework (default: 0)
MCA pmix base: parameter "pmix_base_async_modex" (current
value: "false", data source: default, level: 9 dev/all, type: bool)
MCA pmix base: parameter "pmix_base_collect_data" (current
value: "true", data source: default, level: 9 dev/all, type: bool)
MCA pmix s2: ---------------------------------------------------
"20", data source: default, level: 9 dev/all, type: int)
Priority of the pmix s2 component (default: 20)
MCA pmix s1: ---------------------------------------------------
"10", data source: default, level: 9 dev/all, type: int)
Priority of the pmix s1 component (default: 10)
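
For the srun question, the component lines at the top of this output are the relevant part: they show which PMI clients this Open MPI build contains (here `s1`, `s2` and `pmix112`). A minimal sketch for pulling just those names out of `ompi_info` output, assuming the `MCA pmix: NAME (...)` line format shown above; `pmix_components` is a hypothetical helper name:

```shell
# pmix_components: print the pmix framework component names, one per line,
# from `ompi_info` output on stdin. Hypothetical helper; it relies on lines
# of the form "MCA pmix: s2 (MCA v2.1.0, ...)" and skips parameter lines
# such as "MCA pmix base: ..." or "MCA pmix s2: ...".
pmix_components() {
    sed -n 's/^[[:space:]]*MCA pmix:[[:space:]]*\([^[:space:]]*\)[[:space:]]*(.*/\1/p'
}

# Intended use (commented out so the sketch runs without Open MPI installed):
#   ompi_info | pmix_components
```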
I also attach the hello-mpi.c file I am using as a test. I compiled it using
$ mpicc -o hello-mpi hello-mpi.c
and this is the information about the actual compile command
$ mpicc --showme -o hello-mpi hello-mpi.c
gcc -o hello-mpi hello-mpi.c
-I/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2/include -pthread
-L/usr/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
-L/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2/lib -lmpi
I use some variation on the following submit script
$ cat test.slurm
#SBATCH --mail-type=NONE
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1g
#SBATCH --cpus-per-task=1
#SBATCH -A hpcstaff
#SBATCH -p standard
#Your code here
cd /home/bennet/hello
srun ./hello-mpi
The results are attached as slurm-114.out, where it looks to me like
it is trying to invoke pmi2 instead of pmix.
If I use `srun --mpi pmix ./hello-mpi` in the file submitted to SLURM,
I get a core dump.
[bn1.stage.arc-ts.umich.edu:34722] PMIX ERROR: BAD-PARAM in file
src/dstore/pmix_esh.c at line 996
[bn2.stage.arc-ts.umich.edu:04597] PMIX ERROR: BAD-PARAM in file
src/dstore/pmix_esh.c at line 996
[bn1:34722] *** Process received signal ***
[bn1:34722] Signal: Segmentation fault (11)
[bn1:34722] Signal code: Invalid permissions (2)
[bn1:34722] Failing at address: 0xcf73a0
[bn1:34722] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b2420b1d370]
[bn1:34722] [ 1] [0xcf73a0]
[bn1:34722] *** End of error message ***
[bn2:04597] *** Process received signal ***
[bn2:04597] Signal: Segmentation fault (11)
[bn2:04597] Signal code: (128)
[bn2:04597] Failing at address: (nil)
[bn2:04597] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab526447370]
[bn2:04597] [ 1]
[bn2:04597] [ 2]
[bn2:04597] [ 3]
[bn2:04597] [ 4]
[bn2:04597] [ 5]
[bn2:04597] [ 6]
[bn2:04597] [ 7]
[bn2:04597] [ 8] /home/bennet/hello/./hello-mpi[0x4009d5]
[bn2:04597] [ 9] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab526675b35]
[bn2:04597] [10] /home/bennet/hello/./hello-mpi[0x4008d9]
[bn2:04597] *** End of error message ***
srun: error: bn1: task 0: Segmentation fault (core dumped)
srun: error: bn2: task 1: Segmentation fault (core dumped)
If I use `srun --mpi openmpi` in the submit script, the job hangs, and
when I cancel it, I get
PMI is not initialized
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 116.0 ON bn1 CANCELLED AT 2017-11-29T08:42:54 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 116 ON bn1 CANCELLED AT 2017-11-29T08:42:54 ***
Any thoughts you might have on this would be very much appreciated.
Thanks, -- bennet
Post by Howard Pritchard
Hello Bennet,
What you are trying to do, using srun as the job launcher, should work. Could
you post the contents of /etc/slurm/slurm.conf for your system, and the output of
ompi_info --all | grep pmix
to the mail list?
The config.log from your build would also be useful.
Bennet Fauber
2017-12-09 19:28:46 UTC
I made a good deal of progress, and I now have OpenMPI 3.0.0 capable
of being run.

I do have one final point of confusion, however. It appears that
--with-pmix=<external directory> fails because configure is testing for
a non-existent major version of PMIx, 3. Perhaps that is the major
version of the OMPI distribution, and it should really be testing
whether the PMIX_VERSION_MAJOR is 2, instead?

When I use

srun ./hello-mpi

in my SLURM submit script, I get an error that leads with
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:
. . . .
Please configure as appropriate and try again.

If I instead use

srun --mpi=pmix ./hello-mpi

it runs.

Here is some information about my installation and my investigation
of that message. I installed SLURM using

$ ../slurm-17.11.0/configure --prefix=/sw/arcts/centos7/slurm/17.11.0 \

and the config.log indicates that it is found:

configure:21524: checking for pmix installation
configure:21559: gcc -o conftest -g -O2 -pthread -I/opt/pmix/2.0.2/include \
conftest.c -L/opt/pmix/2.0.2/lib64 -lpmix >&5
configure:21559: $? = 0
configure:21590: gcc -E -I/opt/pmix/2.0.2/include conftest.c
configure:21590: $? = 0
configure:21642: result: /opt/pmix/2.0.2

I configure OpenMPI with

$ ./configure --prefix=/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/3.0.0 \
--mandir=/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/3.0.0/share/man \
--with-pmix=/opt/pmix/2.0.2 \
--with-libevent=external \
--with-hwloc=external \
--with-slurm --with-verbs \
--disable-dlopen --enable-shared \
CC=gcc CXX=g++ FC=gfortran F77=gfortran

Looking down in config.log, I find this block, which seems to be
checking whether I asked for PMIx externally and whether it is PMIx
version 3:

configure:12341: checking if user requested external PMIx
configure:12348: result: yes
configure:12359: checking --with-external-pmix value
configure:12379: result: sanity check ok (/opt/pmix/2.0.2/include)
configure:12391: checking --with-external-libpmix value
configure:12411: result: sanity check ok (/opt/pmix/2.0.2/lib)
configure:12430: checking PMIx version
configure:12439: result: version file found
configure:12447: checking version 3x
configure:12465: gcc -E -I/opt/pmix/2.0.2/include conftest.c
conftest.c:94:56: error: #error "not version 3"
#error "not version 3"
configure:12465: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "Open MPI"
| #define PACKAGE_TARNAME "openmpi"
| #define PACKAGE_VERSION "3.0.0"

. . . .

| #include
| #if
| #error "not version 3"
| #endif

I find this very disconcerting.

When I look in openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/VERSION, I
find..., well, what appears to my naive eye contradictory information.
Near the top of that file, I find,

# major, minor, and release are generally combined in the form
# <major>.<minor>.<release>.


then some greek and a repo version (I think to the pmix repository version)


That leads me to believe that the bundled version is 2.0.1, Release
Candidate 1. Then comes the possibly contradictory part I really
don't understand:

# 1. Since these version numbers are associated with *releases*, the
# version numbers maintained on the PMIx Github trunk (and developer
# branches) is always 0:0:0 for all libraries.

# 2. The version number of libpmix refers to the public pmix interfaces.
# It does not refer to any internal interfaces.

# Version numbers are described in the Libtool current:revision:age
# format.


The library version is different from the software version? OK, maybe
that is true, but that isn't what's being tested by the configure

I am further befuddled because, as near as I can tell, the bundled
version of PMIx that comes with OMPI 3.0.0 will fail that test, too.
I look in


and I see

/* define PMIx version */

which is the same as in my installed version.

The error message I get when I run with a bare `srun ./hello-mpi`
tells me that I have misconfigured the OMPI build without SLURM's PMI

The test that fails to identify the proper version of
PMIX_VERSION_MAJOR also fails when using the bundled PMIx.

$ gcc -I/tmp/build/openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/include \
pmix-test.c:95:2: error: #error "not version 3"
#error "not version 3"

But the config.log generated when using the internal version of PMIx
seems to completely bypass the test that fails when using an external

Shouldn't the test be for PMIX_VERSION_MAJOR != 2L? The only version
that has a 3 in it is in the VERSION file at the root of the PMIx
source, and I don't see that as being used by configure.
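
One way to see which major version an installed PMIx header declares, independently of configure, is to read the define directly. A minimal sketch, assuming the header contains a line of the form `#define PMIX_VERSION_MAJOR 2L` (the define itself is elided in the excerpt above) and that the file is `pmix_version.h` under the include directory; both are assumptions, and `pmix_major` is a hypothetical helper name:

```shell
# pmix_major: print the numeric PMIX_VERSION_MAJOR found in header text on stdin.
# Hypothetical helper; assumes a line such as "#define PMIX_VERSION_MAJOR 2L".
pmix_major() {
    sed -n 's/^#define[[:space:]][[:space:]]*PMIX_VERSION_MAJOR[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Intended use (directory taken from the configure line above; the file name
# is an assumption, so it is left commented out):
#   pmix_major < /opt/pmix/2.0.2/include/pmix_version.h
```

Running the same helper against the bundled copy and the external install gives a quick comparison of what each header reports.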

Sorry if this is a long and winding path taken by the ignorant.

What am I missing?

Thanks, -- bennet
2017-12-11 19:51:48 UTC
As you note, there is no correlation between release number and libtool versioning. The comment in VERSION explains the libtool rules and you can see why the two values differ.

The PMIx master is at release version 3.0 as it includes new APIs that have not yet been released. The configure logic in OMPI was updated to protect against attempts to build OMPI v3.0 against the PMIx master, as the OMPI 3.0 release branch isn’t compatible with it. There are additional checks in that configury for the PMIx v2 series, and so it should be picking up the right thing.

The srun --mpi=pmix option is required because (a) your slurm.conf doesn’t specify pmix as the default MPI value, and (b) you didn’t build OMPI against the slurm PMI-1 or PMI-2 libraries. I suppose we could/should update the OMPI configury to support simultaneous building of all three PMI options, but that hasn’t been done at this point.
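
Point (a) can be checked from the configuration file. A minimal sketch, assuming a plain `Key=Value` slurm.conf at the /etc/slurm/slurm.conf path mentioned earlier in the thread; `MpiDefault` is the slurm.conf parameter that sets the default for `--mpi`, while `mpi_default` is a hypothetical helper name, and treating an unset value as `none` is an assumption about the local Slurm version:

```shell
# mpi_default: print the MpiDefault value from slurm.conf text on stdin,
# or "none" when the parameter is absent or commented out.
# Hypothetical helper; parameter names are matched case-insensitively.
mpi_default() {
    awk -F= '
        tolower($1) ~ /^[ \t]*mpidefault[ \t]*$/ {
            v = $2
            sub(/#.*/, "", v)        # drop a trailing comment
            gsub(/[ \t]/, "", v)     # drop surrounding blanks
        }
        END { print (v == "" ? "none" : v) }
    '
}

# Intended use (commented out so the sketch runs without a Slurm install):
#   mpi_default < /etc/slurm/slurm.conf
```

If this prints `pmix`, a bare `srun ./hello-mpi` should select the PMIx plugin without the flag; `scontrol show config` reports the same setting on a running cluster.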

Post by Bennet Fauber
I made a good deal of progress, and I now have OpenMPI 3.0.0 capable
of being run.
I do have one final point of confusion, however. It appears that
--with-pmi=<external directory> fails because configure is testing for
a non-existent major version of PMIx, 3. Perhaps that is the major
version of the OMPI distribution, and it should really be testing
whether the PMIX_VERSION_MAJOR is 2, instead?
When I use
srun ./hello.mpi
in my SLURM submit script, I get an error that leads with
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
. . . .
Please configure as appropriate and try again.
If I instead use
srun --mpi=pmix ./hello-mpi
it runs.
I installed SLURM using Here is some information about my
installation and my investigation of that message.
$ ../slurm-17.11.0/configure --prefix=/sw/arcts/centos7/slurm/17.11.0 \
configure:21524: checking for pmix installation
configure:21559: gcc -o conftest -g -O2 -pthread -I/opt/pmix/2.0.2/include \
conftest.c -L/opt/pmix/2.0.2/lib64 -lpmix >&5
configure:21559: $? = 0
configure:21590: gcc -E -I/opt/pmix/2.0.2/include conftest.c
configure:21590: $? = 0
configure:21642: result: /opt/pmix/2.0.2
I configure OpenMPI with
$ ./configure --prefix=/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/3.0.0 \
--mandir=/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/3.0.0/share/man \
--with-pmix=/opt/pmix/2.0.2 \
--with-libevent=external \
--with-hwloc=external \
--with-slurm --with-verbs \
--disable-dlopen --enable-shared \
CC=gcc CXX=g++ FC=gfortran F77=gfortran
Looking down in configure.log, I find this block, which seems to be
checking whether I asked for PMIx externally and whether it is PMIx
configure:12341: checking if user requested external PMIx
configure:12348: result: yes
configure:12359: checking --with-external-pmix value
configure:12379: result: sanity check ok (/opt/pmix/2.0.2/include)
configure:12391: checking --with-external-libpmix value
configure:12411: result: sanity check ok (/opt/pmix/2.0.2/lib)
configure:12430: checking PMIx version
configure:12439: result: version file found
configure:12447: checking version 3x
configure:12465: gcc -E -I/opt/pmix/2.0.2/include conftest.c
conftest.c:94:56: error: #error "not version 3"
#error "not version 3"
configure:12465: $? = 1
| /* confdefs.h */
| #define PACKAGE_NAME "Open MPI"
| #define PACKAGE_TARNAME "openmpi"
| #define PACKAGE_VERSION "3.0.0"
. . . .
| #include
| #if
| #error "not version 3"
| #endif
I find this very disconcerting.
When I look in openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/VERSION, I
find..., well, what appears to my naive eye contradictory information.
Near the top of that file, I find,
# major, minor, and release are generally combined in the form
# <major>.<minor>.<release>.
then some greek and a repo version (I think to the pmix repository version)
That leads me to believe that the bundled version is 2.0.1, Release
Candidate 1. Then comes the possibly contradictory part I really
# 1. Since these version numbers are associated with *releases*, the
# version numbers maintained on the PMIx Github trunk (and developer
# branches) is always 0:0:0 for all libraries.
# 2. The version number of libpmix refers to the public pmix interfaces.
# It does not refer to any internal interfaces.
# Version numbers are described in the Libtool current:revision:age
# format.
The library version is different from the software version? OK, maybe
that is true, but that isn't what's being tested by the configure
I am further befuddled because, as near as I can tell, the bundled
version of PMIx that comes with OMPI 3.0.0 will fail that test, too.
I look in
and I see
/* define PMIx version */
which is the same as in my installed version.
The error message I get when I run with a bare `srun ./hello-mpi`
tells me that I have misconfigured the OMPI build without SLURM's PMI
The test that fails to identify the proper version of
PMIX_VERSION_MAJOR also fails when using the bundled PMIx.
$ gcc -I/tmp/build/openmpi-3.0.0/opal/mca/pmix/pmix2x/pmix/include \
pmix-test.c:95:2: error: #error "not version 3"
#error "not version 3"
But the config.log generated when using the internal version of PMIx
seems to completely bypass the test that fails when using an external
Shouldn't the test be for PMIX_VERSION_MAJOR != 2L? The only version
that has a 3 in it is in the VERSION file at the root of the PMIx
source, and I don't see that as being used by configure.
Sorry if this is a long and winding path taken by the ignorant.
What am I missing?
Thanks, -- bennet
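The version probe discussed above can be approximated outside of configure. The following is a minimal sketch, not the actual configure logic: it assumes the PMIx install ships a header (assumed here to be named pmix_version.h) with a line of the form "#define PMIX_VERSION_MAJOR 2L", and it assumes configure probes for 3x first and then falls back to 2x and 1x, which is worth confirming against the lines that follow the failed test in config.log. The function names pmix_major_from_header and classify_pmix are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Open MPI or PMIx.

# Print the numeric value of PMIX_VERSION_MAJOR found in a header file.
# Assumes a line of the form "#define PMIX_VERSION_MAJOR 2L".
pmix_major_from_header() {
    sed -n 's/^[[:space:]]*#[[:space:]]*define[[:space:]][[:space:]]*PMIX_VERSION_MAJOR[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Mimic a 3x -> 2x -> 1x probe sequence (the ordering is an assumption).
# Each failed probe is reported on stderr; the first match is printed on stdout.
classify_pmix() {
    major=$(pmix_major_from_header "$1")
    for want in 3 2 1; do
        if [ "$major" = "$want" ]; then
            echo "${want}x"
            return 0
        fi
        echo "not version $want" >&2
    done
    echo "unknown"
    return 1
}
```

Called as `classify_pmix /opt/pmix/2.0.2/include/pmix_version.h` (the header name is assumed), a PMIx 2.x install would report "not version 3" on stderr before matching. If configure works this way, the "not version 3" error in config.log may be an expected intermediate result rather than the cause of the failure.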
Post by Charles A Taylor
Hi Bennet
I suspect the problem here lies in the slurm PMIx plugin. Slurm 17.11 supports PMIx v2.0 as well as (I believe) PMIx v1.2. I’m not sure if slurm is somehow finding one of those on your system and building the plugin or not, but it looks like OMPI is picking up signs of PMIx being active and trying to use it - and hitting an incompatibility.
You can test this out by adding --mpi=pmi2 to your srun cmd line and see if that solves the problem (you may also need to add OMPI_MCA_pmix=s2 to your environment as slurm has a tendency to publish envars even when they aren’t being used).
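The suggestion above amounts to setting one environment variable and one srun option. A minimal sketch follows; the wrapper name srun_pmi2 and the SRUN override variable are hypothetical and exist only so the command line can be previewed without a Slurm allocation.

```shell
#!/bin/sh
# Hypothetical wrapper around the suggested launch line.
# SRUN is an illustrative override: set SRUN=echo to preview the arguments
# instead of launching anything.
srun_pmi2() {
    # OMPI_MCA_pmix=s2 asks Open MPI to select its s2 (PMI2) component;
    # --mpi=pmi2 asks Slurm to use its pmi2 plugin for this job step.
    OMPI_MCA_pmix=s2 ${SRUN:-srun} --mpi=pmi2 "$@"
}
```

For example, `SRUN=echo srun_pmi2 ./hello-mpi` prints the arguments that would be passed, while `srun_pmi2 ./hello-mpi` inside a job runs the real launch.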
Post by Bennet Fauber
Thanks very much for the help identifying what information I should provide.
This is some information about our SLURM version
$ srun --mpi list
srun: MPI types are...
srun: pmi2
srun: pmix_v1
srun: openmpi
srun: pmix
srun: none
$ srun --version
slurm 17.11.0-0rc3
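A submit script can check for a plugin before relying on it. The sketch below parses the listing format shown above from standard input; the function name has_mpi_type is hypothetical, and it is assumed that the listing may arrive on either stdout or stderr, hence the 2>&1 in the usage note.

```shell
#!/bin/sh
# Succeeds if the plugin name $1 appears in `srun --mpi=list` output on stdin.
# Accepts both the "srun: pmix" and the older "srun: mpi/pmix" spellings.
has_mpi_type() {
    awk -v want="$1" '
        /^srun: MPI types are/ { next }      # skip the heading line
        $1 == "srun:" {
            name = $2
            sub(/^mpi\//, "", name)          # strip the older "mpi/" prefix
            if (name == want) found = 1
        }
        END { exit !found }                  # exit status 0 only when found
    '
}
```

Usage might look like `srun --mpi=list 2>&1 | has_mpi_type pmix && srun --mpi=pmix ./hello-mpi`.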
This is the output from my build script, which should show all the
configure options I used.
Checking compilers and things
OMPI is ompi
COMP_NAME is gcc_4_8_5
SRC_ROOT is /sw/src/arcts
PREFIX_ROOT is /sw/arcts/centos7/apps
PREFIX is /sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2
CONFIGURE_FLAGS are --disable-dlopen --enable-shared
COMPILERS are CC=gcc CXX=g++ FC=gfortran F77=gfortran
No modules loaded
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
--disable-dlopen --enable-shared
CC=gcc CXX=g++ FC=gfortran F77=gfortran
I remove the build directory and re-expand from the source tarball for
each build, so there should not be lingering configuration files from
prior trials.
Here is the output of
ompi_info | grep pmix
MCA pmix: s2 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix: pmix112 (MCA v2.1.0, API v2.0.0, Component v2.1.2)
MCA pmix base: ---------------------------------------------------
MCA pmix base: parameter "pmix" (current value: "", data
source: default, level: 2 user/detail, type: string)
Default selection set of components for the
pmix framework (<none> means use all components that can be found)
MCA pmix base: ---------------------------------------------------
MCA pmix base: parameter "pmix_base_verbose" (current
value: "error", data source: default, level: 8 dev/detail, type: int)
Verbosity level for the pmix framework (default: 0)
MCA pmix base: parameter "pmix_base_async_modex" (current
value: "false", data source: default, level: 9 dev/all, type: bool)
MCA pmix base: parameter "pmix_base_collect_data" (current
value: "true", data source: default, level: 9 dev/all, type: bool)
MCA pmix s2: ---------------------------------------------------
"20", data source: default, level: 9 dev/all, type: int)
Priority of the pmix s2 component (default: 20)
MCA pmix s1: ---------------------------------------------------
"10", data source: default, level: 9 dev/all, type: int)
Priority of the pmix s1 component (default: 10)
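The component lines in that output are the part that matters for launcher compatibility. A minimal sketch for pulling out just the component names, assuming the "MCA pmix: NAME (...)" line format shown above; the function name pmix_components is hypothetical.

```shell
#!/bin/sh
# Print the names of the pmix components reported by ompi_info (read on stdin).
# Only "MCA pmix: NAME (...)" lines match; "MCA pmix base:" and per-component
# parameter lines are skipped because their second field is not "pmix:".
pmix_components() {
    awk '$1 == "MCA" && $2 == "pmix:" { print $3 }'
}
```

Run as `ompi_info | pmix_components`. For the build above it would list s2, s1, and pmix112, one per line.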
I also attach the hello-mpi.c file I am using as a test. I compiled it using
$ mpicc -o hello-mpi hello-mpi.c
and this is the information about the actual compile command
$ mpicc --showme -o hello-mpi hello-mpi.c
gcc -o hello-mpi hello-mpi.c
-I/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2/include -pthread
-L/usr/lib64 -Wl,-rpath -Wl,/usr/lib64 -Wl,-rpath
-L/sw/arcts/centos7/apps/gcc_4_8_5/openmpi/2.1.2/lib -lmpi
I use some variation on the following submit script
$ cat test.slurm
#SBATCH --mail-type=NONE
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1g
#SBATCH --cpus-per-task=1
#SBATCH -A hpcstaff
#SBATCH -p standard
#Your code here
cd /home/bennet/hello
srun ./hello-mpi
The results are attached as slurm-114.out, where it looks to me like
it is trying to invoke pmi2 instead of pmix.
If I use `srun --mpi pmix ./hello-mpi` in the file submitted to SLURM,
I get a core dump.
[bn1.stage.arc-ts.umich.edu:34722] PMIX ERROR: BAD-PARAM in file
src/dstore/pmix_esh.c at line 996
[bn2.stage.arc-ts.umich.edu:04597] PMIX ERROR: BAD-PARAM in file
src/dstore/pmix_esh.c at line 996
[bn1:34722] *** Process received signal ***
[bn1:34722] Signal: Segmentation fault (11)
[bn1:34722] Signal code: Invalid permissions (2)
[bn1:34722] Failing at address: 0xcf73a0
[bn1:34722] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2b2420b1d370]
[bn1:34722] [ 1] [0xcf73a0]
[bn1:34722] *** End of error message ***
[bn2:04597] *** Process received signal ***
[bn2:04597] Signal: Segmentation fault (11)
[bn2:04597] Signal code: (128)
[bn2:04597] Failing at address: (nil)
[bn2:04597] [ 0] /usr/lib64/libpthread.so.0(+0xf370)[0x2ab526447370]
[bn2:04597] [ 1]
[bn2:04597] [ 2]
[bn2:04597] [ 3]
[bn2:04597] [ 4]
[bn2:04597] [ 5]
[bn2:04597] [ 6]
[bn2:04597] [ 7]
[bn2:04597] [ 8] /home/bennet/hello/./hello-mpi[0x4009d5]
[bn2:04597] [ 9] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab526675b35]
[bn2:04597] [10] /home/bennet/hello/./hello-mpi[0x4008d9]
[bn2:04597] *** End of error message ***
srun: error: bn1: task 0: Segmentation fault (core dumped)
srun: error: bn2: task 1: Segmentation fault (core dumped)
If I use `srun --mpi openmpi` in the submit script, the job hangs, and
when I cancel it, I get
PMI is not initialized
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 116.0 ON bn1 CANCELLED AT 2017-11-29T08:42:54 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 116 ON bn1 CANCELLED AT 2017-11-29T08:42:54 ***
Any thoughts you might have on this would be very much appreciated.
Thanks, -- bennet
Post by Howard Pritchard
Hello Bennet,
What you are trying to do, using srun as the job launcher, should work. Could
you post the contents
of /etc/slurm/slurm.conf for your system?
Could you also post the output of
ompi_info --all | grep pmix
to the mail list?
The config.log from your build would also be useful.
Post by r***@open-mpi.org
What Charles said was true but not quite complete. We still support the
older PMI libraries but you likely have to point us to wherever slurm put
However, we definitely recommend using PMIx as you will get a faster launch
Sent from my iPad
Post by Bennet Fauber
Thanks a ton! Yes, we are missing two of the three steps.
Will report back after we get pmix installed and after we rebuild
Slurm. We do have a new enough version of it, at least, so we might
have missed the target, but we did at least hit the barn. ;-)
Post by Charles A Taylor
Hi Bennet,
Three things...
1. OpenMPI 2.x requires PMIx in lieu of pmi1/pmi2.
2. You will need slurm 16.05 or greater built with --with-pmix
2a. You will need pmix 1.1.5 which you can get from github.
3. then, to launch your mpi tasks on the allocated resources,
srun --mpi=pmix ./hello-mpi
I’m replying to the list because,
a) this information is harder to find than you might think.
b) someone/anyone can correct me if I'm giving a bum steer.
Hope this helps,
Charlie Taylor
University of Florida
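The Slurm version requirement in step 2 can be checked from the output of `srun --version`. A minimal sketch, assuming that output has the form "slurm MAJOR.MINOR.PATCH..." as in the "slurm 17.11.0-0rc3" example elsewhere in this thread; the function name slurm_at_least is hypothetical.

```shell
#!/bin/sh
# Succeeds if the Slurm version in $2 is at least the MAJOR.MINOR given in $1.
# $1: required version, e.g. 16.05
# $2: text such as "slurm 17.11.0-0rc3" (assumed format of `srun --version`)
slurm_at_least() {
    awk -v want="$1" -v have="${2#slurm }" 'BEGIN {
        split(want, w, ".")
        split(have, h, ".")
        # "+ 0" forces a numeric comparison, so "05" compares as 5
        ok = (h[1] + 0 > w[1] + 0) || (h[1] + 0 == w[1] + 0 && h[2] + 0 >= w[2] + 0)
        exit !ok
    }'
}
```

For example, `slurm_at_least 16.05 "$(srun --version)" || echo "Slurm too old for the pmix plugin"`. This only compares version numbers; whether Slurm was actually built with PMIx support still has to be confirmed with `srun --mpi=list`.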
I think that OpenMPI is supposed to support SLURM integration such that
srun ./hello-mpi
should work? I built OMPI 2.1.2 with
export CONFIGURE_FLAGS='--disable-dlopen --enable-shared'
export COMPILERS='CC=gcc CXX=g++ FC=gfortran F77=gfortran'
CMD="./configure \
--prefix=${PREFIX} \
--mandir=${PREFIX}/share/man \
--with-slurm \
--with-pmi \
--with-lustre \
--with-verbs \
I have a simple hello-mpi.c (source included below), which compiles
and runs with mpirun, both on the login node and in a job. However,
when I try to use srun in place of mpirun, I get instead a hung job,
which upon cancellation produces this output.
PMI is not initialized
PMI is not initialized
[warn] opal_libevent2022_event_active: event has no event_base set.
[warn] opal_libevent2022_event_active: event has no event_base set.
slurmstepd: error: *** STEP 86.0 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 86 ON bn1 CANCELLED AT 2017-11-16T10:03:24 ***
The SLURM web page suggests that OMPI 2.x and later support PMIx, and
to use `srun --mpi=pmix`, however that no longer seems to be an
option, and using the `openmpi` type isn't working (neither is pmi2).
srun: MPI types are...
srun: mpi/pmi2
srun: mpi/lam
srun: mpi/openmpi
srun: mpi/mpich1_shmem
srun: mpi/none
srun: mpi/mvapich
srun: mpi/mpich1_p4
srun: mpi/mpichgm
srun: mpi/mpichmx
To get the Intel PMI to work with srun, I have to set
Is there a comparable environment variable that must be set to enable
`srun` to work?
Am I missing a build option or misspecifying one?
-- bennet
Source of hello-mpi.c
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv){
    int rank;        /* rank of process */
    int numprocs;    /* size of COMM_WORLD */
    int namelen;
    int tag=10;      /* expected tag */
    int message;     /* Recv'd message */
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Status status; /* status of recv */

    /* call Init, size, and rank */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

    if(rank != 0){
        MPI_Recv(&message,       /*buffer for message */
                 1,              /*MAX count to recv */
                 MPI_INT,        /*type to recv */
                 0,              /*recv from 0 only */
                 tag,            /*tag of message */
                 MPI_COMM_WORLD, /*communicator to use */
                 &status);       /*status object */
        printf("Hello from process %d!\n",rank);
    }
    else{
        /* rank 0 ONLY executes this */
        printf("MPI_COMM_WORLD is %d processes big!\n", numprocs);
        int x;
        for(x=1; x<numprocs; x++){
            MPI_Send(&x,             /*send x to process x */
                     1,              /*number to send */
                     MPI_INT,        /*type to send */
                     x,              /*rank to send to */
                     tag,            /*tag for message */
                     MPI_COMM_WORLD); /*communicator to use */
        }
    } /* end else */

    /* always call at end */
    MPI_Finalize();

    return 0;
}
users mailing list