[OMPI users] openmpi-2.0.2
Jim Edwards
2017-04-19 17:53:09 UTC

I have openmpi-2.0.2 builds on two different machines, and I have a test
code which works on one machine but not on the other. I'm struggling to
understand why, and I hope that by posting here someone may have some
insight.

The test uses MPI derived datatypes and MPI_Alltoallw on 4 tasks. On the
machine that fails, it appears to ignore the displacement in the derived
datatypes defined on task 0 and just sends elements 0-3 to all tasks. The
failing machine's build uses gcc 5.4.0; the working machine has both Intel
16.0.3 and gcc 6.3.0 builds.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Datatype type[4], type2[4];
    int displacement[1];
    int sbuffer[16];
    int rbuffer[4];
    int scnts[4], sdispls[4], rcnts[4], rdispls[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* the arrays above hold exactly 4 entries, so require exactly 4 tasks */
    if (size != 4) {
        printf("Please run with 4 processes.\n");
        MPI_Finalize();
        return 1;
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* task 0 has sbuffer of size 16 and we are going to send 4 values to
       each of tasks 0-3, offsetting in each case so that the expected
       result is
         task[0] 0-3
         task[1] 4-7
         task[2] 8-11
         task[3] 12-15
    */
    for (int i = 0; i < size; i++) {
        if (rank == 0) {
            scnts[i] = 1;
        } else {
            scnts[i] = 0;
        }
        sdispls[i] = 0;
        rcnts[i] = 0;
        rdispls[i] = 0;
    }
    rcnts[0] = 1;

    for (int i = 0; i < size; i++) {
        type[i] = MPI_INT;
        type2[i] = MPI_INT;
        rbuffer[i] = -1;
    }

    /* on the recv side we create a datatype which is a single block of
       4 integers for the recv from 0; otherwise we use MPI_INT as a
       placeholder for the type (Open MPI does not want us to use
       MPI_DATATYPE_NULL, a stupid misinterpretation of the standard imho) */
    displacement[0] = 0;
    MPI_Type_create_indexed_block(1, 4, displacement, MPI_INT, type2);
    /* derived datatypes must be committed before use in communication */
    MPI_Type_commit(type2);

    if (rank == 0) {
        for (int i = 0; i < size; i++) {
            displacement[0] = i * 4;
            /* we create a datatype which is a single block of 4 integers
               with offset 4*i from the start of sbuffer */
            MPI_Type_create_indexed_block(1, 4, displacement, MPI_INT, type + i);
            MPI_Type_commit(type + i);
        }
    }

    for (int i = 0; i < 16; i++)
        sbuffer[i] = i;

    for (int i = 0; i < size; i++) {
        /* MPI_Datatype is an opaque handle, so print the type sizes instead */
        int stsize, rtsize;
        MPI_Type_size(type[i], &stsize);
        MPI_Type_size(type2[i], &rtsize);
        printf("rank %d i=%d: scnts %d sdispls %d stype size %d "
               "rcnts %d rdispls %d rtype size %d\n",
               rank, i, scnts[i], sdispls[i], stsize,
               rcnts[i], rdispls[i], rtsize);
    }

    MPI_Alltoallw(sbuffer, scnts, sdispls, type, rbuffer, rcnts, rdispls,
                  type2, MPI_COMM_WORLD);

    for (int i = 0; i < 4; i++)
        printf("rbuffer[%d] = %d\n", i, rbuffer[i]);

    /* free the derived datatypes; predefined MPI_INT must not be freed */
    MPI_Type_free(type2);
    if (rank == 0)
        for (int i = 0; i < size; i++)
            MPI_Type_free(type + i);

    MPI_Finalize();
    return 0;
}

Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO
Gilles Gouaillardet
2017-04-20 04:40:00 UTC

Can you please post your configure command line and test output on both
systems?

fwiw, Open MPI strictly sticks to the (current) MPI standard on this point
(see http://lists.mpi-forum.org/pipermail/mpi-forum/2016-January/).

There have been some attempts to deviate from the MPI standard
(e.g. implement what the standard "should" say versus what it actually says),
and they were all crushed at a very early stage in Open MPI.


Jim Edwards
2017-04-20 17:22:00 UTC
Hi Gilles,

I had to get this info from the system admins on both systems. The working
one is:

./configure --prefix=/glade/u/apps/ch/opt/openmpi/2.0.2/gnu/6.3.0
--with-tm=/glade/u/apps/ch/opt/pbs_copy --disable-shared
--enable-static --with-verbs

and this is the failing one:

./configure --prefix=/cluster/openmpi-2.0.2-gcc-g++-gfortran-5.4.0
--with-psm --with-verbs --without-udapl --disable-openib-ibcm
--with-tm=/cluster/torque

I haven't been able to get test output from either group, but both
assure me that all tests passed.
Jim Edwards

CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO
Reuti
2017-04-20 19:42:39 UTC
Given the last post in this thread, the copy I suggested seems not to be possible, but I also want to test whether this post now goes through to the list.

-- Reuti


And what happens when you copy one build from one node to the other (including all the shared libraries it uses)?

-- Reuti