Discussion:
[OMPI users] local communicator and crash of the code
Diego Avesani
2018-08-03 17:23:47 UTC
Dear all,

I am experiencing a strange error.

In my code I use three communicators:
MPI_COMM_WORLD
MPI_MASTERS_COMM
LOCAL_COMM

which share some of the same processes.

When I run my code as
mpirun -np 4 --oversubscribe ./MPIHyperStrem

I have no problem, while when I run it as

mpirun -np 4 --oversubscribe ./MPIHyperStrem

it sometimes crashes and sometimes does not.

The problem seems to be linked to

CALL MPI_REDUCE(QTS(tstep,:), QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, &
                MPI_SUM, 0, MPI_LOCAL_COMM, iErr)

which operates within the local communicator.
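
Note that this call passes QTS(tstep,:) as both the send and the receive buffer; the MPI standard does not allow those two buffers to alias in MPI_REDUCE, and that alone can cause intermittent failures. A minimal sketch of a conforming alternative uses MPI_IN_PLACE at the root of MPI_LOCAL_COMM (locRank and recvDummy below are placeholder names, not variables from the original code):

DOUBLE PRECISION :: recvDummy(1)   ! receive buffer is not used on non-root ranks
INTEGER          :: locRank

CALL MPI_COMM_RANK(MPI_LOCAL_COMM, locRank, iErr)
IF (locRank == 0) THEN
   ! Root of MPI_LOCAL_COMM: the summed result overwrites QTS(tstep,:) in place
   CALL MPI_REDUCE(MPI_IN_PLACE, QTS(tstep,:), nNode, MPI_DOUBLE_PRECISION, &
                   MPI_SUM, 0, MPI_LOCAL_COMM, iErr)
ELSE
   ! Non-root ranks contribute QTS(tstep,:); their receive buffer is ignored
   CALL MPI_REDUCE(QTS(tstep,:), recvDummy, nNode, MPI_DOUBLE_PRECISION, &
                   MPI_SUM, 0, MPI_LOCAL_COMM, iErr)
END IF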

What do you think? Could you suggest some debugging tests?
Is this a problem related to the local communicator?

Thanks



Diego
Ralph H Castain
2018-08-03 17:47:43 UTC
Those two command lines look exactly the same to me - what am I missing?
Diego Avesani
2018-08-03 17:57:06 UTC
Dear Ralph, dear all,

I do not know.
I have isolated the issue. It seems that I have a problem with:

CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, colorl, MPIworld%rank, MPI_LOCAL_COMM, MPIworld%iErr)
CALL MPI_COMM_RANK(MPI_LOCAL_COMM, MPIlocal%rank, MPIlocal%iErr)
CALL MPI_COMM_SIZE(MPI_LOCAL_COMM, MPIlocal%nCPU, MPIlocal%iErr)

Open MPI does not seem to set MPIlocal%rank properly.

What could it be? A bug?
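
One quick way to check is a minimal, self-contained test of the split itself. The sketch below is only an illustration: the grouping colorl = worldRank / 2 is hypothetical, not necessarily the one used in MPIHyperStrem, and the program simply prints the resulting mapping from every process.

PROGRAM test_split
   USE mpi
   IMPLICIT NONE
   INTEGER :: iErr, worldRank, colorl
   INTEGER :: localComm, localRank, localSize

   CALL MPI_INIT(iErr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, worldRank, iErr)

   ! Hypothetical grouping: consecutive world ranks in pairs; every process must
   ! pass a nonnegative color (or MPI_UNDEFINED to stay out of any new group)
   colorl = worldRank / 2

   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, colorl, worldRank, localComm, iErr)
   CALL MPI_COMM_RANK(localComm, localRank, iErr)
   CALL MPI_COMM_SIZE(localComm, localSize, iErr)

   ! Print the full mapping so an inconsistent color assignment shows up at once
   WRITE(*,'(4(A,I4))') 'world rank', worldRank, '  color', colorl, &
                        '  local rank', localRank, '  local size', localSize

   CALL MPI_COMM_FREE(localComm, iErr)
   CALL MPI_FINALIZE(iErr)
END PROGRAM test_split

Running it with something like mpirun -np 4 ./test_split and comparing the printed local ranks and sizes with what the full code expects should quickly show whether the split itself misbehaves.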

thanks again

Diego
Diego Avesani
2018-08-03 18:24:18 UTC
Dear all,
I have probably found the error.
Let me check: I have probably not set up the colors properly.
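
For reference, MPI_COMM_SPLIT requires every rank of the parent communicator to call it, with a color that is either nonnegative or MPI_UNDEFINED; all ranks that pass the same color end up in the same new communicator. A small guard such as the following sketch (variable names borrowed from the earlier message) makes a bad color value fail loudly instead of crashing intermittently later:

! Validate the color before splitting (sketch; colorl and MPIworld come from
! the earlier message, iErr is a local integer)
IF (colorl < 0 .AND. colorl /= MPI_UNDEFINED) THEN
   WRITE(*,*) 'rank ', MPIworld%rank, ': invalid colorl = ', colorl
   CALL MPI_ABORT(MPI_COMM_WORLD, 1, iErr)
END IF
CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, colorl, MPIworld%rank, MPI_LOCAL_COMM, MPIworld%iErr)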

Thanks a lot.
I hope you have not lost too much time on my account;
I will let you know if that was the problem.

Thanks again

Diego
Nathan Hjelm via users
2018-08-03 18:48:14 UTC
If you are trying to create a communicator containing all node-local processes, then use MPI_Comm_split_type.
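
A minimal sketch of that approach in Fortran, reusing the variable names from earlier in the thread (MPI_COMM_TYPE_SHARED groups exactly the processes that can share memory, i.e. the processes on the same node):

! Split MPI_COMM_WORLD into one communicator per node
CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                         MPI_INFO_NULL, MPI_LOCAL_COMM, iErr)
CALL MPI_COMM_RANK(MPI_LOCAL_COMM, MPIlocal%rank, MPIlocal%iErr)
CALL MPI_COMM_SIZE(MPI_LOCAL_COMM, MPIlocal%nCPU, MPIlocal%iErr)

The key argument (0 here) only controls the rank ordering inside each new communicator; with equal keys the original MPI_COMM_WORLD ordering is preserved.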