[OMPI users] MPI group and stuck in communication
Diego Avesani
2018-08-10 15:52:16 UTC
Dear all,

I have an MPI program with three groups that have some CPUs in common.

I have some problem with MPI_barrier.

Let me try to make myself clear. I have three communicators:
INTEGER :: MPI_GROUP_WORLD
INTEGER :: MPI_LOCAL_COMM
INTEGER :: MPI_MASTER_COMM

when I apply:
IF(MPIworld%rank.EQ.0) WRITE(*,*)

CALL MPI_Barrier(MPI_COMM_WORLD,MPIworld%iErr)

IF(MPI_COMM_NULL .NE. MPI_MASTER_COMM)THEN
WRITE(*,'(A12,I3,A4,F10.5)') 'master rank',MPImaster%rank,'eff',eff
ENDIF

CALL MPI_Barrier(MPI_COMM_WORLD,MPIworld%iErr)

IF(MPIworld%rank.EQ.0) WRITE(*,*)

What could be the problem?
Thanks a lot,
Diego
Jeff Squyres (jsquyres) via users
2018-08-10 17:49:54 UTC
I'm not quite clear what the problem is that you're running into -- you just said that there is "some problem with MPI_barrier".

What problem, exactly, is happening with your code? Be as precise and specific as possible.

It's kinda hard to tell what is happening in the code snippet you posted because there's a lot of variables used that are not defined in it -- so we have no way of knowing what is going on just from these few lines of code.
--
Jeff Squyres
***@cisco.com
Diego Avesani
2018-08-10 22:27:34 UTC
Dear Jeff,
you are right.

The question is:
Is it possible to have a barrier for all CPUs even though they belong to
different groups?
If the answer is yes, I will go into more detail.

Thanks a lot

Diego


Jeff Squyres (jsquyres) via users
2018-08-11 12:03:34 UTC
Post by Diego Avesani
Is it possible to have a barrier for all CPUs even though they belong to different groups?
If the answer is yes, I will go into more detail.
By "CPUs", I assume you mean "MPI processes", right? (i.e., not threads inside an individual MPI process)

Again, this is not quite a specific-enough question. Do the different groups (and I assume you really mean communicators) overlap? Are they disjoint? Is there a reason MPI_COMM_WORLD is not sufficient?

There are two typical ways to barrier a set of MPI processes.

1. Write your own algorithm to do sends / receives -- and possibly even collectives -- to ensure that no process leaves the barrier before every process enters the barrier.

2. Make sure that you have a communicator that includes exactly the set of processes that you want (and if you don't have a communicator fitting this description, make one), and then call MPI_BARRIER on it.

#2 is typically the easier solution.
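
Option 2 might look like the following minimal sketch. It assumes, purely for illustration, that the set of processes to synchronize is the even-numbered ranks of MPI_COMM_WORLD; that choice and the names `sub_comm` and `color` are hypothetical and do not come from the original post.

```fortran
PROGRAM barrier_on_subset
   USE mpi
   IMPLICIT NONE
   INTEGER :: iErr, world_rank, color, sub_comm

   CALL MPI_INIT(iErr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, world_rank, iErr)

   ! Assumption for illustration: only even ranks need to synchronize.
   ! Ranks that pass MPI_UNDEFINED get MPI_COMM_NULL back.
   IF (MOD(world_rank, 2) .EQ. 0) THEN
      color = 0
   ELSE
      color = MPI_UNDEFINED
   END IF

   ! MPI_COMM_SPLIT is collective over MPI_COMM_WORLD: every rank calls it.
   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, world_rank, sub_comm, iErr)

   ! Only members of the new communicator may call the barrier on it.
   IF (sub_comm .NE. MPI_COMM_NULL) THEN
      CALL MPI_BARRIER(sub_comm, iErr)
      CALL MPI_COMM_FREE(sub_comm, iErr)
   END IF

   CALL MPI_FINALIZE(iErr)
END PROGRAM barrier_on_subset
```

The guard on MPI_COMM_NULL matters: a process that is not a member of the communicator must not call a collective on it.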
--
Jeff Squyres
***@cisco.com
Diego Avesani
2018-08-12 18:18:55 UTC
Dear all, Dear Jeff,
I have three communicators:

the standard one:
MPI_COMM_WORLD

and two others:
MPI_LOCAL_COMM
MPI_MASTER_COMM

a sort of two-level MPI.

Suppose I have 8 threats.
I use 4 threats to run the same problem with different values. These form
the LOCAL_COMM.
In addition, I have an MPI_MASTER_COMM to allow the master of each group to
communicate with the other masters.

This gives me some problems.


For example, I have to exit from a loop according to a check:

IF(counter.GE.npercstop*nParticles)THEN
   flag2exit=1
   WRITE(*,*) '-Warning PSO has been exit'
   EXIT pso_cycle
ENDIF

But this is difficult to do, since I have to exit only after all the threats
inside a set have finished their tasks.

Do you have some suggestions?
Do you need other information?

Many thanks



Diego
Jeff Squyres (jsquyres) via users
2018-08-13 17:06:19 UTC
Post by Diego Avesani
Dear all, Dear Jeff,
MPI_COMM_WORLD
MPI_LOCAL_COMM
MPI_MASTER_COMM
a sort of two-level MPI.
Suppose to have 8 threats,
I use 4 threats for run the same problem with different value. These are the LOCAL_COMM.
In addition I have a MPI_MASTER_COMM to allow the master of each group to communicate.
I don't understand what you're trying to convey here, sorry. Can you draw it, perhaps?

(I am assuming you mean "threads", not "threats")
--
Jeff Squyres
***@cisco.com
Diego Avesani
2018-08-13 21:07:25 UTC
Dear Jeff, dear all,

It's my fault.

Can I send an attachment?
thanks

Diego
Gilles Gouaillardet
2018-08-13 21:25:09 UTC
Diego,

Since this question is not Open MPI specific, Stack Overflow (or similar
forum) is a better place to ask.
Make sure you first read https://stackoverflow.com/help/mcve

Feel free to post us a link to your question.


Cheers,

Gilles
George Reeke
2018-08-13 21:44:36 UTC
On Aug 12, 2018, at 2:18 PM, Diego Avesani
Post by Diego Avesani
For example, I have to exit to a cycle, according to a
IF(counter.GE.npercstop*nParticles)THEN
flag2exit=1
WRITE(*,*) '-Warning PSO has been exit'
EXIT pso_cycle
ENDIF
But this is difficult to do since I have to exit only after
all the threats inside a set have finish their task.
Post by Diego Avesani
Do you have some suggestions?
Do you need other information?
Dear Diego et al,
Assuming I understand your problem:
The way I do this is set up one process that is responsible for normal
and error exits. It sits looking for messages from all the other ranks
that are doing work. Certain messages are defined to indicate an error
exit with an error number or some text. The exit process is spawned by
the master process at startup and is told how many working processes are
there. Each process either sends an OK exit when it is done or an error
message. The exit process counts these exit messages and when the count
equals the number of working processes, it prints any/all errors, then
sends messages back to all the working processes, which, at this time,
should be waiting for these and they can terminate with MPI_Finalize.
Of course it is more complicated than that to handle special cases
like termination before everything has really started or when the
protocol is not followed, debug messages that do not initiate
termination, etc. but maybe this will give you an idea for one
way to deal with this issue.
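
A stripped-down sketch of that protocol could look like this. It makes several simplifying assumptions that are not in the description above: rank 0 of MPI_COMM_WORLD plays the exit process instead of a separately spawned one, errors are printed as they arrive rather than after the count is complete, and the tags `TAG_EXIT` and `TAG_RELEASE` are hypothetical names.

```fortran
PROGRAM exit_coordinator
   USE mpi
   IMPLICIT NONE
   INTEGER, PARAMETER :: TAG_EXIT = 1, TAG_RELEASE = 2   ! hypothetical tags
   INTEGER :: iErr, rank, nprocs, nworkers, ndone, i, code, release
   INTEGER :: status(MPI_STATUS_SIZE)

   CALL MPI_INIT(iErr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, iErr)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, iErr)
   nworkers = nprocs - 1

   IF (rank .EQ. 0) THEN
      ! Exit process: wait for exactly one exit message per worker.
      ndone = 0
      DO WHILE (ndone .LT. nworkers)
         CALL MPI_RECV(code, 1, MPI_INTEGER, MPI_ANY_SOURCE, TAG_EXIT, &
                       MPI_COMM_WORLD, status, iErr)
         ! A non-zero code stands for an error exit.
         IF (code .NE. 0) WRITE(*,*) 'rank', status(MPI_SOURCE), 'error', code
         ndone = ndone + 1
      END DO
      ! All workers have reported: tell each of them it may leave.
      release = 0
      DO i = 1, nworkers
         CALL MPI_SEND(release, 1, MPI_INTEGER, i, TAG_RELEASE, &
                       MPI_COMM_WORLD, iErr)
      END DO
   ELSE
      ! Worker: the real work would go here; then report 0 (OK) or an
      ! error number and wait for the release message.
      code = 0
      CALL MPI_SEND(code, 1, MPI_INTEGER, 0, TAG_EXIT, MPI_COMM_WORLD, iErr)
      CALL MPI_RECV(release, 1, MPI_INTEGER, 0, TAG_RELEASE, &
                    MPI_COMM_WORLD, status, iErr)
   END IF

   CALL MPI_FINALIZE(iErr)
END PROGRAM exit_coordinator
```

The special cases mentioned above (early termination, protocol violations, debug messages) are deliberately left out.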
George Reeke
Diego Avesani
2018-08-20 06:56:58 UTC
Dear George, dear Gilles, dear Jeff, dear all,

Thanks for all the suggestions.
The problem is that I do not want to FINALIZE, only to exit from a
loop.
This is how my code is organized:
there is a master group;
each master sends only some values to its slaves;
the slaves perform some work;
according to a counter, every process has to leave the loop.

Here is an example; if you want, I can give you more details.

DO iRun=1,nRun
   !
   IF(MPI_COMM_NULL .NE. MPI_MASTER_COMM)THEN
      VARS(1) = REAL(iRun+1)
      VARS(2) = REAL(iRun+100)
      VARS(3) = REAL(iRun+200)
      VARS(4) = REAL(iRun+300)
   ENDIF
   !
   CALL MPI_BCAST(VARS,4,MPI_DOUBLE_PRECISION,0,MPI_LOCAL_COMM,iErr)
   !
   test = SUM(VARS)
   !
   CALL MPI_ALLREDUCE(test, test, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                      MPI_LOCAL_COMM,iErr)
   !
   !
   counter = test
   !
   CALL MPI_ALLREDUCE(counter, counter, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                      MPI_MASTER_COMM,iErr)
   !
   IF(counter.GT.10000)THEN
      EXIT
   ENDIF
ENDDO

My original code gets stuck in the loop and I do not know why.

Thanks





Diego
Gilles Gouaillardet
2018-08-20 07:17:09 UTC
Diego,


First, try using MPI_IN_PLACE when the send buffer and the receive buffer are identical.


At first glance, the second allreduce should be in MPI_COMM_WORLD (with
counter=0 when master_comm is null),

or you have to add an extra broadcast in local_comm.
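
The first variant (allreduce in MPI_COMM_WORLD, with MPI_IN_PLACE) might be sketched as follows. Several things here are assumptions rather than facts from the thread: local groups are taken to be blocks of `group_size` consecutive ranks with local rank 0 as master, `VARS`, `test` and `counter` are taken to be DOUBLE PRECISION, and the communicators are named `local_comm` and `master_comm` because the MPI_ prefix is reserved for the library itself.

```fortran
PROGRAM exit_together
   USE mpi
   IMPLICIT NONE
   INTEGER, PARAMETER :: group_size = 4, nRun = 100   ! assumed values
   INTEGER :: iErr, world_rank, local_rank, color, iRun
   INTEGER :: local_comm, master_comm
   DOUBLE PRECISION :: VARS(4), test, counter

   CALL MPI_INIT(iErr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, world_rank, iErr)

   ! Assumed layout: consecutive ranks form a local group.
   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, world_rank/group_size, world_rank, &
                       local_comm, iErr)
   CALL MPI_COMM_RANK(local_comm, local_rank, iErr)

   ! Local rank 0 of each group joins master_comm; others get MPI_COMM_NULL.
   color = MPI_UNDEFINED
   IF (local_rank .EQ. 0) color = 0
   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, world_rank, master_comm, iErr)

   DO iRun = 1, nRun
      IF (master_comm .NE. MPI_COMM_NULL) THEN
         VARS(1) = DBLE(iRun+1)
         VARS(2) = DBLE(iRun+100)
         VARS(3) = DBLE(iRun+200)
         VARS(4) = DBLE(iRun+300)
      END IF
      CALL MPI_BCAST(VARS, 4, MPI_DOUBLE_PRECISION, 0, local_comm, iErr)

      test = SUM(VARS)
      ! MPI_IN_PLACE: test is both the input and the output buffer.
      CALL MPI_ALLREDUCE(MPI_IN_PLACE, test, 1, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, local_comm, iErr)

      ! Only the masters contribute; every other rank adds 0.
      counter = 0.0D0
      IF (master_comm .NE. MPI_COMM_NULL) counter = test
      CALL MPI_ALLREDUCE(MPI_IN_PLACE, counter, 1, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, MPI_COMM_WORLD, iErr)

      ! Every rank now holds the same counter, so all of them take the
      ! same branch and leave the loop in the same iteration.
      IF (counter .GT. 10000.0D0) EXIT
   END DO

   IF (master_comm .NE. MPI_COMM_NULL) CALL MPI_COMM_FREE(master_comm, iErr)
   CALL MPI_COMM_FREE(local_comm, iErr)
   CALL MPI_FINALIZE(iErr)
END PROGRAM exit_together
```

In the posted loop, ranks outside the master group call MPI_ALLREDUCE on a null communicator, which is erroneous, and they never obtain the value that decides the EXIT; reducing in MPI_COMM_WORLD avoids both issues.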


Cheers,


Gilles
Diego Avesani
2018-08-21 16:01:22 UTC
Dear all,

"allreduce should be in MPI_COMM_WORLD"

I think you have found the problem.
However, in my original code, the counter information belongs only to the
master group.
Should I share that information with the slaves of each master?
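
If the counter has to stay a master-only quantity, the "extra broadcast in local_comm" suggested earlier in the thread might be sketched like this. The group layout (blocks of `group_size` consecutive ranks, local rank 0 as master), the placeholder value assigned to `counter`, and the names `local_comm` and `master_comm` are assumptions made for the example.

```fortran
PROGRAM share_counter
   USE mpi
   IMPLICIT NONE
   INTEGER, PARAMETER :: group_size = 4        ! assumed ranks per group
   INTEGER :: iErr, world_rank, local_rank, color
   INTEGER :: local_comm, master_comm
   DOUBLE PRECISION :: counter

   CALL MPI_INIT(iErr)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD, world_rank, iErr)

   ! Same assumed two-level layout: local groups plus one master each.
   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, world_rank/group_size, world_rank, &
                       local_comm, iErr)
   CALL MPI_COMM_RANK(local_comm, local_rank, iErr)
   color = MPI_UNDEFINED
   IF (local_rank .EQ. 0) color = 0
   CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, world_rank, master_comm, iErr)

   counter = 0.0D0
   IF (master_comm .NE. MPI_COMM_NULL) THEN
      ! Placeholder for the quantity only the masters know.
      counter = DBLE(world_rank + 1)
      ! Masters combine their values among themselves only.
      CALL MPI_ALLREDUCE(MPI_IN_PLACE, counter, 1, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, master_comm, iErr)
   END IF

   ! Each master is rank 0 of its local_comm, so this broadcast hands the
   ! combined value to the other ranks of its group.
   CALL MPI_BCAST(counter, 1, MPI_DOUBLE_PRECISION, 0, local_comm, iErr)

   ! From here on every rank holds the same counter and can take the same
   ! exit decision.
   WRITE(*,*) 'world rank', world_rank, 'counter', counter

   IF (master_comm .NE. MPI_COMM_NULL) CALL MPI_COMM_FREE(master_comm, iErr)
   CALL MPI_COMM_FREE(local_comm, iErr)
   CALL MPI_FINALIZE(iErr)
END PROGRAM share_counter
```

Compared with a single allreduce in MPI_COMM_WORLD this costs one extra collective per iteration, but it keeps the reduction itself restricted to the masters.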

Thanks again



Diego