Discussion:
[OMPI users] Forcing MPI processes to end
Adam Sylvester
2017-11-16 17:37:49 UTC
I'm using Open MPI 2.1.0 for this but I'm not sure if this is more of an
Open MPI-specific implementation question or what the MPI standard
guarantees.

I have an application which runs across multiple ranks, eventually reaching
an MPI_Gather() call. Along the way, if one of the ranks encounters an
error, it will report the error to a log, call MPI_Finalize(), and
exit with a non-zero return code. If this happens prior to the other ranks
making it to the gather, it seems like mpirun notices this and the process
ends on all ranks. This is what I want to happen - it's a legitimate
error, so all processes should be freed up so the next job can run. It
seems like if the other ranks make it into the MPI_Gather() before the one
rank reports an error, the other ranks wait in the MPI_Gather() forever.

Is there something simple I can do to guarantee that if any process calls
MPI_Finalize(), all my ranks terminate?

Thanks.
-Adam
Aurelien Bouteiller
2017-11-16 18:27:02 UTC
Adam, your MPI program is incorrect. You need to replace the MPI_Finalize()
call on the process that found the error with MPI_Abort().
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users
Adam Sylvester
2017-11-17 16:04:35 UTC
Thanks - that's exactly what I needed! Works as advertised. :o)