Discussion:
[OMPI users] Unable to spawn MPI processes on multiple nodes with recent version of OpenMPI
Andrew Benson
2018-09-15 20:46:15 UTC
Permalink
I'm running into problems trying to spawn MPI processes across multiple nodes
on a cluster using recent versions of OpenMPI. Specifically, using the attached
Fortran code, compiled using OpenMPI 3.1.2 with:

mpif90 test.F90 -o test.exe

and run via a PBS scheduler using the attached test1.pbs, it fails as can be
seen in the attached testFAIL.err file.
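
The actual test1.pbs is attached to the original message and is not reproduced
here; purely as an illustration for readers of the archive, a job script of
the general shape described in this thread (node counts, process counts and
option spellings are placeholders, not the real script) would look roughly like:

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:10:00
cd "$PBS_O_WORKDIR"
# Launch a few parent processes spread one per node; the test code then calls
# MPI_Comm_spawn to create the child processes. The preconnect option is
# written here as the mpi_preconnect_mpi MCA parameter; the exact spelling
# used in the real script is not shown in the thread.
mpirun -np 2 --map-by node --mca mpi_preconnect_mpi 1 ./test.exe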

If I do the same but using OpenMPI v1.10.3 then it works successfully, giving
me the output in the attached testSUCCESS.err file.

From testing a few different versions of OpenMPI it seems that the behavior
changed between v1.10.7 and v2.0.4.

Is there some change in options needed to make this work with newer OpenMPIs?

Output from ompi_info --all is attached. config.log can be found here:

http://users.obs.carnegiescience.edu/abenson/config.log.bz2

Thanks for any help you can offer!

-Andrew
Ralph H Castain
2018-09-16 14:03:15 UTC
Permalink
I see you are using “preconnect_all” - that is the source of the trouble. I don’t believe we have tested that option in years and the code is almost certainly dead. I’d suggest removing that option and things should work.
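
For anyone following along in the archive, "removing the option" means
deleting the preconnect setting from wherever it is being picked up (command
line, environment, or an MCA parameter file). A quick way to find where it is
set, assuming the modern mpi_preconnect_mpi spelling (older releases also
accepted mpi_preconnect_all):

# Look for the parameter among the build's registered MCA parameters, in the
# environment, and in the per-user parameter file, then delete the setting found.
ompi_info --all | grep -i preconnect
env | grep -i OMPI_MCA
grep -i preconnect "$HOME/.openmpi/mca-params.conf" 2>/dev/null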
Andrew Benson
2018-09-16 19:33:28 UTC
Permalink
Thanks - I'll try removing that option.
--
* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html

* Galacticus: https://bitbucket.org/abensonca/galacticus
Andrew Benson
2018-09-17 00:01:07 UTC
Permalink
Removing the preconnect_all option didn't resolve the problem unfortunately.

I tried changing a few of the other options that I pass to mpirun. What does
seem to make a difference is the "--map-by node" option. If I remove that
option then my test code runs successfully; the output is in the attached
test.err file.
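
For reference, the difference comes down to the mapping option on the mpirun
line; the process count and executable name below are placeholders:

# Fails once MPI_Comm_spawn is reached, with the Open MPI 2.x/3.x builds
# tested in this thread:
mpirun -np 2 --map-by node ./test.exe

# Works: without --map-by node the default mapping packs the initial ranks
# onto the first node's slots instead of spreading them one per node.
mpirun -np 2 ./test.exe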

Ideally I'd like to be able to use "--map-by node" so that the initial
processes are distributed across the available resources. Is there some reason
why the child processes would be unable to communicate when "--map-by node" is
used?

-Andrew
--
* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html

* Galacticus: https://bitbucket.org/abensonca/galacticus
Andrew Benson
2018-09-19 14:59:44 UTC
Permalink
On further investigation, removing the "preconnect_all" option does at least
change the problem. Without "preconnect_all" I no longer see:

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

Process 1 ([[32179,2],15]) is on host: node092
Process 2 ([[32179,2],0]) is on host: unknown!
BTLs attempted: self tcp vader

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------


Instead it hangs for several minutes and finally aborts with:

--------------------------------------------------------------------------
A request has timed out and will therefore fail:

Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345

Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
[node091:19470] *** An error occurred in MPI_Comm_spawn
[node091:19470] *** reported by process [1614086145,0]
[node091:19470] *** on communicator MPI_COMM_WORLD
[node091:19470] *** MPI_ERR_UNKNOWN: unknown error
[node091:19470] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
now abort,
[node091:19470] *** and potentially your MPI job)

I've tried increasing both pmix_server_max_wait and pmix_base_exchange_timeout
as suggested in the error message, but the result is unchanged (it just takes
longer to time out).
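
For completeness, the timeouts were raised roughly as follows (assuming they
are passed as -mca parameters on the mpirun line, as the help text suggests;
the values are illustrative only):

# Raising these parameters only delayed the same LOOKUP timeout; the
# process count and executable name are placeholders.
mpirun -np 2 --map-by node \
       --mca pmix_server_max_wait 600 \
       --mca pmix_base_exchange_timeout 600 \
       ./test.exe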

Once again, if I remove "--map-by node" it runs successfully.

-Andrew
--
* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html

* Galacticus: https://bitbucket.org/abensonca/galacticus
Ralph H Castain
2018-10-06 16:02:47 UTC
Permalink
Sorry for delay - this should be fixed by https://github.com/open-mpi/ompi/pull/5854
Ralph H Castain
2018-10-06 17:02:49 UTC
Permalink
Just FYI: on master (and perhaps 4.0), child jobs do not inherit their parent's mapping policy by default. You have to add “-mca rmaps_base_inherit 1” to your mpirun command line.
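
Concretely, for a spawning job mapped by node that means something like the
following (process count and executable name are placeholders):

# Ask child jobs created via MPI_Comm_spawn to inherit the parent job's
# mapping policy (here --map-by node), which is no longer done by default.
mpirun -np 2 --map-by node -mca rmaps_base_inherit 1 ./test.exe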
Post by Andrew Benson
Thanks, I'll try this right away.
Andrew Benson
2018-10-06 17:04:38 UTC
Permalink
Ok, thanks - that's good to know.

-Andrew


--

* Andrew Benson: http://users.obs.carnegiescience.edu/abenson/contact.html

* Galacticus: http://sites.google.com/site/galacticusmodel