Discussion:
[OMPI users] clarity on Comm_connect
Marlborough, Rick
2016-10-12 20:47:05 UTC
Designation: Non-Export Controlled Content
Folks;
I am trying to call MPI_Lookup_name. The call is wrapped in a try/catch block, but the calling process still aborts if the publishing process has not yet published the name. Is there a way to configure or code this so that MPI throws a trappable exception instead?

Thanx
Rick
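
One possible way to get a trappable exception, sketched under assumptions this thread does not confirm: if the error handler on MPI_COMM_WORLD is switched to MPI_ERRORS_RETURN, MPI_Lookup_name is expected to return an error code instead of aborting, and a thin wrapper can retry and then throw. Whether a particular Open MPI release really returns rather than aborts for an unpublished name is not verified here. In the sketch, LookupError, LookupFn, and lookup_with_retry are hypothetical names, and the callback stands in for the real MPI call so the retry logic can be read on its own.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical exception type; it is not part of MPI.
struct LookupError : std::runtime_error {
    int code;  // last error code returned by the lookup callback
    LookupError(const std::string& what, int c)
        : std::runtime_error(what), code(c) {}
};

// Stand-in for the real lookup. The assumption is that the callback wraps
//   MPI_Lookup_name(service.c_str(), MPI_INFO_NULL, port_buffer)
// and is only used after
//   MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN)
// so that a failed lookup yields a nonzero code rather than an abort.
// It must return 0 (MPI_SUCCESS) on success and fill `port`.
using LookupFn =
    std::function<int(const std::string& service, std::string& port)>;

// Calls `lookup` up to `max_attempts` times, sleeping `delay` between
// failed attempts. Returns the port name on success and throws
// LookupError carrying the last error code once the attempts run out.
std::string lookup_with_retry(const LookupFn& lookup,
                              const std::string& service,
                              int max_attempts,
                              std::chrono::milliseconds delay)
{
    int rc = -1;  // reported if max_attempts is not positive
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        std::string port;
        rc = lookup(service, port);
        if (rc == 0) {
            return port;
        }
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    throw LookupError("service '" + service + "' not found after " +
                          std::to_string(max_attempts) + " attempts",
                      rc);
}
```

A proxy could call lookup_with_retry inside its existing try/catch block, catch LookupError if the sensor never publishes, and otherwise pass the returned port name to MPI_Comm_connect. Retrying is a design choice to cover the window where a proxy starts before its sensor has published the name.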

Marlborough, Rick
2016-10-12 21:44:28 UTC
...forgot to mention...

I have a group of processes called sensors and a group of processes called proxies. A central dispatch process launches all of the sensors, followed by all of the proxies. The sensors publish named ports and wait on MPI_Comm_accept. The proxies look up the named port and do an MPI_Comm_connect.

If this all occurs on the same node as the dispatcher, then all proxies connect to their respective sensors and all is well. If I configure my slots to force proxies or sensors onto other nodes (I have 20), then the connections fail.

There is full connectivity between all of these nodes. We are testing various forms of middleware. Some use TCP, some use UDP, some use multicast. All work. Full ssh connectivity is set up between all of these nodes. Oddly enough, the sensors all perform a Comm_connect to the dispatcher. This always works! The sensors and proxies are all spawned in 2 batches using Comm_spawn_multiple. Error message below. Is there some configuration to enable this?

[Attached image of the error message; not preserved in the archive]

Marlborough, Rick
2016-10-12 22:01:07 UTC
Another follow-up. If I run all proxies on the same node as the dispatcher, then it works, even with all sensors spread across different nodes. If I force the proxies onto another node, they all fail. Here is some more error output.

[Attached image of additional error output; not preserved in the archive]
