Discussion:
[OMPI users] problems with client server scenario using MPI_Comm_connect
Marlborough, Rick
2016-10-03 21:39:50 UTC
Folks;
I have been trying to get a test case up and running using a client-server scenario, with a server waiting on MPI_Comm_accept and the client trying to connect via MPI_Comm_connect. The port value is written to a file; the client opens the file and reads the port value. I run the server, followed by the client. They both appear to sit there for a time, but eventually they both time out and abort. They are running on separate machines, and all other communication between these two machines appears to be OK. Is there some intermediate service that needs to be run? I am using Open MPI v2.0.1 on Red Hat Linux 6.5 (64-bit) running on a 1-gigabit network.

Thanks
Rick
Gilles Gouaillardet
2016-10-04 11:13:29 UTC
Rick,

I do not think ompi_server is required here.
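(It would only matter if you relied on MPI_Publish_name / MPI_Lookup_name across two separate mpirun instances; in that case the usual pattern, sketched here with hypothetical paths and program names, is:

    ompi-server --report-uri /tmp/ompi-server.uri
    mpirun --ompi-server file:/tmp/ompi-server.uri -n 1 server
    mpirun --ompi-server file:/tmp/ompi-server.uri -n 1 client

Since you exchange the port through a file yourself, none of that should be needed.)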
Can you please post a trimmed version of your client and server, and your two mpirun command lines?
You also need to make sure all ranks pass the same root parameter when invoking MPI_Comm_accept and MPI_Comm_connect.
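For example, a minimal sketch of the matched calls (assuming port_name has already been exchanged; the root argument, 0 here, must be the same on every rank of each side, and port_name only has to be valid at the root):

    MPI_Comm intercomm;
    if (is_server)   // hypothetical flag distinguishing the two programs
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
    else
        MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);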

Cheers,

Gilles
Marlborough, Rick
2016-10-04 12:28:45 UTC
Gilles;
Here is the client side code. The start command is "mpirun -n 1 client 10", where 10 is used to size a buffer.

#include <mpi.h>
#include <cstdlib>
#include <fstream>
#include <iostream>

int main(int argc, char** argv)
{
    int numtasks, rank, num_procs, bufsize = 0;
    MPI_Init(&argc, &argv);
    if (argc > 1)
    {
        bufsize = atoi(argv[1]);  // buffer size from the command line
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME + 1];

    // Read the port name the server wrote to a shared file.
    std::ifstream file("./portfile");
    file.getline(port_name, MPI_MAX_PORT_NAME);
    file.close();
    // Lookup_name does not work.
    // MPI_Lookup_name("test_service", MPI_INFO_NULL, port_name);
    std::cout << "Established port name is " << port_name << std::endl;

    // Connect to the server; every rank passes root 0.
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    MPI_Comm_remote_size(server, &num_procs);
    std::cout << "Number of running processes is " << num_procs << std::endl;
    MPI_Finalize();
    return 0;
}


Here is the server code. This is started on a different machine. The command line is "mpirun -n 1 sendrec 10", where 10 is used to size a buffer.

#include <mpi.h>
#include <cstdlib>
#include <fstream>
#include <iostream>

int main(int argc, char** argv)
{
    int numtasks, rank, mpi_error, bufsize = 0;
    MPI_Init(&argc, &argv);
    if (argc > 1)
    {
        bufsize = atoi(argv[1]);  // buffer size from the command line
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Comm remote_clients;
    MPI_Info pub_global;

    std::cout << "This process rank is " << rank << std::endl;
    std::cout << "Number of current processes is " << numtasks << std::endl;

    // Open a port and publish it under a globally scoped name.
    char port_name[MPI_MAX_PORT_NAME];
    mpi_error = MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Info_create(&pub_global);
    MPI_Info_set(pub_global, "ompi_global_scope", "true");
    mpi_error = MPI_Publish_name("test_service", pub_global, port_name);
    if (mpi_error)
    {
        ...  // error handling elided in the original post
    }
    std::cout << "Established port name is " << port_name << std::endl;

    // Also write the port name to a file for the client to read.
    std::ofstream file("./portfile", std::ofstream::trunc);
    file << port_name;
    file.close();

    // Block until a client connects; every rank passes root 0.
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &remote_clients);
}
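(The code stops at the accept; for reference, a sketch of the teardown a server like this would typically perform once it is done with the connection. None of this is in the original code:

    // Sketch only: typical cleanup after the intercommunicator is established.
    MPI_Unpublish_name("test_service", pub_global, port_name);
    MPI_Info_free(&pub_global);
    MPI_Close_port(port_name);
    MPI_Comm_disconnect(&remote_clients);
    MPI_Finalize();
)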



The server error looks like this:

[server error screenshot, attached as an inline image]

The client error looks like this:

[client error screenshot, attached as an inline image]


Thanks
Rick
Gilles Gouaillardet
2016-10-04 12:38:46 UTC
Rick,

How long does it take before the test fails?
There was a bug that caused a failure if no connection was received after
2 (3?) seconds, but I think it was fixed in v2.0.1.
That being said, you might want to try a nightly snapshot of the v2.0.x
branch.
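To rule out a mismatch, it is also worth confirming that both machines pick up the same build, e.g. with:

    mpirun --version
    ompi_info | grep "Open MPI:"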

Cheers,

Gilles
Marlborough, Rick
2016-10-04 12:46:51 UTC
Gilles;
The abort occurs somewhere between 30 and 60 seconds. Is there some configuration setting that could influence this?

Rick

Gilles Gouaillardet
2016-10-04 12:59:37 UTC
Rick,

v2.0.x uses a 60-second hard-coded timeout (vs 600 seconds in master)
in ompi/dpm/dpm.c; see OPAL_PMIX_EXCHANGE. That would explain the roughly
60-second abort you are seeing.

I will check your test and will likely have the value bumped to 600 seconds.

Cheers,

Gilles