Florian Lindner
2017-11-03 14:48:45 UTC
Hello,
I'm working on a sample program that uses ports to connect two MPI jobs, each launched with its own mpirun.
Firstly, I use MPI_Open_port to obtain a name and write that to a file:
if (options.participant == A) { // A publishes the port
    if (options.commType == single and rank == 0)
        openPublishPort(options);
    if (options.commType == many)
        openPublishPort(options);
}
MPI_Barrier(MPI_COMM_WORLD);
participant is a command-line argument that assigns the roles: A acts as the server and B as the client.
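#include <fstream>
#include <string>
#include <boost/filesystem.hpp>
#include <mpi.h>

// Opens an MPI port and writes its name to a file in
// options.publishDirectory, where participant B can read it.
void openPublishPort(Options options)
{
    using namespace boost::filesystem;
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char p[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, p);   // port stays open for the later MPI_Comm_accept
    std::string portName(p);

    create_directory(options.publishDirectory);
    std::string filename;
    if (options.commType == many)      // one address file per rank of A
        filename = "A-" + std::to_string(rank) + ".address";
    if (options.commType == single)    // one shared address file
        filename = "intercomm.address";
    auto path = options.publishDirectory / filename;
    DEBUG << "Writing address " << portName << " to " << path;
    std::ofstream ofs(path.string(), std::ofstream::out);
    ofs << portName;
}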
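The file-naming rule above could be pulled into a single helper, so that the writer (A) and the reader (B) cannot drift apart. This is only a sketch: `CommType` and `addressFilename` are hypothetical names standing in for the corresponding fields of the post's `Options`.

```cpp
#include <string>

// Hypothetical stand-in for the commType option used in the post.
enum class CommType { single, many };

// Mirrors the naming rule in openPublishPort: in "single" mode rank 0 of A
// writes one shared file; in "many" mode every rank of A writes its own file.
std::string addressFilename(CommType type, int rank)
{
    if (type == CommType::many)
        return "A-" + std::to_string(rank) + ".address";
    return "intercomm.address";
}
```

In "many" mode, rank i of B would presumably look up `addressFilename(CommType::many, i)`; that pairing is an assumption, since the post only shows the single-communicator path.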
As far as I can tell, this works fine. Next, I try to connect:
MPI_Comm icomm;
std::string portName;
if (options.participant == A) { // receives connections
    if (options.commType == single) {
        // The port name is only significant at the root (rank 0).
        if (rank == 0)
            portName = readPort(options);
        INFO << "Accepting connection on " << portName;
        MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
        INFO << "Received connection";
    }
}
if (options.participant == B) { // connects to the intercomms
    if (options.commType == single) {
        if (rank == 0)
            portName = readPort(options);
        INFO << "Trying to connect to " << portName;
        MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
        INFO << "Connected";
    }
}
commType == single means that I want a single intercommunicator that spans all ranks of both participants, A and B.
readPort reads the port name from the file written earlier.
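readPort itself is not shown. A possible sketch, assuming the file holds only the port name on a single line: `readPortFromFile` is a hypothetical name, and the polling loop is an addition that lets B be started before A has written the file.

```cpp
#include <chrono>
#include <fstream>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper, not the author's readPort: waits until the address
// file exists and holds a non-empty first line, then returns that line as
// the port name. Throws std::runtime_error if the timeout expires first.
std::string readPortFromFile(const std::string& path,
                             std::chrono::milliseconds timeout = std::chrono::seconds(30),
                             std::chrono::milliseconds pollInterval = std::chrono::milliseconds(100))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        std::ifstream ifs(path);
        std::string portName;
        // Read the whole first line; the port name must reach
        // MPI_Comm_connect / MPI_Comm_accept unchanged.
        if (ifs && std::getline(ifs, portName) && !portName.empty())
            return portName;
        if (std::chrono::steady_clock::now() >= deadline)
            throw std::runtime_error("timed out waiting for " + path);
        std::this_thread::sleep_for(pollInterval);
    }
}
```

Because openPublishPort writes the file in place, a reader could in principle see a partially written name; writing to a temporary file and renaming it afterwards would typically avoid that on POSIX file systems.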
Now, when I launch A first and then B in another terminal, nothing happens until a timeout occurs:
% mpirun -n 1 ./mpiports --commType="single" --participant="A"
[2017-11-03 15:29:55.469891] [debug] Writing address 3048013825.0:1069313090 to "./publish/intercomm.address"
[2017-11-03 15:29:55.470169] [debug] Read address 3048013825.0:1069313090 from "./publish/intercomm.address"
[2017-11-03 15:29:55.470185] [info] Accepting connection on 3048013825.0:1069313090
[asaru:16199] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
[...]
and on the other side:
% mpirun -n 1 ./mpiports --commType="single" --participant="B"
[2017-11-03 15:29:59.698921] [debug] Read address 3048013825.0:1069313090 from "./publish/intercomm.address"
[2017-11-03 15:29:59.698947] [info] Trying to connect to 3048013825.0:1069313090
[asaru:16238] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
[...]
The complete code, including the CMake build script, can be downloaded at:
https://www.dropbox.com/s/azo5ti4kjg12zjy/MPI_Ports.tar.gz?dl=0
Why is the connection not working?
Thanks a lot,
Florian