Discussion:
[OMPI users] MPI_CANCEL for nonblocking collective communication
Markus
2017-06-09 11:33:15 UTC
Dear MPI Users and Maintainers,

I am using Open MPI version 1.10.4 with multithread support enabled and the
Java bindings. I use MPI from Java, with one process per machine and
multiple threads per process.

I was trying to build a broadcast listener thread which calls MPI_iBcast,
followed by MPI_WAIT.

I use the request object returned by MPI_iBcast to shut the listener down,
calling MPI_CANCEL on that request from the main thread. This results in

[fe-402-1:2972] *** An error occurred in MPI_Cancel
[fe-402-1:2972] *** reported by process [1275002881,17179869185]
[fe-402-1:2972] *** on communicator MPI_COMM_WORLD
[fe-402-1:2972] *** MPI_ERR_REQUEST: invalid request
[fe-402-1:2972] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
will now abort,
[fe-402-1:2972] *** and potentially your MPI job)


This indicates that the request is invalid in some fashion. I already
checked that it is not MPI_REQUEST_NULL. I have also set up a simple
testbed in which nothing else happens except that one broadcast. The request
object is always reported as invalid, no matter where I call cancel() from.

As far as I understand the MPI specification, cancel is also supposed to
work for nonblocking collective communication (which includes my
broadcasts). I haven't found any advice yet, so I hope to find some help on
this mailing list.

Kind regards,
Markus Jeromin

PS: Testbed for calling MPI_CANCEL, written in Java.
_______

package distributed.mpi;

import java.nio.ByteBuffer;

import mpi.MPI;
import mpi.MPIException;
import mpi.Request;

/**
 * Testing MPI_CANCEL on MPI_iBcast.<br>
 * The program does not terminate because the listeners are still running,
 * waiting for the native MPI_WAIT call to return. MPI_CANCEL is called, but
 * the listener never unblocks (i.e. MPI_WAIT never returns).
 *
 * @author mjeromin
 */
public class BroadcastTestCancel {

  static int myrank;

  /**
   * Listener that waits for an incoming broadcast from the specified root.
   * Uses the nonblocking iBcast followed by waitFor.
   */
  static class Listener extends Thread {

    ByteBuffer b = ByteBuffer.allocateDirect(100);
    public Request req = null;

    @Override
    public void run() {
      super.run();
      try {
        req = MPI.COMM_WORLD.iBcast(b, b.limit(), MPI.BYTE, 0);
        System.out.println(myrank + ": waiting for bcast (that will never come)");
        req.waitFor();
      } catch (MPIException e) {
        e.printStackTrace();
      }
      System.out.println(myrank + ": listener unblocked");
    }
  }

  public static void main(String[] args) throws MPIException, InterruptedException {

    // we need full thread support
    int threadSupport = MPI.InitThread(args, MPI.THREAD_MULTIPLE);
    if (threadSupport != MPI.THREAD_MULTIPLE) {
      System.out.println(myrank + ": no multithread support. Aborting.");
      MPI.Finalize();
      return;
    }

    // disabling or enabling exceptions makes no difference here
    MPI.COMM_WORLD.setErrhandler(MPI.ERRORS_RETURN);

    myrank = MPI.COMM_WORLD.getRank();

    // start receiving listeners, but no sender (which would be rank 0)
    if (myrank > 0) {
      Listener l = new Listener();
      l.start();

      // give the listener time to reach waitFor()
      Thread.sleep(5000);

      // call MPI_CANCEL (the matching broadcast from root will never arrive)
      try {
        l.req.cancel();
      } catch (MPIException e) {
        // whether this is thrown depends on the error handler
        System.out.println(myrank + ": MPI Exception \n" + e.toString());
      }
    }

    // don't call MPI_Finalize too early (waiting here is not strictly
    // necessary, just to be sure)
    Thread.sleep(15000);

    System.out.println(myrank + ": calling finish");
    MPI.Finalize();
    System.out.println(myrank + ": finished");
  }

}
Nathan Hjelm
2017-06-09 13:07:33 UTC
MPI 3.1 Section 5.12 is pretty clear on the matter:

"It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request associated with a nonblocking collective operation."

-Nathan
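
One way around this restriction is to let the collective complete instead of cancelling it: the root eventually joins the broadcast with a sentinel payload, the pending iBcast then finishes normally on every listener, and the listener thread exits after checking the payload. Below is a minimal sketch of that pattern, assuming the same Open MPI Java bindings as the testbed above; the SHUTDOWN value and the class name are illustrative assumptions, not part of the original code.

package distributed.mpi;

import java.nio.ByteBuffer;

import mpi.MPI;
import mpi.MPIException;
import mpi.Request;

/**
 * Sketch of a shutdown pattern without MPI_CANCEL: instead of cancelling the
 * pending iBcast, the root completes it by broadcasting a sentinel payload.
 */
public class BroadcastShutdownSketch {

  // assumed sentinel value marking "shut down" (illustrative)
  static final byte SHUTDOWN = -1;

  public static void main(String[] args) throws MPIException {
    MPI.InitThread(args, MPI.THREAD_MULTIPLE);
    int myrank = MPI.COMM_WORLD.getRank();

    ByteBuffer b = ByteBuffer.allocateDirect(100);
    if (myrank == 0) {
      // the root decides to shut the listeners down
      b.put(0, SHUTDOWN);
    }

    // every rank participates in the same nonblocking broadcast, so the
    // collective completes normally on all ranks and no cancel is needed
    Request req = MPI.COMM_WORLD.iBcast(b, b.limit(), MPI.BYTE, 0);
    req.waitFor();

    if (myrank > 0 && b.get(0) == SHUTDOWN) {
      System.out.println(myrank + ": received shutdown sentinel, listener exits");
    }

    MPI.Finalize();
  }
}

This keeps every rank's nonblocking collective matched and completed, which MPI requires before MPI_Finalize is called.
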
Post by Markus
As far as I understand the MPI specifications, cancel is also supposed to work for collective nonblocking communication (which includes my broadcasts).
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users