Alberto Ortiz
2017-02-27 16:23:59 UTC
Hi,
I am interested in using OpenMPI to manage the distribution on a MicroZed
cluster. These MicroZed boards come with a Zynq device, which has a
dual-core ARM Cortex-A9. One of the objectives of the project I am working
on is resilience, so I am truly interested in the fault tolerance provided
by OpenMPI.
What I want to know is whether there is any implementation of run-time
migration. For instance, if I have an eight-board MicroZed cluster running
an MPI job and I unplug the Ethernet cable of one board or reboot another,
is there any support in OpenMPI to detect these failures and migrate the
affected ranks to other processors at run time?
Thank you in advance,
Alberto.