Pablo Lopez Rios
2011-04-23 02:31:23 UTC
Hi,
I'm having a bit of a problem with wrapping mpirun in a script. The
script needs to run an MPI job in the background and tail -f the output.
Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
However, mpirun seems to detect the SIGINT that was meant for tail, and
kills the job immediately. I've tried workarounds involving nohup,
disown, trap, subshells (including calling the script from within
itself), etc., to no avail.
What puzzles me is that this doesn't happen if I run the command directly,
without mpirun. Attached is a script that reproduces the
problem. It runs a simple counting script in the background, which takes
10 seconds to run, and tails the output. If called with "nompi" as the first
argument, it simply runs bash -c "$SCRIPT" >& "$out" &, and with
"mpi" it does the same with 'mpirun -np 1' prepended. The output I
get is:
$ ./ompi_bug.sh mpi
mpi:
1
2
3
4
^C
$ ./ompi_bug.sh nompi
nompi:
1
2
3
4
^C
$ cat output.*
mpi:
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited
on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
nompi:
1
2
3
4
5
6
7
8
9
10
Done
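The attachment isn't reproduced in this archive, so here is a minimal sketch of what such a script might look like, reconstructed from the description above. It is an assumption, not the original file: the function names (count_script, launch, main) are hypothetical, and the output.MODE file name and the "MODE:" header line are inferred from the transcript.

```shell
#!/bin/bash
# Hypothetical reconstruction of ompi_bug.sh; NOT the original attachment.
# Assumed: output goes to output.MODE and starts with a "MODE:" header line.

# Build the counting job: a header, then 1..10 at one-second intervals, then "Done".
count_script() {
  local mode=$1
  echo "echo $mode:; for i in \$(seq 1 10); do echo \$i; sleep 1; done; echo Done"
}

# Start the counting job in the background, with or without mpirun,
# sending both stdout and stderr to the given file.
launch() {
  local mode=$1 out=$2
  local SCRIPT
  SCRIPT=$(count_script "$mode")
  case "$mode" in
    nompi) bash -c "$SCRIPT" >& "$out" & ;;
    mpi)   mpirun -np 1 bash -c "$SCRIPT" >& "$out" & ;;
    *)     echo "usage: $0 mpi|nompi" >&2; return 1 ;;
  esac
}

main() {
  local mode=$1
  local out="output.$mode"
  launch "$mode" "$out" || return 1
  # Follow the output; Ctrl+C is meant to stop only this tail,
  # leaving the background job running.
  tail -f "$out"
}

# Run only when executed directly with an argument, so the functions
# can also be sourced on their own.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  main "$@"
fi
```

Usage is ./ompi_bug.sh mpi or ./ompi_bug.sh nompi. Keeping launch separate from the tail -f means the job can also be started without following its output.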
This convinces me that there is something strange with OpenMPI, since I
expect no difference in signal handling when running a simple command
with or without mpirun in the middle.
I've looked for options to change this behaviour, but I can't
find any. Is there one, preferably in the form of an environment
variable? Or is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
v1.2.8 as distributed with OpenSUSE 11.3.
Thanks,
Pablo