Discussion:
[OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted
Pablo Lopez Rios
2011-04-23 02:31:23 UTC
Hi,

I'm having a bit of a problem with wrapping mpirun in a script. The
script needs to run an MPI job in the background and tail -f the output.
Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
However mpirun seems to detect the SIGINT that was meant for tail, and
kills the job immediately. I've tried workarounds involving nohup,
disown, trap, subshells (including calling the script from within
itself), etc, to no avail.

The problem is that this doesn't happen if I run the command directly
instead, without mpirun. Attached is a script that reproduces the
problem. It runs a simple counting script in the background which takes
10 seconds to run, and tails the output. If called with "nompi" as first
argument, it will simply run bash -c "$SCRIPT" >& "$out" &, and with
"mpi" it will do the same with 'mpirun -np 1' prepended. The output I
get is:


$ ./ompi_bug.sh mpi
mpi:
1
2
3
4
^C
$ ./ompi_bug.sh nompi
nompi:
1
2
3
4
^C
$ cat output.*
mpi:
1
2
3
4
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited
on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished

nompi:
1
2
3
4
5
6
7
8
9
10
Done


This convinces me that there is something strange with OpenMPI, since I
expect no difference in signal handling when running a simple command
with or without mpirun in the middle.

I've tried looking for options to change this behaviour, but I don't
seem to find any. Is there one, preferably in the form of an environment
variable? Or is this a bug?

I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also
v1.2.8 as distributed with OpenSUSE 11.3.

Thanks,
Pablo
Reuti
2011-04-23 12:20:34 UTC
Post by Pablo Lopez Rios
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue. However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
What about:

( trap "" sigint; exec mpiexec ...) &

i.e. set SIGINT to be ignored in a subshell, then replace that subshell with mpiexec via exec, so that mpiexec starts with SIGINT ignored. Well, maybe mpiexec installs its own handler again. This can be checked in the SigIgn and SigCgt lines of /proc/<pid>/status.

-- Reuti
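
A minimal sketch of that check, assuming Linux: the SigIgn and SigCgt lines of /proc/<pid>/status are hexadecimal bit masks in which signal N corresponds to bit N-1, so SIGINT (signal 2) is mask 0x2. Here sleep stands in for mpiexec, and the helper name sigint_state is made up for illustration.

```shell
#!/usr/bin/env bash
# Report how a process treats SIGINT, based on /proc/<pid>/status (Linux only).
# Prints "ignored", "caught" or "default".
sigint_state() {
    local pid=$1 ign cgt
    ign=$(awk '/^SigIgn:/ { print $2 }' "/proc/$pid/status")
    cgt=$(awk '/^SigCgt:/ { print $2 }' "/proc/$pid/status")
    # SIGINT is signal 2, i.e. mask 0x2, which lives in the last hex digit.
    if (( 16#${ign: -1} & 2 )); then
        echo ignored
    elif (( 16#${cgt: -1} & 2 )); then
        echo caught
    else
        echo default
    fi
}

# Stand-in for "exec mpiexec ...": sleep never installs a SIGINT handler.
( trap "" INT; exec sleep 2 ) &
pid=$!
sleep 0.2    # give the subshell time to exec
echo "SIGINT disposition of $pid: $(sigint_state "$pid")"
kill "$pid"
wait "$pid" 2>/dev/null
```

For the sleep stand-in this reports "ignored". If a program re-installs its own handler after starting, the same check would be expected to report "caught" instead.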
<ompi_bug.sh.gz>_______________________________________________
users mailing list
http://www.open-mpi.org/mailman/listinfo.cgi/users
Ralph Castain
2011-04-23 14:12:02 UTC
Post by Pablo Lopez Rios
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit Ctrl-C, every process executing in the script receives SIGINT. mpirun traps it and immediately terminates all running MPI processes.
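
Some background on why the whole script is affected: the terminal delivers the Ctrl-C SIGINT to every process in the foreground process group, and a script run without job control keeps its background children in its own group. A small sketch that makes this visible, assuming Linux (it reads /proc/<pid>/stat); sleep stands in for mpirun and the helper name pgid_of is hypothetical.

```shell
#!/usr/bin/env bash
# Print the process group ID of a PID, read from /proc/<pid>/stat (Linux only).
pgid_of() {
    local stat fields
    stat=$(< "/proc/$1/stat")
    # Drop "pid (comm) " so that a command name containing spaces cannot
    # shift the remaining fields: state, ppid, pgrp, session, ...
    read -r -a fields <<< "${stat##*) }"
    echo "${fields[2]}"
}

sleep 2 &                       # stand-in for "mpirun command >& out &"
child=$!
echo "script pgid: $(pgid_of $$)"
echo "child  pgid: $(pgid_of "$child")"
kill "$child"
wait "$child" 2>/dev/null
```

When run as a script, both lines should show the same process group ID, which is why a Ctrl-C aimed at tail also reaches whatever the script started in the background.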
Pablo Lopez Rios
2011-04-23 15:11:32 UTC
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching SIGINT.

Thanks,
Pablo
Ralph Castain
2011-04-23 15:27:31 UTC
Post by Pablo Lopez Rios
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't: mpirun automatically traps SIGINT and terminates the job in order to ensure proper cleanup during abnormal terminations.

I'm not sure what you are actually trying to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.

But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it.
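
To illustrate the SIGUSR1 route mentioned above, here is a minimal sketch in which a shell function stands in for an application process and reports its progress whenever it receives SIGUSR1. The function name worker and the log file are assumptions for illustration; in a real MPI code a handler installed by the application itself would play this role, and the signal would be sent to mpirun's PID so that mpirun forwards it.

```shell
#!/usr/bin/env bash
# Stand-in for an application process: reports progress on SIGUSR1.
log=$(mktemp)

worker() {
    local step=0
    trap 'echo "progress: step $step" >> "$log"' USR1
    while (( step < 20 )); do
        sleep 0.1               # short sleeps so the trap runs promptly
        step=$(( step + 1 ))
    done
}

worker &
wpid=$!
sleep 0.5                       # let the worker install its trap
kill -USR1 "$wpid"              # with mpirun, this PID would be mpirun's
wait "$wpid"
cat "$log"
```

The job keeps running after the signal; only the progress line is written.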
Pablo Lopez Rios
2011-04-23 16:40:55 UTC
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
I simply want a script that runs the equivalent of:

mpirun command >& out &
tail -f out

such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun, so it's not unreasonable to expect to be able to do the same with mpirun. I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.

Let me rephrase my question then. With the following script:

mpirun command >& out &
tail -f out

SIGINT stops tail AND mpirun. That's OK. The following:

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

has the same effect, indicating that mpirun overrides previous traps in the same subshell. That's OK too. However the following:

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out

also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?

Thanks,
Pablo
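
One piece of general background that bears on this question: a handler installed with "trap : SIGINT" belongs to the shell that set it and is reset when a child program is exec'ed; only an ignored disposition (trap "" SIGINT) is inherited. Either way, a program remains free to install its own handler at startup, which is what mpirun is said to do in this thread. The sketch below assumes Linux (it reads /proc/self/status) and uses awk as a stand-in child; all helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Linux only: each helper prints the SigIgn mask seen by a freshly exec'ed awk.

# Child started after `trap ACTION INT` in the enclosing subshell.
sigign_after_trap() {
    ( trap "$1" INT
      exec awk '/^SigIgn:/ { print $2 }' /proc/self/status )
}

# Child started with no trap at all, for comparison.
sigign_baseline() {
    ( exec awk '/^SigIgn:/ { print $2 }' /proc/self/status )
}

# True if the SIGINT bit (signal 2, mask 0x2) is set in hex mask $1.
sigint_ignored() { (( 16#${1: -1} & 2 )); }

echo "no trap:     $(sigign_baseline)"
echo "trap : INT:  $(sigign_after_trap ':')"
echo "trap '' INT: $(sigign_after_trap '')"
```

The first two masks should be identical, since the ":" handler does not survive the exec, while the third has the SIGINT bit set. Nesting more subshells around the trap does not change what the exec'ed program sees.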
Ralph Castain
2011-04-23 17:33:25 UTC
Post by Pablo Lopez Rios
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.

At least when I test it, even non-mpirun processes will abort.
Post by Pablo Lopez Rios
it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
Post by Pablo Lopez Rios
I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.
mpirun command >& out &
tail -f out

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out
also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail without mpirun seeing it.

But you are welcome to not believe me and continue thrashing... :-/
Reuti
2011-04-23 17:39:33 UTC
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
What about setsid and pushing it in a new session instead of using & in the script?

-- Reuti
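
A small sketch of what setsid changes, assuming Linux and the util-linux setsid command: the job ends up in a session of its own, so it is no longer part of the terminal's foreground process group. sleep stands in for mpirun, and the helper name session_of and the PID file are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the session ID of a PID, read from /proc/<pid>/stat (Linux only).
session_of() {
    local stat fields
    stat=$(< "/proc/$1/stat")
    read -r -a fields <<< "${stat##*) }"   # state, ppid, pgrp, session, ...
    echo "${fields[3]}"
}

pidfile=$(mktemp)
# The inner shell records its own PID before exec'ing the stand-in job, so
# the PID is right even if setsid had to fork to create the session.
setsid bash -c 'echo $$ > "$1"; exec sleep 2' _ "$pidfile" &
# Wait (at most 5 s) for the PID to be recorded.
for i in $(seq 50); do [ -s "$pidfile" ] && break; sleep 0.1; done
job=$(cat "$pidfile")

echo "script session: $(session_of $$)"
echo "job session:    $(session_of "$job")"
kill "$job"
rm -f "$pidfile"
```

The two session IDs should differ, and the job's session ID should equal its own PID because it is the leader of the new session.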
Pablo Lopez Rios
2011-04-23 17:55:22 UTC
Post by Reuti
What about setsid and pushing it in a new
session instead of using & in the script?
:-) That works. Thanks!

NB, the working script looks like:

setsid bash -c "mpirun command >& out" &
tail -f out

Thanks,
Pablo
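
A slightly fuller sketch of the same pattern, under stated assumptions: a short counting loop stands in for "mpirun command", the job's PID is recorded in a temporary file, and GNU tail's --pid option is used so that tail also exits by itself once the job has finished. The variable names and temporary files are illustrative only.

```shell
#!/usr/bin/env bash
out=$(mktemp)
pidfile=$(mktemp)

# Stand-in for "mpirun command": counts to 3, one number every 0.3 s.
job='for i in 1 2 3; do echo "$i"; sleep 0.3; done; echo Done'

# New session: a Ctrl+C typed at the terminal no longer reaches the job.
setsid bash -c 'echo $$ > "$1"; eval "$2"' _ "$pidfile" "$job" >& "$out" &

# Wait (at most 5 s) for the job to record its PID.
for try in $(seq 50); do [ -s "$pidfile" ] && break; sleep 0.1; done
jobpid=$(cat "$pidfile")

# GNU tail: follow the file from its first line, exit once the job is gone.
tail -n +1 -f --pid="$jobpid" "$out"
```

Pressing Ctrl+C during the tail stops only tail; without Ctrl+C, tail returns shortly after the job ends.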
Ralph Castain
2011-04-23 17:58:05 UTC
Permalink
Post by Pablo Lopez Rios
Post by Reuti
What about setsid and pushing it in a new
session instead of using & in the script?
:-) That works. Thanks!
setsid bash -c "mpirun command >& out" &
tail -f out
Yes - but now you can't kill mpirun when something goes wrong....<shrug>
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Reuti
Post by Ralph Castain
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
What about setsid and pushing it in a new session instead of using & in the script?
-- Reuti
Post by Ralph Castain
At least when I test it, even non-mpirun processes will abort.
it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.
mpirun command >& out &
tail -f out

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out
also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail without mpirun seeing it.
But you are welcome to not believe me and continue thrashing... :-/
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't - mpirun automatically traps sigint and terminates the job in order to ensure proper cleanup during abnormal terminations.
I'm not sure what you are actually trying to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it.
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Hi,
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing in the script receives it. Mpirun traps the ctrl-c and immediately terminates all running MPI procs.
Post by Pablo Lopez Rios
However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
( trap "" sigint; exec mpiexec ...)&
i.e. replace the subshell with changed interrupt handling with the mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can be checked in /proc/<pid>/status
-- Reuti
$ ./ompi_bug.sh mpi
1
2
3
4
^C
$ ./ompi_bug.sh nompi
1
2
3
4
^C
$ cat output.*
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
1
2
3
4
5
6
7
8
9
10
Done
This convinces me that there is something strange with OpenMPI, since I expect no difference in signal handling when running a simple command with or without mpirun in the middle.
I've tried looking for options to change this behaviour, but I don't seem to find any. Is there one, preferably in the form of an environment variable? Or is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as distributed with OpenSUSE 11.3.
Thanks,
Pablo
<ompi_bug.sh.gz>
Pablo Lopez Rios
2011-04-23 18:03:01 UTC
Permalink
Not directly, as in fg + Ctrl+C, but 'killall mpirun' or 'killall
command' works as usual. I wanted a script with the same effect as
running 'runscript & tail -f out' from the command line, and this is
exactly it.
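For anyone landing on this thread later, the wrapper could look roughly like the sketch below. It is only a sketch: launch_detached is a made-up helper name, out is the log file name used in the commands quoted in this thread, and it assumes the setsid utility (util-linux) is installed. Since the job runs in its own session, Ctrl+C at the terminal should reach tail only.

```shell
#!/bin/sh
# launch_detached OUTFILE COMMAND [ARGS...]   (hypothetical helper name)
# Runs COMMAND in a new session, so that Ctrl+C typed at the terminal is
# not delivered to it, and sends its stdout and stderr to OUTFILE.
launch_detached() {
    _ld_out=$1
    shift
    setsid "$@" >"$_ld_out" 2>&1 &
}

# Intended use (commented out so that sourcing this file starts nothing):
#   launch_detached out mpirun command
#   tail -f out
```

The job then has to be stopped explicitly, for example with killall as described above.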
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Reuti
What about setsid and pushing it in a new
session instead of using & in the script?
:-) That works. Thanks!
setsid bash -c "mpirun command >& out" &
tail -f out
Yes - but now you can't kill mpirun when something goes wrong....<shrug>
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Reuti
Post by Ralph Castain
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
What about setsid and pushing it in a new session instead of using & in the script?
-- Reuti
Post by Ralph Castain
At least when I test it, even non-mpirun processes will abort.
it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.
mpirun command >& out &
tail -f out

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out
also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail without mpirun seeing it.
But you are welcome to not believe me and continue thrashing... :-/
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't - mpirun automatically traps sigint and terminates the job in order to ensure proper cleanup during abnormal terminations.
I'm not sure what you are actually trying to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it.
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Hi,
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing in the script receives it. Mpirun traps the ctrl-c and immediately terminates all running MPI procs.
Post by Pablo Lopez Rios
However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
( trap "" sigint; exec mpiexec ...)&
i.e. replace the subshell with changed interrupt handling with the mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can be checked in /proc/<pid>/status
-- Reuti
$ ./ompi_bug.sh mpi
1
2
3
4
^C
$ ./ompi_bug.sh nompi
1
2
3
4
^C
$ cat output.*
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
1
2
3
4
5
6
7
8
9
10
Done
This convinces me that there is something strange with OpenMPI, since I expect no difference in signal handling when running a simple command with or without mpirun in the middle.
I've tried looking for options to change this behaviour, but I don't seem to find any. Is there one, preferably in the form of an environment variable? Or is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as distributed with OpenSUSE 11.3.
Thanks,
Pablo
<ompi_bug.sh.gz>
Reuti
2011-04-23 18:07:22 UTC
Permalink
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Reuti
What about setsid and pushing it in a new
session instead of using & in the script?
:-) That works. Thanks!
setsid bash -c "mpirun command >& out" &
tail -f out
Yes - but now you can't kill mpirun when something goes wrong....<shrug>
You can still send a sigint from the command line to the mpirun process or its process group besides killall.
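A minimal sketch of what that could look like, assuming a ps that accepts "-o pgid=" (procps on Linux does). signal_job_group is a hypothetical helper name, not something shipped with Open MPI, and the guard against signalling the caller's own group is an extra precaution added for this example.

```shell
#!/bin/sh
# signal_job_group PID [SIGNAL]   (hypothetical helper name)
# Sends SIGNAL (default INT) to the whole process group that PID belongs to.
signal_job_group() {
    local pid="$1" sig="${2:-INT}" pgid own
    pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')
    [ -n "$pgid" ] || return 1          # no such process
    own=$(ps -o pgid= -p "$$" | tr -d ' ')
    if [ "$pgid" = "$own" ]; then
        # Precaution for this sketch: never signal the caller's own group.
        echo "refusing: PID $pid shares this shell's process group" >&2
        return 1
    fi
    # A negative number after "--" addresses a process group.
    kill -s "$sig" -- "-$pgid"
}

# Intended use, with the PID of the detached mpirun taken from ps or pgrep:
#   signal_job_group 12345        # same effect on the job as Ctrl+C
```

Signalling the group rather than a single PID also reaches any helper processes that share mpirun's process group.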

-- Reuti
Post by Ralph Castain
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Reuti
Post by Ralph Castain
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
What about setsid and pushing it in a new session instead of using & in the script?
-- Reuti
Post by Ralph Castain
At least when I test it, even non-mpirun processes will abort.
it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.
mpirun command >& out &
tail -f out

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out
also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail without mpirun seeing it.
But you are welcome to not believe me and continue thrashing... :-/
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't - mpirun automatically traps sigint and terminates the job in order to ensure proper cleanup during abnormal terminations.
I'm not sure what you are actually trying to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it.
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Hi,
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing in the script receives it. Mpirun traps the ctrl-c and immediately terminates all running MPI procs.
Post by Pablo Lopez Rios
However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
( trap "" sigint; exec mpiexec ...)&
i.e. replace the subshell with changed interrupt handling with the mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can be checked in /proc/<pid>/status
-- Reuti
$ ./ompi_bug.sh mpi
1
2
3
4
^C
$ ./ompi_bug.sh nompi
1
2
3
4
^C
$ cat output.*
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
1
2
3
4
5
6
7
8
9
10
Done
This convinces me that there is something strange with OpenMPI, since I expect no difference in signal handling when running a simple command with or without mpirun in the middle.
I've tried looking for options to change this behaviour, but I don't seem to find any. Is there one, preferably in the form of an environment variable? Or is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as distributed with OpenSUSE 11.3.
Thanks,
Pablo
<ompi_bug.sh.gz>
Ralph Castain
2011-04-23 18:14:21 UTC
Permalink
Post by Reuti
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Reuti
What about setsid and pushing it in a new
session instead of using & in the script?
:-) That works. Thanks!
setsid bash -c "mpirun command >& out" &
tail -f out
Yes - but now you can't kill mpirun when something goes wrong....<shrug>
You can still send a sigint from the command line to the mpirun process or its process group besides killall.
Yes - or I could just have run tail in a separate shell and avoided the entire email thread and problem... :-)

Whatever...so long as peace returns.
Post by Reuti
-- Reuti
Post by Ralph Castain
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Reuti
Post by Ralph Castain
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
What about setsid and pushing it in a new session instead of using & in the script?
-- Reuti
Post by Ralph Castain
At least when I test it, even non-mpirun processes will abort.
it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.
mpirun command >& out &
tail -f out

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out
also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail without mpirun seeing it.
But you are welcome to not believe me and continue thrashing... :-/
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't - mpirun automatically traps sigint and terminates the job in order to ensure proper cleanup during abnormal terminations.
I'm not sure what you are actually trying to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it.
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Hi,
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing in the script receives it. Mpirun traps the ctrl-c and immediately terminates all running MPI procs.
Post by Pablo Lopez Rios
However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
( trap "" sigint; exec mpiexec ...)&
i.e. replace the subshell with changed interrupt handling with the mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can be checked in /proc/<pid>/status
-- Reuti
$ ./ompi_bug.sh mpi
1
2
3
4
^C
$ ./ompi_bug.sh nompi
1
2
3
4
^C
$ cat output.*
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
1
2
3
4
5
6
7
8
9
10
Done
This convinces me that there is something strange with OpenMPI, since I expect no difference in signal handling when running a simple command with or without mpirun in the middle.
I've tried looking for options to change this behaviour, but I don't seem to find any. Is there one, preferably in the form of an environment variable? Or is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as distributed with OpenSUSE 11.3.
Thanks,
Pablo
<ompi_bug.sh.gz>
Pablo Lopez Rios
2011-04-23 18:00:04 UTC
Permalink
Probably you are right in that if the executable in question actively
requests to trap SIGINT there is nothing you can do (short of running in
a new session as suggested by Reuti). But try the script in my first
email, or look at the output I printed; it works for other commands.
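A likely explanation for the difference, as far as I understand it: a shell running without job control (as in a script) starts background commands with SIGINT and SIGQUIT set to "ignore", and that setting survives exec unless the program installs its own handler, which mpirun does according to the replies above. The sketch below shows the first half on Linux by reading the SigIgn mask of a background child from /proc. background_sigign is a made-up helper name and the one-second pause is just a crude way to let the child start.

```shell
#!/bin/sh
# background_sigign   (hypothetical helper name)
# Starts a background child the same way a script would and prints the
# child's SigIgn mask (hexadecimal) as reported by /proc/<pid>/status.
background_sigign() {
    sleep 5 &
    local child=$!
    sleep 1                                   # let the child start up
    awk '/^SigIgn:/ { print $2 }' "/proc/$child/status"
    kill "$child" 2>/dev/null
    wait "$child" 2>/dev/null || true         # reap it, ignore its status
}

# Intended use inside a script (not at an interactive prompt, where job
# control is active and background jobs are handled differently):
#   background_sigign
```

A child that never touches its signal handlers keeps ignoring SIGINT, which would fit the "nompi" run surviving Ctrl+C.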
Post by Ralph Castain
Post by Ralph Castain
I'm not sure what you are actually trying to accomplish
mpirun command >& out &
tail -f out
such that hitting Ctrl+C stops tail but leaves mpirun running. I can certainly do this without mpirun,
I don't think that's true. If both commands are in a script, then at least for me, a ctrl-c of the -script- will cause ctrl-c to be sent to -both- processes.
At least when I test it, even non-mpirun processes will abort.
it's not unreasonable to expect to be able to do the same with mpirun.
I'm afraid it won't work, per my earlier comments.
I need mpirun to either ignore the SIGINT or not receive it at all -- and as per your comments, ignoring it is not an option.
mpirun command >& out &
tail -f out

(
  trap : SIGINT
  mpirun command >& out &
)
tail -f out

(
  trap : SIGINT
  (
    mpirun command >& out &
  )
)
tail -f out
also has the same effect. How is mpirun overriding the trap in the *parent* subshell so that it ends up getting the SIGINT that was supposedly blocked at that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail in a -separate- command. Now you can ctrl-c tail without mpirun seeing it.
But you are welcome to not believe me and continue thrashing... :-/
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Post by Ralph Castain
Post by Pablo Lopez Rios
Pressing Ctrl+C should stop tail -f, and the MPI job
should continue.
I don't think that is true at all. When you hit ctrl-C,
every process executing in the script receives it. Mpirun
traps the ctrl-c and immediately terminates all running
MPI procs.
By "Ctrl+C should stop tail -f" I mean that this is the
desired behaviour of the script, not that this is what ought
to happen in general. My question is how to achieve this
behaviour, since I'm having trouble working around mpirun
catching sigint.
Like I said in my other response, you can't - mpirun automatically traps sigint and terminates the job in order to ensure proper cleanup during abnormal terminations.
I'm not sure what you are actually trying to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use.
But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it.
Post by Pablo Lopez Rios
Thanks,
Pablo
Post by Ralph Castain
Post by Pablo Lopez Rios
Hi,
I'm having a bit of a problem with wrapping mpirun in a script. The script needs to run an MPI job in the background and tail -f the output. Pressing Ctrl+C should stop tail -f, and the MPI job should continue.
I don't think that is true at all. When you hit ctrl-C, every process executing in the script receives it. Mpirun traps the ctrl-c and immediately terminates all running MPI procs.
Post by Pablo Lopez Rios
However mpirun seems to detect the SIGINT that was meant for tail, and kills the job immediately. I've tried workarounds involving nohup, disown, trap, subshells (including calling the script from within itself), etc, to no avail.
( trap "" sigint; exec mpiexec ...)&
i.e. replace the subshell with changed interrupt handling with the mpiexec. Well, maybe mpiexec is adjusting it on its own again. This can be checked in /proc/<pid>/status
-- Reuti
$ ./ompi_bug.sh mpi
1
2
3
4
^C
$ ./ompi_bug.sh nompi
1
2
3
4
^C
$ cat output.*
1
2
3
4
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1222 on node pablomme exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
1
2
3
4
5
6
7
8
9
10
Done
This convinces me that there is something strange with OpenMPI, since I expect no difference in signal handling when running a simple command with or without mpirun in the middle.
I've tried looking for options to change this behaviour, but I don't seem to find any. Is there one, preferably in the form of an environment variable? Or is this a bug?
I'm using OpenMPI v1.4.3 as distributed with Ubuntu 11.04, and also v1.2.8 as distributed with OpenSUSE 11.3.
Thanks,
Pablo
<ompi_bug.sh.gz>
Pablo Lopez Rios
2011-04-23 15:07:02 UTC
Permalink
( trap "" sigint; exec mpiexec ...)&
Yup, that's included in the workarounds I tried. Tried again with your specific suggestion; no luck.
Well, maybe mpiexec is adjusting it on its own
again. This can be checked in /proc/<pid>/status
The signal masks in /proc/$!/status are:

nompi (bash):
SigBlk: 0000000000010000 -> 16 blocked
SigIgn: 0000000000000006 -> 1,2 ignored
SigCgt: 0000000000010000 -> 16 caught

mpi (mpirun):
SigBlk: 0000000000000000 -> none blocked
SigIgn: 0000000000000004 -> 2 ignored
SigCgt: 0000000180015ee2 -> 1,5,6,7,9,10,11,12,14,16,31,32 caught

I think I'm off by one in interpreting the above masks (for instance I would expect signals 30 and 31 to be caught, not 31 and 32), but I'm already assuming that the least significant bit is "signal 0"; assuming it is "signal 1" would just worsen the values.
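For reference, my understanding is that Linux numbers these masks so that the least significant bit stands for signal 1 (there is no signal 0). Under that assumption, a small sketch for decoding them follows. decode_sigmask is a made-up name, and the signal names in the comments are the usual Linux numbers.

```shell
#!/bin/sh
# decode_sigmask HEXMASK   (hypothetical helper name)
# Prints the signal numbers whose bits are set in a SigBlk/SigIgn/SigCgt
# mask, assuming bit 0 (the least significant bit) stands for signal 1.
decode_sigmask() (
    mask=$((0x$1))
    sig=1
    out=
    while [ "$sig" -le 64 ]; do
        if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
            out="${out:+$out }$sig"
        fi
        sig=$((sig + 1))
    done
    printf '%s\n' "$out"
)

# Masks quoted above:
#   decode_sigmask 0000000000000006   # prints: 2 3   (SIGINT, SIGQUIT)
#   decode_sigmask 0000000000000004   # prints: 3     (SIGQUIT)
#   decode_sigmask 0000000180015ee2   # prints: 2 6 7 8 10 11 12 13 15 17 32 33
```

Read this way, the background bash ignores SIGINT and SIGQUIT, while mpirun ignores only SIGQUIT and has SIGINT (2) among its caught signals.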

Anyway, why does mpirun bypass the traps I try to set and how do I stop it doing so?

Thanks,
Pablo
Ralph Castain
2011-04-23 15:15:16 UTC
Permalink
Post by Pablo Lopez Rios
Anyway, why does mpirun bypass the traps I try to set and how do I stop it doing so?
You can't - this is a design requirement for clean termination of MPI jobs when the user interrupts execution.
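
That handler can only act if SIGINT actually reaches mpirun. Ctrl+C sends SIGINT to the terminal's foreground process group, and a background job started from a non-interactive script stays in that group. One possible workaround, offered here as an assumption and not confirmed in this thread or tested against Open MPI, is to start the job in its own session with setsid(1) from util-linux. The helper name run_detached and the mpirun arguments in the comment are placeholders.

```shell
# run_detached is a hypothetical helper: start a command in its own session so
# that a Ctrl+C typed at the terminal (SIGINT to the foreground process group)
# is not delivered to it. Requires setsid(1) from util-linux.
run_detached() {
    # $1 is the output file; the remaining arguments are the command to run.
    out=$1
    shift
    # stdin comes from /dev/null so the detached job never reads the terminal.
    setsid "$@" > "$out" 2>&1 < /dev/null &
}

# Possible use in a wrapper script (placeholder arguments):
#   run_detached output.mpi mpirun -np 1 bash -c "$SCRIPT"
#   tail -f output.mpi
```

With this approach the job no longer receives the terminal's SIGINT at all, so it would have to be stopped explicitly, for example with kill.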
Reuti
2011-04-23 16:17:16 UTC
Permalink
Post by Pablo Lopez Rios
I think I'm off by one in interpreting the above masks
I think so.
Post by Pablo Lopez Rios
(for instance I would expect signals 30 and 31 to be caught, not 31 and 32), but I'm already assuming that the least significant bit is "signal 0"; assuming it is "signal 1" would just worsen the values.
Anyway, why does mpirun bypass the traps I try to set and how do I stop it doing so?
I get:

$ cat /proc/31619/status
...
SigCgt: 000000004b813efb
...
$ trap '' int
$ cat /proc/31619/status
...
SigCgt: 000000004b813ef9
...
$ trap '' hup
$ cat /proc/31619/status
...
SigCgt: 000000004b813ef8

Looks like SIGINT(2) is bit 1 and likewise SIGHUP(1) is bit 0.
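
The mapping above (bit 0 is signal 1, bit 1 is signal 2, and so on) can be sketched as a small decoder for the hexadecimal masks. This is a minimal POSIX sh sketch; the function name decode_sigmask is hypothetical.

```shell
# Decode a hexadecimal signal mask from /proc/<pid>/status into signal numbers.
# Bit 0 (least significant) stands for signal 1, bit 1 for signal 2, and so on.
# decode_sigmask is a hypothetical helper name. The body runs in a subshell
# (parentheses instead of braces) so its variables do not leak to the caller.
decode_sigmask() (
    mask=$(( 0x$1 ))
    sig=1
    out=""
    while [ "$sig" -le 64 ]; do
        if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
            out="$out$sig "
        fi
        sig=$(( sig + 1 ))
    done
    # Strip the trailing space before printing.
    echo "${out% }"
)
```

Applied to the values reported earlier in the thread, `decode_sigmask 0000000000000006` prints `2 3` (SIGINT and SIGQUIT on Linux), `decode_sigmask 0000000000000004` prints `3`, and `decode_sigmask 0000000180015ee2` prints `2 6 7 8 10 11 12 13 15 17 32 33`. Read this way, SIGINT is ignored in the nompi case but caught in the mpi case.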

-- Reuti