Oh my - I finally tracked it down. A simple one character error.
Thanks for your patience. Fix is https://github.com/open-mpi/ompi/pull/3773 and will be ported to 2.x and 3.0
Ralph
> On Jun 27, 2017, at 11:17 AM, ***@open-mpi.org wrote:
>
> Ideally, we should be delivering the signal to all procs in the process group of each dum.sh. Looking at the code in the head of the 2.x branch, that does indeed appear to be what we are doing, assuming that we found setpgid in your system:
>
> static int odls_default_kill_local(pid_t pid, int signum)
> {
>     pid_t pgrp;
>
> #if HAVE_SETPGID
>     pgrp = getpgid(pid);
>     if (-1 != pgrp) {
>         /* target the lead process of the process
>          * group so we ensure that the signal is
>          * seen by all members of that group. This
>          * ensures that the signal is seen by any
>          * child processes our child may have
>          * started
>          */
>         pid = pgrp;
>     }
> #endif
>     if (0 != kill(pid, signum)) {
>         if (ESRCH != errno) {
>             OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
>                                  "%s odls:default:SENT KILL %d TO PID %d GOT ERRNO %d",
>                                  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid, errno));
>             return errno;
>         }
>     }
>     OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
>                          "%s odls:default:SENT KILL %d TO PID %d SUCCESS",
>                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid));
>     return 0;
> }
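>
> (As an aside, and not anything Open MPI itself runs: a quick shell-level
> illustration of what "delivered to the whole process group" means, with
> $PID standing in for the pid of one dum.sh:
>
>     PGID=$(ps -o pgid= -p "$PID" | tr -d ' ')
>     kill -TERM -- -"$PGID"    # a negated pgid signals every member of the group
>
> Anything the script spawned should see the signal too, unless it has moved
> itself into a new process group.)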
>
> For some strange reason, it appears that you aren't seeing this? I'm building the branch now and will see if I can reproduce it.
>
>> On Jun 27, 2017, at 10:58 AM, Ted Sussman <***@adina.com> wrote:
>>
>> Hello all,
>>
>> Thank you for your help and advice. It has taken me several days to understand what you were trying to tell me. I have now studied the problem in more detail, using a version of Open MPI 2.1.1 built with --enable-debug.
>>
>> -----
>>
>> Consider the following scenario in Open MPI 2.1.1:
>>
>> mpirun --> dum.sh --> aborttest.exe (rank 0)
>> --> dum.sh --> aborttest.exe (rank 1)
>>
>> aborttest.exe calls MPI_Bcast several times, then aborttest.exe rank 0 calls MPI_Abort.
>>
>> As far as I can figure out, this is what happens after aborttest.exe rank 0 calls MPI_Abort.
>>
>> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from MPI_Bcast).
>>
>> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both dum.sh processes.
>>
>> 3) Both dum.sh processes are killed.
>>
>> 4) aborttest.exe for rank 1 continues to poll. mpirun never exits.
>>
>> ----
>>
>> Now suppose that dum.sh traps SIGCONT, and that the trap handler in dum.sh sends signal SIGINT to $PPID. This is what seems to happen after aborttest.exe rank 0 calls MPI_Abort:
>>
>> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from MPI_Bcast).
>>
>> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both dum.sh processes.
>>
>> 3) dum.sh for rank 0 catches SIGCONT and sends SIGINT to its parent. dum.sh for rank 1 appears to be killed (I don't understand this; why doesn't dum.sh for rank 1 also catch SIGCONT?)
>>
>> 4) mpirun catches the SIGINT and kills aborttest.exe for rank 1, then mpirun exits.
>>
>> So adding the trap handler to dum.sh solves my problem.
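>>
>> In case it is useful to anyone else, here is a minimal sketch of the dum.sh
>> used above (the executable name is a placeholder, and my real script does
>> more cleanup than this):
>>
>>     #!/bin/sh
>>     # When Open MPI sends SIGCONT ahead of its kill sequence, bounce a
>>     # SIGINT back to our parent so the whole job gets torn down.
>>     trap 'kill -INT "$PPID"' CONT
>>     ./aborttest.exe "$@"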
>>
>> Is this the preferred solution to my problem? Or is there a more elegant solution?
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>>
>>
>>
>>
>>
>>
>>
>> On 19 Jun 2017 at 11:19, ***@open-mpi.org wrote:
>>
>> >
>> >
>> >
>> > On Jun 19, 2017, at 10:53 AM, Ted Sussman <***@adina.com> wrote:
>> >
>> > For what it's worth, the problem might be related to the following:
>> >
>> > mpirun: -np 2 ... dum.sh
>> > dum.sh: Invoke aborttest11.exe
>> > aborttest11.exe: Call MPI_Init, go into an infinite loop.
>> >
>> > Now when mpirun is running, send signals at the processes, as follows:
>> >
>> > 1) kill -9 (pid for one of the aborttest11.exe processes)
>> >
>> > The shell for this aborttest11.exe continues. Once this shell exits, then Open MPI sends
>> > signals to both shells, killing the other shell, but the remaining aborttest11.exe survives. The
>> > PPID for the remaining aborttest11.exe becomes 1.
>> >
>> > We have no visibility into your aborttest processes since we didn't launch them. So killing one of
>> > them is invisible to us. We can only see the shell scripts.
>> >
>> >
>> > 2) kill -9 (pid for one of the dum.sh processes).
>> >
>> > Open MPI sends signals to both of the shells. Both shells are killed off, but both
>> > aborttest11.exe processes survive, with PPID set to 1.
>> >
>> > This again is a question of how you handle things in your program. The _only_ process we can
>> > see is your script. If you kill a script that started a process, then your process is going to have to
>> > know how to detect the script has died and "suicide" - there is nothing we can do to help.
>> >
>> > Honestly, it sounds to me like the real problem here is that your .exe program isn't monitoring the
>> > shell above it to know when to "suicide". I don't see how we can help you there.
>> >
>> >
>> >
>> > On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>> >
>> > >
>> > > That is typical behavior when you throw something into "sleep" - not much we can do
>> > > about it, I think.
>> > >
>> > > On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>> > >
>> > > I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
>> > > calling MPI_ABORT, so that I can check the pids using ps.
>> > >
>> > > This is what happens (see run2.sh.out).
>> > >
>> > > Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
>> > >
>> > > Pid Process
>> > > -------------------
>> > > 19565 dum.sh
>> > > 19566 dum.sh
>> > > 19567 aborttest10.exe
>> > > 19568 aborttest10.exe
>> > >
>> > > When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
>> > > instances of dum.sh (pids 19565 and 19566).
>> > >
>> > > ps shows that both shell processes vanish, and that one of the aborttest10.exe processes
>> > > vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
>> > >
>> > > Hope that this information is useful.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > >
>> > >
>> > > On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:
>> > >
>> > >
>> > > Ted,
>> > >
>> > > some traces are missing because you did not configure with --enable-debug
>> > > i am afraid you have to do it (and you probably want to install that debug version in another
>> > > location, since its performance is not good for production) in order to get all the logs.
>> > >
>> > > Cheers,
>> > >
>> > > Gilles
>> > >
>> > > ----- Original Message -----
>> > > Hello Gilles,
>> > >
>> > > I retried my example, with the same results as I observed before. The process with rank 1
>> > > does not get killed by MPI_ABORT.
>> > >
>> > > I have attached to this E-mail:
>> > >
>> > > config.log.bz2
>> > > ompi_info.bz2 (uses ompi_info -a)
>> > > aborttest09.tgz
>> > >
>> > > This testing is done on a computer running Linux 3.10.0. This is a different computer than
>> > > the computer that I previously used for testing. You can confirm that I am using Open MPI
>> > > 2.1.1.
>> > >
>> > > tar xvzf aborttest09.tgz
>> > > cd aborttest09
>> > > ./sh run2.sh
>> > >
>> > > run2.sh contains the command
>> > >
>> > > /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh
>> > >
>> > > The output from this run is in aborttest09/run2.sh.out.
>> > >
>> > > The output shows that the "default" component is selected by odls.
>> > >
>> > > The only messages from odls are: odls: launch spawning child ... (two messages). There
>> > > are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL messages.
>> > >
>> > > I am not running from within any batch manager.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > > On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>> > >
>> > > Ted,
>> > >
>> > > i do not observe the same behavior you describe with Open MPI 2.1.1
>> > >
>> > > # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>> > >
>> > > abort.sh 31361 launching abort
>> > > abort.sh 31362 launching abort
>> > > I am rank 0 with pid 31363
>> > > I am rank 1 with pid 31364
>> > > ------------------------------------------------------------------------
>> > > --
>> > > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> > > with errorcode 1.
>> > >
>> > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> > > You may or may not see output from other processes, depending on
>> > > exactly when Open MPI kills them.
>> > > ------------------------------------------------------------------------
>> > > --
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361 SUCCESS
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362 SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361 SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362 SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361 SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362 SUCCESS
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is not alive
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is not alive
>> > >
>> > >
>> > > Open MPI did kill both shells, and they were indeed killed as evidenced
>> > > by ps
>> > >
>> > > #ps -fu gilles --forest
>> > > UID PID PPID C STIME TTY TIME CMD
>> > > gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
>> > > gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
>> > > gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
>> > > gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
>> > >
>> > >
>> > > so trapping SIGTERM in your shell and manually killing the MPI task
>> > > should work
>> > > (as Jeff explained, as long as the shell script is fast enough to do
>> > > that between SIGTERM and SIGKILL)
>> > >
>> > >
>> > > if you observe a different behavior, please double check your Open MPI
>> > > version and post the outputs of the same commands.
>> > >
>> > > btw, are you running from a batch manager ? if yes, which one ?
>> > >
>> > > Cheers,
>> > >
>> > > Gilles
>> > >
>> > > ----- Original Message -----
>> > > Ted,
>> > >
>> > > if you
>> > >
>> > > mpirun --mca odls_base_verbose 10 ...
>> > >
>> > > you will see which processes get killed and how
>> > >
>> > > Best regards,
>> > >
>> > >
>> > > Gilles
>> > >
>> > > ----- Original Message -----
>> > > Hello Jeff,
>> > >
>> > > Thanks for your comments.
>> > >
>> > > I am not seeing behavior #4, on the two computers that I have tested on, using Open MPI 2.1.1.
>> > >
>> > > I wonder if you can duplicate my results with the files that I have
>> > > uploaded.
>> > >
>> > > Regarding what is the "correct" behavior, I am willing to modify my application to correspond
>> > > to Open MPI's behavior (whatever behavior the Open MPI developers decide is best) --
>> > > provided that Open MPI does in fact kill off both shells.
>> > >
>> > > So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both shells on
>> > > my computer.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>> > >
>> > > Ted --
>> > >
>> > > Sorry for jumping in late. Here's my $0.02...
>> > >
>> > > In the runtime, we can do 4 things:
>> > >
>> > > 1. Kill just the process that we forked.
>> > > 2. Kill just the process(es) that call back and identify themselves as MPI processes
>> > > (we don't track this right now, but we could add that functionality).
>> > > 3. Union of #1 and #2.
>> > > 4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
>> > >
>> > > In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get
>> > > fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes
>> > > sleeping), but we should be killing the process group, which -- unless any of the descendant
>> > > processes have explicitly left the process group -- should hit the entire process tree.
>> > >
>> > > Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring
>> > > that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's
>> > > basically a future enhancement / optimization.
>> > >
>> > > I think Gilles and Ralph proposed a good point to you: if you want to be sure to be able to do
>> > > cleanup after an MPI process terminates (normally or abnormally), you should trap signals in
>> > > your intermediate processes to catch what Open MPI's runtime throws and therefore know that
>> > > it is time to cleanup.
>> > >
>> > > Hypothetically, this should work in all versions of Open MPI...?
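>> > >
>> > > (For illustration only, and not something Open MPI ships: an intermediate
>> > > script can trap the SIGTERM the runtime sends and run its cleanup in the
>> > > window before the follow-up SIGKILL. The file name and executable below
>> > > are placeholders.)
>> > >
>> > >     #!/bin/sh
>> > >     cleanup() { rm -f /tmp/myjob.$$.scratch; }   # placeholder cleanup step
>> > >     trap 'cleanup; exit 143' TERM                # 143 = 128 + SIGTERM(15)
>> > >     ./my_mpi_prog.exe "$@" &                     # run the MPI binary in the background
>> > >     wait $!                                      # so the shell can run the trap promptly
>> > >     cleanup                                      # cleanup on a normal exit too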
>> > >
>> > > I think Ralph made a pull request that adds an MCA param to change the default behavior
>> > > from #4 to #1.
>> > >
>> > > Note, however, that there's a little time between when Open MPI sends the SIGTERM and the
>> > > SIGKILL, so this solution could be racy. If you find that you're running out of time to cleanup,
>> > > we might be able to make the delay between the SIGTERM and SIGKILL be configurable (e.g.,
>> > > via MCA param).
>> > >
>> > >
>> > >
>> > >
>> > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com> wrote:
>> > >
>> > > Hello Gilles and Ralph,
>> > >
>> > > Thank you for your advice so far. I appreciate the time that you have spent to educate me
>> > > about the details of Open MPI.
>> > >
>> > > But I think that there is something fundamental that I don't understand. Consider Example 2
>> > > run with Open MPI 2.1.1.
>> > >
>> > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
>> > >        --> shell for process 1 --> executable for process 1 --> MPI calls
>> > >
>> > > After the MPI_Abort is called, ps shows that both shells are running, and that the executable
>> > > for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until
>> > > process 1 is finished sleeping.
>> > >
>> > > I cannot reconcile this observed behavior with the statement
>> > >
>> > > > 2.x: each process is put into its own process group upon launch. When we issue a
>> > > > "kill", we issue it to the process group. Thus, every child proc of that child proc will
>> > > > receive it. IIRC, this was the intended behavior.
>> > >
>> > > I assume that, for my example, there are two process groups. The process group for process 0
>> > > contains the shell for process 0 and the executable for process 0; and the process group for
>> > > process 1 contains the shell for process 1 and the executable for process 1. So what does
>> > > MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for
>> > > process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both
>> > > the shell and executable for process 1 continue.
>> > >
>> > > If I hit Ctrl-C after MPI_Abort is called, I get the message
>> > >
>> > > mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
>> > >
>> > > but I don't need to hit Ctrl-C again because mpirun immediately exits.
>> > >
>> > > Can you shed some light on all of this?
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > >
>> > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
>> > >
>> > >
>> > > You have to understand that we have no way of knowing who is making MPI calls - all we see is
>> > > the proc that we started, and we know someone of that rank is running (but we have no way of
>> > > knowing which of the procs you sub-spawned it is).
>> > >
>> > > So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor
>> > > will you find it portable as there is no specification directing that behavior.
>> > >
>> > > The behavior I've provided is to either deliver the signal to _all_ child processes (including
>> > > grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe -
>> > > kill the MPI proc underneath the shell, but not the shell itself.
>> > >
>> > > What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to
>> > > pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
>> > >
>> > > Meantime, when I get a chance, I can code an option that will record the pid of the subproc that
>> > > calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that
>> > > will be done.
>> > >
>> > >
>> > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <***@adina.com> wrote:
>> > >
>> > > Hello Ralph,
>> > >
>> > > I am just an Open MPI end user, so I will need to
>> > > wait for
>> > > the next official release.
>> > >
>> > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
>> > >        --> shell for process 1 --> executable for process 1 --> MPI calls
>> > > ...
>> > >
>> > > I guess the question is, should MPI_ABORT kill the executables or the shells? I naively thought
>> > > that, since it is the executables that make the MPI calls, it is the executables that should be
>> > > aborted by the call to MPI_ABORT. Since the shells don't make MPI calls, the shells should not
>> > > be aborted.
>> > >
>> > > And users might have several layers of shells in between mpirun and the executable.
>> > >
>> > > So now I will look for the latest version of Open MPI that has the 1.4.3 behavior.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
>> > >
>> > > >
>> > > > Yeah, things jittered a little there as we debated the "right" behavior. Generally, when we
>> > > > see that happening it means that a param is required, but somehow we never reached that point.
>> > > >
>> > > > See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can schedule it for the next
>> > > > 2.x release if the RMs agree to take it
>> > > >
>> > > > Ralph
>> > > >
>> > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <***@adina.com> wrote:
>> > > >
>> > > > Thank you for your comments.
>> > > >
>> > > > Our application relies upon "dum.sh" to clean up after the process exits, either if the process
>> > > > exits normally, or if the process exits abnormally because of MPI_ABORT. If the process
>> > > > group is killed by MPI_ABORT, this clean up will not be performed. If exec is used to launch
>> > > > the executable from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot
>> > > > perform any clean up.
>> > > >
>> > > > I suppose that other user applications might work similarly, so it would be good to have an
>> > > > MCA parameter to control the behavior of MPI_ABORT.
>> > > >
>> > > > We could rewrite our shell script that invokes mpirun, so that the cleanup that is now done by
>> > > > dum.sh is done by the invoking shell script after mpirun exits. Perhaps this technique is the
>> > > > preferred way to clean up after mpirun is invoked.
>> > > >
>> > > > By the way, I have also tested with Open MPI 1.10.7, and Open MPI 1.10.7 has different
>> > > > behavior than either Open MPI 1.4.3 or Open MPI 2.1.1. In this explanation, it is important to
>> > > > know that the aborttest executable sleeps for 20 sec.
>> > > >
>> > > > When running example 2:
>> > > >
>> > > > 1.4.3: process 1 immediately aborts
>> > > > 1.10.7: process 1 doesn't abort and never stops.
>> > > > 2.1.1: process 1 doesn't abort, but stops after it is finished sleeping
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > Ted Sussman
>> > > >
>> > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
>> > > >
>> > > > Here is how the system is working:
>> > > >
>> > > > Master: each process is put into its own process group upon launch. When we issue a
>> > > > "kill", however, we only issue it to the individual process (instead of the process group
>> > > > that is headed by that child process). This is probably a bug as I don't believe that is
>> > > > what we intended, but set that aside for now.
>> > > >
>> > > > 2.x: each process is put into its own process group upon launch. When we issue a
>> > > > "kill", we issue it to the process group. Thus, every child proc of that child proc will
>> > > > receive it. IIRC, this was the intended behavior.
>> > > >
>> > > > It is rather trivial to make the change (it only involves 3 lines of code), but I'm not sure
>> > > > of what our intended behavior is supposed to be. Once we clarify that, it is also trivial
>> > > > to add another MCA param (you can never have too many!) to allow you to select the
>> > > > other behavior.
>> > > >
>> > > >
>> > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <***@adina.com> wrote:
>> > > >
>> > > > Hello Gilles,
>> > > >
>> > > > Thank you for your quick answer. I confirm that if exec is used, both processes immediately
>> > > > abort.
>> > > >
>> > > > Now suppose that the line
>> > > >
>> > > > echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
>> > > >
>> > > > is added to the end of dum.sh.
>> > > >
>> > > > If Example 2 is run with Open MPI 1.4.3, the output is
>> > > >
>> > > > After aborttest: OMPI_COMM_WORLD_RANK=0
>> > > >
>> > > > which shows that the shell script for the process with rank 0 continues after the abort,
>> > > > but that the shell script for the process with rank 1 does not continue after the abort.
>> > > >
>> > > > If Example 2 is run with Open MPI 2.1.1, with exec used to invoke aborttest02.exe, then
>> > > > there is no such output, which shows that both shell scripts do not continue after the abort.
>> > > >
>> > > > I prefer the Open MPI 1.4.3 behavior because our original application depends upon the
>> > > > Open MPI 1.4.3 behavior. (Our original application will also work if both executables are
>> > > > aborted, and if both shell scripts continue after the abort.)
>> > > >
>> > > > It might be too much to expect, but is there a way to recover the Open MPI 1.4.3 behavior
>> > > > using Open MPI 2.1.1?
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > Ted Sussman
>> > > >
>> > > >
>> > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
>> > > >
>> > > > Ted,
>> > > >
>> > > >
>> > > > fwiw, the 'master' branch has the behavior you expect.
>> > > >
>> > > >
>> > > > meanwhile, you can simply edit your 'dum.sh' script and replace
>> > > >
>> > > > /home/buildadina/src/aborttest02/aborttest02.exe
>> > > >
>> > > > with
>> > > >
>> > > > exec /home/buildadina/src/aborttest02/aborttest02.exe
>> > > >
>> > > >
>> > > > Cheers,
>> > > >
>> > > >
>> > > > Gilles
>> > > >
>> > > >
>> > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
>> > > > Hello,
>> > > >
>> > > > My question concerns MPI_ABORT, indirect execution of executables by mpirun and Open
>> > > > MPI 2.1.1. When mpirun runs executables directly, MPI_ABORT works as expected, but
>> > > > when mpirun runs executables indirectly, MPI_ABORT does not work as expected.
>> > > >
>> > > > If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT works as expected in all
>> > > > cases.
>> > > >
>> > > > The examples given below have been simplified as far as possible to show the issues.
>> > > >
>> > > > ---
>> > > >
>> > > > Example 1
>> > > >
>> > > > Consider an MPI job run in the following way:
>> > > >
>> > > > mpirun ... -app addmpw1
>> > > >
>> > > > where the appfile addmpw1 lists two executables:
>> > > >
>> > > > -n 1 -host gulftown ... aborttest02.exe
>> > > > -n 1 -host gulftown ... aborttest02.exe
>> > > >
>> > > > The two executables are executed on the local node gulftown. aborttest02 calls MPI_ABORT
>> > > > for rank 0, then sleeps.
>> > > >
>> > > > The above MPI job runs as expected. Both processes immediately abort when rank 0 calls
>> > > > MPI_ABORT.
>> > > >
>> > > > ---
>> > > >
>> > > > Example 2
>> > > >
>> > > > Now change the above example as follows:
>> > > >
>> > > > mpirun ... -app addmpw2
>> > > >
>> > > > where the appfile addmpw2 lists shell scripts:
>> > > >
>> > > > -n 1 -host gulftown ... dum.sh
>> > > > -n 1 -host gulftown ... dum.sh
>> > > >
>> > > > dum.sh invokes aborttest02.exe. So aborttest02.exe is executed indirectly by mpirun.
>> > > >
>> > > > In this case, the MPI job only aborts process 0 when rank 0 calls MPI_ABORT. Process 1
>> > > > continues to run. This behavior is unexpected.
>> > > >
>> > > > ----
>> > > >
>> > > > I have attached all files to this E-mail. Since there are absolute pathnames in the files, to
>> > > > reproduce my findings, you will need to update the pathnames in the appfiles and shell
>> > > > scripts. To run example 1,
>> > > >
>> > > > sh run1.sh
>> > > >
>> > > > and to run example 2,
>> > > >
>> > > > sh run2.sh
>> > > >
>> > > > ---
>> > > >
>> > > > I have tested these examples with Open MPI 1.4.3 and 2.0.3. In Open MPI 1.4.3, both
>> > > > examples work as expected. Open MPI 2.0.3 has the same behavior as Open MPI 2.1.1.
>> > > >
>> > > > ---
>> > > >
>> > > > I would prefer that Open MPI 2.1.1 aborts both processes, even when the executables are
>> > > > invoked indirectly by mpirun. If there is an MCA setting that is needed to make Open MPI
>> > > > 2.1.1 abort both processes, please let me know.
>> > > >
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > Theodore Sussman
>> > > >
>> > > >
>> > > > ---- File information -----------
>> > > > File: config.log.bz2
>> > > > Date: 14 Jun 2017, 13:35
>> > > > Size: 146548 bytes.
>> > > > Type: Binary
>> > > >
>> > > > ---- File information -----------
>> > > > File: ompi_info.bz2
>> > > > Date: 14 Jun 2017, 13:35
>> > > > Size: 24088 bytes.
>> > > > Type: Binary
>> > > >
>> > > > ---- File information -----------
>> > > > File: aborttest02.tgz
>> > > > Date: 14 Jun 2017, 13:52
>> > > > Size: 4285 bytes.
>> > > > Type: Binary
>> > > >
>> > > >
>> > > >
>> > > --
>> > > Jeff Squyres
>> > > ***@cisco.com
>> > >
>> > > ---- File information -----------
>> > > File: aborttest10.tgz
>> > > Date: 19 Jun 2017, 12:42
>> > > Size: 4740 bytes.
>> > > Type: Binary
>> > >
>> >
>> >
>> > ---- File information -----------
>> > File: aborttest11.tgz
>> > Date: 19 Jun 2017, 13:48
>> > Size: 3800 bytes.
>> > Type: Unknown
>> >
>>
>>
>
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users