Discussion:
[OMPI users] MPI_ABORT, indirect execution of executables by mpirun, Open MPI 2.1.1
Ted Sussman
2017-06-14 18:01:35 UTC
Hello,

My question concerns MPI_ABORT, indirect execution of executables by mpirun, and Open
MPI 2.1.1. When mpirun runs executables directly, MPI_ABORT works as expected, but
when mpirun runs executables indirectly, MPI_ABORT does not work as expected.

If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT works as expected in all
cases.

The examples given below have been simplified as far as possible to show the issues.

---

Example 1

Consider an MPI job run in the following way:

mpirun ... -app addmpw1

where the appfile addmpw1 lists two executables:

-n 1 -host gulftown ... aborttest02.exe
-n 1 -host gulftown ... aborttest02.exe

The two executables are executed on the local node gulftown. aborttest02 calls MPI_ABORT
for rank 0, then sleeps.

The above MPI job runs as expected. Both processes immediately abort when rank 0 calls
MPI_ABORT.

---

Example 2

Now change the above example as follows:

mpirun ... -app addmpw2

where the appfile addmpw2 lists shell scripts:

-n 1 -host gulftown ... dum.sh
-n 1 -host gulftown ... dum.sh

dum.sh invokes aborttest02.exe. So aborttest02.exe is executed indirectly by mpirun.
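For reference, a minimal sketch of such a wrapper (the actual dum.sh is in the attached
files and uses an absolute pathname):

    #!/bin/sh
    # dum.sh -- wrapper script; the MPI executable runs as a child of this shell
    ./aborttest02.exe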

In this case, the MPI job only aborts process 0 when rank 0 calls MPI_ABORT. Process 1
continues to run. This behavior is unexpected.

----

I have attached all files to this E-mail. Since there are absolute pathnames in the files, to
reproduce my findings, you will need to update the pathnames in the appfiles and shell
scripts. To run example 1,

sh run1.sh

and to run example 2,

sh run2.sh

---

I have tested these examples with Open MPI 1.4.3 and 2.0.3. In Open MPI 1.4.3, both
examples work as expected. Open MPI 2.0.3 has the same behavior as Open MPI 2.1.1.

---

I would prefer that Open MPI 2.1.1 aborts both processes, even when the executables are
invoked indirectly by mpirun. If there is an MCA setting that is needed to make Open MPI
2.1.1 abort both processes, please let me know.


Sincerely,

Theodore Sussman
Gilles Gouaillardet
2017-06-15 00:50:22 UTC
Ted,


fwiw, the 'master' branch has the behavior you expect.


meanwhile, you can simply edit your 'dum.sh' script and replace

/home/buildadina/src/aborttest02/aborttest02.exe

with

exec /home/buildadina/src/aborttest02/aborttest02.exe
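
With exec, the wrapper shell is replaced by the MPI executable, so the process that mpirun
launched (and will signal on an abort) is the MPI process itself, rather than a shell with the
MPI process as a child. A sketch of the edited dum.sh, assuming it is a simple one-line
wrapper:

    #!/bin/sh
    # sketch: exec replaces this shell with the MPI executable, so the abort
    # signal from the runtime reaches the MPI process directly
    exec /home/buildadina/src/aborttest02/aborttest02.exe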


Cheers,


Gilles


Ted Sussman
2017-06-15 12:23:49 UTC
Hello Gilles,

Thank you for your quick answer. I confirm that if exec is used, both processes immediately
abort.

Now suppose that the line

echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK

is added to the end of dum.sh.

If Example 2 is run with Open MPI 1.4.3, the output is

After aborttest: OMPI_COMM_WORLD_RANK=0

which shows that the shell script for the process with rank 0 continues after the abort,
but that the shell script for the process with rank 1 does not continue after the abort.

If Example 2 is run with Open MPI 2.1.1, with exec used to invoke aborttest02.exe, then
there is no such output, which shows that neither shell script continues after the abort.

I prefer the Open MPI 1.4.3 behavior because our original application depends upon the
Open MPI 1.4.3 behavior. (Our original application will also work if both executables are
aborted, and if both shell scripts continue after the abort.)

It might be too much to expect, but is there a way to recover the Open MPI 1.4.3 behavior
using Open MPI 2.1.1?

Sincerely,

Ted Sussman


r***@open-mpi.org
2017-06-15 16:18:55 UTC
Here is how the system is working:

Master: each process is put into its own process group upon launch. When we issue a “kill”, however, we only issue it to the individual process (instead of the process group that is headed by that child process). This is probably a bug as I don’t believe that is what we intended, but set that aside for now.

2.x: each process is put into its own process group upon launch. When we issue a “kill”, we issue it to the process group. Thus, every child proc of that child proc will receive it. IIRC, this was the intended behavior.

It is rather trivial to make the change (it only involves 3 lines of code), but I’m not sure of what our intended behavior is supposed to be. Once we clarify that, it is also trivial to add another MCA param (you can never have too many!) to allow you to select the other behavior.
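
In shell terms (an illustration only, not the actual Open MPI code), the difference between the two is whether the signal goes to the single pid or to the process group it leads:

    # $PID is the child the daemon launched (here, the wrapper shell),
    # which was made its own process group leader at launch
    kill -TERM "$PID"        # master: only that one process gets the signal
    kill -TERM -- "-$PID"    # 2.x: the whole process group gets it,
                             #      including everything the shell spawned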


Ted Sussman
2017-06-15 19:20:52 UTC
Thank you for your comments.

Our application relies upon "dum.sh" to clean up after the process exits, whether the process
exits normally or exits abnormally because of MPI_ABORT. If the process group is killed by
MPI_ABORT, this cleanup is not performed. And if exec is used to launch the executable from
dum.sh, then dum.sh is replaced by the exec'd executable, so dum.sh cannot perform any
cleanup either.

I suppose that other user applications might work similarly, so it would be good to have an
MCA parameter to control the behavior of MPI_ABORT.

We could rewrite our shell script that invokes mpirun, so that the cleanup that is now done by
dum.sh is done by the invoking shell script after mpirun exits. Perhaps this technique is the
preferred way to clean up after mpirun is invoked.
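A sketch of what that invoking script might look like ("cleanup_for_rank" is a hypothetical
stand-in for whatever cleanup dum.sh performs today):

    #!/bin/sh
    # sketch only: run the MPI job, then do the per-rank cleanup here,
    # after mpirun has exited, instead of inside dum.sh
    sh run2.sh               # runs mpirun ... -app addmpw2 and waits for it
    status=$?
    cleanup_for_rank 0       # hypothetical stand-in for dum.sh's cleanup
    cleanup_for_rank 1
    exit $status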

By the way, I have also tested with Open MPI 1.10.7, and Open MPI 1.10.7 has different
behavior than either Open MPI 1.4.3 or Open MPI 2.1.1. In this explanation, it is important to
know that the aborttest executable sleeps for 20 sec.

When running example 2:

1.4.3: process 1 immediately aborts.
1.10.7: process 1 doesn't abort and never stops.
2.1.1: process 1 doesn't abort, but stops after it is finished sleeping.

Sincerely,

Ted Sussman

r***@open-mpi.org
2017-06-15 19:31:51 UTC
Yeah, things jittered a little there as we debated the “right” behavior. Generally, when we see that happening it means that a param is required, but somehow we never reached that point.

See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can schedule it for the next 2.x release if the RMs agree to take it.

Ralph

Ted Sussman
2017-06-15 20:37:38 UTC
Hello Ralph,

I am just an Open MPI end user, so I will need to wait for the next official release.

mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
       --> shell for process 1 --> executable for process 1 --> MPI calls
       ...

I guess the question is: should MPI_ABORT kill the executables or the shells? I naively
thought that, since it is the executables that make the MPI calls, it is the executables that
should be aborted by the call to MPI_ABORT. Since the shells don't make MPI calls, the
shells should not be aborted.

And users might have several layers of shells in between mpirun and the executable.

So now I will look for the latest version of Open MPI that has the 1.4.3 behavior.

Sincerely,

Ted Sussman

r***@open-mpi.org
2017-06-15 21:44:42 UTC
You have to understand that we have no way of knowing who is making MPI calls - all we see is the proc that we started, and we know someone of that rank is running (but we have no way of knowing which of the procs you sub-spawned it is).

So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor will you find it portable as there is no specification directing that behavior.

The behavior I’ve provided is to deliver the signal either to _all_ child processes (including grandchildren etc.), or _only_ to the immediate child of the daemon. It won’t do what you describe - kill the MPI proc underneath the shell, but not the shell itself.

What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to pid/procs for you. We don’t have that capability implemented just yet, I’m afraid.

Meantime, when I get a chance, I can code an option that will record the pid of the subproc that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will be done.


Gilles Gouaillardet
2017-06-16 00:55:35 UTC
Permalink
Ted,

note that the shell receives a SIGTERM, followed (if needed) by a SIGKILL, from Open MPI.
 
so if you cannot exec the MPI binary, you have the option to trap SIGTERM in your shell script and then manually propagate it (or a SIGKILL) to the MPI app.
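
For illustration only, a minimal sketch of a dum.sh written along those lines (the aborttest02.exe path is copied from the examples above, and the cleanup step is just a placeholder):

    #!/bin/sh
    # start the MPI binary in the background so the script itself can catch signals
    /home/buildadina/src/aborttest02/aborttest02.exe "$@" &
    child=$!
    # forward the SIGTERM that Open MPI sends to the shell on to the MPI app
    trap 'kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"
    status=$?
    # ... site-specific cleanup would go here ...
    exit $status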

Cheers,

Gilles

On Fri, Jun 16, 2017 at 6:44 AM, ***@open-mpi.org <***@open-mpi.org> wrote:
> You have to understand that we have no way of knowing who is making MPI
> calls - all we see is the proc that we started, and we know someone of that
> rank is running (but we have no way of knowing which of the procs you
> sub-spawned it is).
>
> So the behavior you are seeking only occurred in some earlier release by
> sheer accident. Nor will you find it portable as there is no specification
> directing that behavior.
>
> The behavior I’ve provided is to either deliver the signal to _all_ child
> processes (including grandchildren etc.), or _only_ the immediate child of
> the daemon. It won’t do what you describe - kill the MPI proc underneath the
> shell, but not the shell itself.
>
> What you can eventually do is use PMIx to ask the runtime to selectively
> deliver signals to pid/procs for you. We don’t have that capability
> implemented just yet, I’m afraid.
>
> Meantime, when I get a chance, I can code an option that will record the pid
> of the subproc that calls MPI_Init, and then lets you deliver signals to
> just that proc. No promises as to when that will be done.
>
>
> On Jun 15, 2017, at 1:37 PM, Ted Sussman <***@adina.com> wrote:
>
> Hello Ralph,
>
> I am just an Open MPI end user, so I will need to wait for the next official
> release.
>
> mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
> --> shell for process 1 --> executable for process 1 --> MPI calls
> ...
>
> I guess the question is, should MPI_ABORT kill the executables or the
> shells? I naively thought that, since it is the executables that make the
> MPI calls, it is the executables that should be aborted by the call to
> MPI_ABORT. Since the shells don't make MPI calls, the shells should not be
> aborted.
>
> And users might have several layers of shells in between mpirun and the
> executable.
>
> So now I will look for the latest version of Open MPI that has the 1.4.3
> behavior.
>
> Sincerely,
>
> Ted Sussman
>
Ted Sussman
2017-06-16 14:08:39 UTC
Permalink
Hello Gilles and Ralph,

Thank you for your advice so far. I appreciate the time that you have spent to educate me
about the details of Open MPI.

But I think that there is something fundamental that I don't understand. Consider Example 2
run with Open MPI 2.1.1.

mpirun --> shell for process 0 -->  executable for process 0 --> MPI calls, MPI_Abort
   --> shell for process 1 -->  executable for process 1 --> MPI calls

After MPI_Abort is called, ps shows that both shells are running, and that the executable
for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until
process 1 is finished sleeping.
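
For reference, one way to inspect the process-group layout while the job is still sleeping is something along these lines, run from another terminal (the grep pattern is only an assumption based on the file names used in the examples):

    # show pid, process group, parent pid and command line for the relevant processes
    ps -eo pid,pgid,ppid,args | egrep 'mpirun|dum.sh|aborttest02' | grep -v grep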

I cannot reconcile this observed behavior with the statement

> >     2.x: each process is put into its own process group upon launch. When we issue a
> >     "kill", we issue it to the process group. Thus, every child proc of that child proc will
> >     receive it. IIRC, this was the intended behavior.

I assume that, for my example, there are two process groups. The process group for
process 0 contains the shell for process 0 and the executable for process 0; and the process
group for process 1 contains the shell for process 1 and the executable for process 1. So
what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0,
since the shell for process 0 continues. And MPI_ABORT does not kill the process group for
process 1, since both the shell and executable for process 1 continue.

If I hit Ctrl-C after MPI_Abort is called, I get the message

mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate

but I don't need to hit Ctrl-C again because mpirun immediately exits.

Can you shed some light on all of this?

Sincerely,

Ted Sussman


Jeff Squyres (jsquyres)
2017-06-16 16:35:03 UTC
Permalink
Ted --

Sorry for jumping in late. Here's my $0.02...

In the runtime, we can do 4 things:

1. Kill just the process that we forked.
2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
3. Union of #1 and #2.
4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).

In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.
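
To illustrate that caveat with a hypothetical example (not taken from the attached scripts): a line such as the following inside dum.sh would put the child into its own session and process group, so a kill aimed at dum.sh's process group would never reach it.

    # hypothetical: run the binary detached in a new session/process group
    setsid /home/buildadina/src/aborttest02/aborttest02.exe &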

Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.

I think Gilles and Ralph made a good point: if you want to be able to clean up reliably after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws, and therefore know that it is time to clean up.

Hypothetically, this should work in all versions of Open MPI...?

I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.

Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy. If you find that you're running out of time to cleanup, we might be able to make the delay between the SIGTERM and SIGKILL be configurable (e.g., via MCA param).






--
Jeff Squyres
***@cisco.com
Ted Sussman
2017-06-16 17:33:50 UTC
Permalink
Hello Jeff,

Thanks for your comments.

I am not seeing behavior #4 on the two computers that I have tested on, using Open MPI
2.1.1.

I wonder if you can duplicate my results with the files that I have uploaded.

Regarding what the "correct" behavior should be: I am willing to modify my application to correspond
to Open MPI's behavior (whatever behavior the Open MPI developers decide is best) --
provided that Open MPI does in fact kill off both shells.

So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both shells on
my computer.

Sincerely,

Ted Sussman

On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:

> Ted --
>
> Sorry for jumping in late. Here's my $0.02...
>
> In the runtime, we can do 4 things:
>
> 1. Kill just the process that we forked.
> 2. Kill just the process(es) that call back and identify themselves as MPI processes (we don't track this right now, but we could add that functionality).
> 3. Union of #1 and #2.
> 4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
>
> In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes sleeping), but we should be killing the process group, which -- unless any of the descendant processes have explicitly left the process group -- should hit the entire process tree.
>
> Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement / optimization.
>
> I think Gilles and Ralph proposed a good point to you: if you want to be sure to be able to do cleanup after an MPI process terminates (normally or abnormally), you should trap signals in your intermediate processes to catch what Open MPI's runtime throws and therefore know that it is time to cleanup.
>
> Hypothetically, this should work in all versions of Open MPI...?
>
> I think Ralph made a pull request that adds an MCA param to change the default behavior from #4 to #1.
>
> Note, however, that there's a little time between when Open MPI sends the SIGTERM and the SIGKILL, so this solution could be racy. If you find that you're running out of time to cleanup, we might be able to make the delay between the SIGTERM and SIGKILL be configurable (e.g., via MCA param).
>
>
>
>
> > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com> wrote:
> >
> > Hello Gilles and Ralph,
> >
> > Thank you for your advice so far. I appreciate the time that you have spent to educate me about the details of Open MPI.
> >
> > But I think that there is something fundamental that I don't understand. Consider Example 2 run with Open MPI 2.1.1.
> >
> > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
> > --> shell for process 1 --> executable for process 1 --> MPI calls
> >
> > After the MPI_Abort is called, ps shows that both shells are running, and that the executable for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until process 1 is finished sleeping.
> >
> > I cannot reconcile this observed behavior with the statement
> >
> > > > 2.x: each process is put into its own process group upon launch. When we issue a
> > > > "kill", we issue it to the process group. Thus, every child proc of that child proc will
> > > > receive it. IIRC, this was the intended behavior.
> >
> > I assume that, for my example, there are two process groups. The process group for process 0 contains the shell for process 0 and the executable for process 0; and the process group for process 1 contains the shell for process 1 and the executable for process 1. So what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0, since the shell for process 0 continues. And MPI_ABORT does not kill the process group for process 1, since both the shell and executable for process 1 continue.
> >
> > If I hit Ctrl-C after MPI_Abort is called, I get the message
> >
> > mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
> >
> > but I don't need to hit Ctrl-C again because mpirun immediately exits.
> >
> > Can you shed some light on all of this?
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> >
> > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> >
> > >
> > > You have to understand that we have no way of knowing who is making MPI calls - all we see is
> > > the proc that we started, and we know someone of that rank is running (but we have no way of
> > > knowing which of the procs you sub-spawned it is).
> > >
> > > So the behavior you are seeking only occurred in some earlier release by sheer accident. Nor will
> > > you find it portable as there is no specification directing that behavior.
> > >
> > > The behavior I´ve provided is to either deliver the signal to _all_ child processes (including
> > > grandchildren etc.), or _only_ the immediate child of the daemon. It won´t do what you describe -
> > > kill the mPI proc underneath the shell, but not the shell itself.
> > >
> > > What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to
> > > pid/procs for you. We don´t have that capability implemented just yet, I´m afraid.
> > >
> > > Meantime, when I get a chance, I can code an option that will record the pid of the subproc that
> > > calls MPI_Init, and then let´s you deliver signals to just that proc. No promises as to when that will
> > > be done.
> > >
> > >
> > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <***@adina.com> wrote:
> > >
> > > Hello Ralph,
> > >
> > > I am just an Open MPI end user, so I will need to wait for the next official release.
> > >
> > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
> > > --> shell for process 1 --> executable for process 1 --> MPI calls
> > > ...
> > >
> > > I guess the question is, should MPI_ABORT kill the executables or the shells? I naively
> > > thought, that, since it is the executables that make the MPI calls, it is the executables that
> > > should be aborted by the call to MPI_ABORT. Since the shells don't make MPI calls, the
> > > shells should not be aborted.
> > >
> > > And users might have several layers of shells in between mpirun and the executable.
> > >
> > > So now I will look for the latest version of Open MPI that has the 1.4.3 behavior.
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > >
> > > >
> > > > Yeah, things jittered a little there as we debated the "right" behavior. Generally, when we
> > > see that
> > > > happening it means that a param is required, but somehow we never reached that point.
> > > >
> > > > See if https://github.com/open-mpi/ompi/pull/3704 helps - if so, I can schedule it for the next
> > > 2.x
> > > > release if the RMs agree to take it
> > > >
> > > > Ralph
> > > >
> > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <***@adina.com > wrote:
> > > >
> > > > Thank you for your comments.
> > > >
> > > > Our application relies upon "dum.sh" to clean up after the process exits, either if the
> > > process
> > > > exits normally, or if the process exits abnormally because of MPI_ABORT. If the process
> > > > group is killed by MPI_ABORT, this clean up will not be performed. If exec is used to launch
> > > > the executable from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot
> > > > perform any clean up.
> > > >
> > > > I suppose that other user applications might work similarly, so it would be good to have an
> > > > MCA parameter to control the behavior of MPI_ABORT.
> > > >
> > > > We could rewrite our shell script that invokes mpirun, so that the cleanup that is now done
> > > > by
> > > > dum.sh is done by the invoking shell script after mpirun exits. Perhaps this technique is the
> > > > preferred way to clean up after mpirun is invoked.
> > > >
> > > > By the way, I have also tested with Open MPI 1.10.7, and Open MPI 1.10.7 has different
> > > > behavior than either Open MPI 1.4.3 or Open MPI 2.1.1. In this explanation, it is important to
> > > > know that the aborttest executable sleeps for 20 sec.
> > > >
> > > > When running example 2:
> > > >
> > > > 1.4.3: process 1 immediately aborts
> > > > 1.10.7: process 1 doesn't abort and never stops.
> > > > 2.1.1 process 1 doesn't abort, but stops after it is finished sleeping
> > > >
> > > > Sincerely,
> > > >
> > > > Ted Sussman
> > > >
> > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > >
> > > > Here is how the system is working:
> > > >
> > > > Master: each process is put into its own process group upon launch. When we issue a
> > > > "kill", however, we only issue it to the individual process (instead of the process group
> > > > that is headed by that child process). This is probably a bug as I don´t believe that is
> > > > what we intended, but set that aside for now.
> > > >
> > > > 2.x: each process is put into its own process group upon launch. When we issue a
> > > > "kill", we issue it to the process group. Thus, every child proc of that child proc will
> > > > receive it. IIRC, this was the intended behavior.
> > > >
> > > > It is rather trivial to make the change (it only involves 3 lines of code), but I´m not sure
> > > > of what our intended behavior is supposed to be. Once we clarify that, it is also trivial
> > > > to add another MCA param (you can never have too many!) to allow you to select the
> > > > other behavior.
> > > >
> > > >
> > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <***@adina.com > wrote:
> > > >
> > > > Hello Gilles,
> > > >
> > > > Thank you for your quick answer. I confirm that if exec is used, both processes
> > > > immediately
> > > > abort.
> > > >
> > > > Now suppose that the line
> > > >
> > > > echo "After aborttest:
> > > > OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > >
> > > > is added to the end of dum.sh.
> > > >
> > > > If Example 2 is run with Open MPI 1.4.3, the output is
> > > >
> > > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > > >
> > > > which shows that the shell script for the process with rank 0 continues after the
> > > > abort,
> > > > but that the shell script for the process with rank 1 does not continue after the
> > > > abort.
> > > >
> > > > If Example 2 is run with Open MPI 2.1.1, with exec used to invoke
> > > > aborttest02.exe, then
> > > > there is no such output, which shows that both shell scripts do not continue after
> > > > the abort.
> > > >
> > > > I prefer the Open MPI 1.4.3 behavior because our original application depends
> > > > upon the
> > > > Open MPI 1.4.3 behavior. (Our original application will also work if both
> > > > executables are
> > > > aborted, and if both shell scripts continue after the abort.)
> > > >
> > > > It might be too much to expect, but is there a way to recover the Open MPI 1.4.3
> > > > behavior
> > > > using Open MPI 2.1.1?
> > > >
> > > > Sincerely,
> > > >
> > > > Ted Sussman
> > > >
> > > >
> > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > >
> > > > Ted,
> > > >
> > > >
> > > > fwiw, the 'master' branch has the behavior you expect.
> > > >
> > > >
> > > > meanwhile, you can simple edit your 'dum.sh' script and replace
> > > >
> > > > /home/buildadina/src/aborttest02/aborttest02.exe
> > > >
> > > > with
> > > >
> > > > exec /home/buildadina/src/aborttest02/aborttest02.exe
> > > >
> > > >
> > > > Cheers,
> > > >
> > > >
> > > > Gilles
> > > >
> > > >
> > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > > Hello,
> > > >
> > > > My question concerns MPI_ABORT, indirect execution of
> > > > executables by mpirun and Open
> > > > MPI 2.1.1. When mpirun runs executables directly, MPI_ABORT
> > > > works as expected, but
> > > > when mpirun runs executables indirectly, MPI_ABORT does not
> > > > work as expected.
> > > >
> > > > If Open MPI 1.4.3 is used instead of Open MPI 2.1.1, MPI_ABORT
> > > > works as expected in all
> > > > cases.
> > > >
> > > > The examples given below have been simplified as far as possible
> > > > to show the issues.
> > > >
> > > > ---
> > > >
> > > > Example 1
> > > >
> > > > Consider an MPI job run in the following way:
> > > >
> > > > mpirun ... -app addmpw1
> > > >
> > > > where the appfile addmpw1 lists two executables:
> > > >
> > > > -n 1 -host gulftown ... aborttest02.exe
> > > > -n 1 -host gulftown ... aborttest02.exe
> > > >
> > > > The two executables are executed on the local node gulftown.
> > > > aborttest02 calls MPI_ABORT
> > > > for rank 0, then sleeps.
> > > >
> > > > The above MPI job runs as expected. Both processes immediately
> > > > abort when rank 0 calls
> > > > MPI_ABORT.
> > > >
> > > > ---
> > > >
> > > > Example 2
> > > >
> > > > Now change the above example as follows:
> > > >
> > > > mpirun ... -app addmpw2
> > > >
> > > > where the appfile addmpw2 lists shell scripts:
> > > >
> > > > -n 1 -host gulftown ... dum.sh
> > > > -n 1 -host gulftown ... dum.sh
> > > >
> > > > dum.sh invokes aborttest02.exe. So aborttest02.exe is executed
> > > > indirectly by mpirun.
> > > >
> > > > In this case, the MPI job only aborts process 0 when rank 0 calls
> > > > MPI_ABORT. Process 1
> > > > continues to run. This behavior is unexpected.
> > > >
> > > > ----
> > > >
> > > > I have attached all files to this E-mail. Since there are absolute
> > > > pathnames in the files, to
> > > > reproduce my findings, you will need to update the pathnames in the
> > > > appfiles and shell
> > > > scripts. To run example 1,
> > > >
> > > > sh run1.sh
> > > >
> > > > and to run example 2,
> > > >
> > > > sh run2.sh
> > > >
> > > > ---
> > > >
> > > > I have tested these examples with Open MPI 1.4.3 and 2.0.3. In
> > > > Open MPI 1.4.3, both
> > > > examples work as expected. Open MPI 2.0.3 has the same behavior
> > > > as Open MPI 2.1.1.
> > > >
> > > > ---
> > > >
> > > > I would prefer that Open MPI 2.1.1 aborts both processes, even
> > > > when the executables are
> > > > invoked indirectly by mpirun. If there is an MCA setting that is
> > > > needed to make Open MPI
> > > > 2.1.1 abort both processes, please let me know.
> > > >
> > > >
> > > > Sincerely,
> > > >
> > > > Theodore Sussman
> > > >
> > > >
> > > > The following section of this message contains a file attachment
> > > > prepared for transmission using the Internet MIME message format.
> > > > If you are using Pegasus Mail, or any other MIME-compliant system,
> > > > you should be able to save it or view it from within your mailer.
> > > > If you cannot, please ask your system administrator for assistance.
> > > >
> > > > ---- File information -----------
> > > > File: config.log.bz2
> > > > Date: 14 Jun 2017, 13:35
> > > > Size: 146548 bytes.
> > > > Type: Binary
> > > >
> > > >
> > > > The following section of this message contains a file attachment
> > > > prepared for transmission using the Internet MIME message format.
> > > > If you are using Pegasus Mail, or any other MIME-compliant system,
> > > > you should be able to save it or view it from within your mailer.
> > > > If you cannot, please ask your system administrator for assistance.
> > > >
> > > > ---- File information -----------
> > > > File: ompi_info.bz2
> > > > Date: 14 Jun 2017, 13:35
> > > > Size: 24088 bytes.
> > > > Type: Binary
> > > >
> > > >
> > > > The following section of this message contains a file attachment
> > > > prepared for transmission using the Internet MIME message format.
> > > > If you are using Pegasus Mail, or any other MIME-compliant system,
> > > > you should be able to save it or view it from within your mailer.
> > > > If you cannot, please ask your system administrator for assistance.
> > > >
> > > > ---- File information -----------
> > > > File: aborttest02.tgz
> > > > Date: 14 Jun 2017, 13:52
> > > > Size: 4285 bytes.
> > > > Type: Binary
> > > >
> > > >
> --
> Jeff Squyres
> ***@cisco.com
>
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
g***@rist.or.jp
2017-06-17 03:33:25 UTC
Permalink
Ted,

if you

mpirun --mca odls_base_verbose 10 ...

you will see which processes get killed and how
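
For instance, applied to the appfile case from this thread (a sketch - adjust the appfile name and paths to your setup):

mpirun --mca odls_base_verbose 10 -app addmpw2

the odls component will then log each signal (SIGCONT, SIGTERM, SIGKILL) it sends, and to which pid.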

Best regards,


Gilles

----- Original Message -----
> Hello Jeff,
>
> Thanks for your comments.
>
> I am not seeing behavior #4, on the two computers that I have tested
on, using Open MPI
> 2.1.1.
>
> I wonder if you can duplicate my results with the files that I have
uploaded.
>
> Regarding what is the "correct" behavior, I am willing to modify my
application to correspond
> to Open MPI's behavior (whatever behavior the Open MPI developers
decide is best) --
> provided that Open MPI does in fact kill off both shells.
>
> So my highest priority now is to find out why Open MPI 2.1.1 does not
kill off both shells on
> my computer.
>
> Sincerely,
>
> Ted Sussman
>
> On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>
> > Ted --
> >
> > Sorry for jumping in late. Here's my $0.02...
> >
> > In the runtime, we can do 4 things:
> >
> > 1. Kill just the process that we forked.
> > 2. Kill just the process(es) that call back and identify themselves
as MPI processes (we don't track this right now, but we could add that
functionality).
> > 3. Union of #1 and #2.
> > 4. Kill all processes (to include any intermediate processes that
are not included in #1 and #2).
> >
> > In Open MPI 2.x, #4 is the intended behavior. There may be a bug or
two that needs to get fixed (e.g., in your last mail, I don't see
offhand why it waits until the MPI process finishes sleeping), but we
should be killing the process group, which -- unless any of the
descendant processes have explicitly left the process group -- should
hit the entire process tree.
> >
> > Sidenote: there's actually a way to be a bit more aggressive and do
a better job of ensuring that we kill *all* processes (via creative use
of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement /
optimization.
> >
> > I think Gilles and Ralph proposed a good point to you: if you want
to be sure to be able to do cleanup after an MPI process terminates (
normally or abnormally), you should trap signals in your intermediate
processes to catch what Open MPI's runtime throws and therefore know
that it is time to cleanup.
> >
> > Hypothetically, this should work in all versions of Open MPI...?
> >
> > I think Ralph made a pull request that adds an MCA param to change
the default behavior from #4 to #1.
> >
> > Note, however, that there's a little time between when Open MPI
sends the SIGTERM and the SIGKILL, so this solution could be racy. If
you find that you're running out of time to cleanup, we might be able to
make the delay between the SIGTERM and SIGKILL be configurable (e.g.,
via MCA param).
> >
> >
> >
> >
> > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com>
wrote:
> > >
> > > Hello Gilles and Ralph,
> > >
> > > Thank you for your advice so far. I appreciate the time that you
have spent to educate me about the details of Open MPI.
> > >
> > > But I think that there is something fundamental that I don't
understand. Consider Example 2 run with Open MPI 2.1.1.
> > >
> > > mpirun --> shell for process 0 --> executable for process 0 -->
MPI calls, MPI_Abort
> > > --> shell for process 1 --> executable for process 1 -->
MPI calls
> > >
> > > After the MPI_Abort is called, ps shows that both shells are
running, and that the executable for process 1 is running (in this case,
process 1 is sleeping). And mpirun does not exit until process 1 is
finished sleeping.
> > >
> > > I cannot reconcile this observed behavior with the statement
> > >
> > > > > 2.x: each process is put into its own process group
upon launch. When we issue a
> > > > > "kill", we issue it to the process group. Thus, every
child proc of that child proc will
> > > > > receive it. IIRC, this was the intended behavior.
> > >
> > > I assume that, for my example, there are two process groups. The
process group for process 0 contains the shell for process 0 and the
executable for process 0; and the process group for process 1 contains
the shell for process 1 and the executable for process 1. So what does
MPI_ABORT do? MPI_ABORT does not kill the process group for process 0,
since the shell for process 0 continues. And MPI_ABORT does not kill
the process group for process 1, since both the shell and executable for
process 1 continue.
> > >
> > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > >
> > > mpirun: abort is already in progress.. hit ctrl-c again to
forcibly terminate
> > >
> > > but I don't need to hit Ctrl-C again because mpirun immediately
exits.
> > >
> > > Can you shed some light on all of this?
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > >
> > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> > >
> > > >
> > > > You have to understand that we have no way of knowing who is
making MPI calls - all we see is
> > > > the proc that we started, and we know someone of that rank is
running (but we have no way of
> > > > knowing which of the procs you sub-spawned it is).
> > > >
> > > > So the behavior you are seeking only occurred in some earlier
release by sheer accident. Nor will
> > > > you find it portable as there is no specification directing that
behavior.
> > > >
> > > > The behavior I´ve provided is to either deliver the signal to _
all_ child processes (including
> > > > grandchildren etc.), or _only_ the immediate child of the daemon.
It won´t do what you describe -
> > > > kill the mPI proc underneath the shell, but not the shell itself.
> > > >
> > > > What you can eventually do is use PMIx to ask the runtime to
selectively deliver signals to
> > > > pid/procs for you. We don´t have that capability implemented
just yet, I´m afraid.
> > > >
> > > > Meantime, when I get a chance, I can code an option that will
record the pid of the subproc that
> > > > calls MPI_Init, and then let´s you deliver signals to just that
proc. No promises as to when that will
> > > > be done.
> > > >
> > > >
> > > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <***@adina.
com> wrote:
> > > >
> > > > Hello Ralph,
> > > >
> > > > I am just an Open MPI end user, so I will need to wait for
the next official release.
> > > >
> > > > mpirun --> shell for process 0 --> executable for process 0
--> MPI calls
> > > > --> shell for process 1 --> executable for process 1
--> MPI calls
> > > > ...
> > > >
> > > > I guess the question is, should MPI_ABORT kill the
executables or the shells? I naively
> > > > thought, that, since it is the executables that make the MPI
calls, it is the executables that
> > > > should be aborted by the call to MPI_ABORT. Since the
shells don't make MPI calls, the
> > > > shells should not be aborted.
> > > >
> > > > And users might have several layers of shells in between
mpirun and the executable.
> > > >
> > > > So now I will look for the latest version of Open MPI that
has the 1.4.3 behavior.
> > > >
> > > > Sincerely,
> > > >
> > > > Ted Sussman
> > > >
> > > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > > >
> > > > >
> > > > > Yeah, things jittered a little there as we debated the "
right" behavior. Generally, when we
> > > > see that
> > > > > happening it means that a param is required, but somehow
we never reached that point.
> > > > >
> > > > > See if https://github.com/open-mpi/ompi/pull/3704 helps -
if so, I can schedule it for the next
> > > > 2.x
> > > > > release if the RMs agree to take it
> > > > >
> > > > > Ralph
> > > > >
> > > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.sussman
@adina.com > wrote:
> > > > >
> > > > > Thank you for your comments.
> > > > >
> > > > > Our application relies upon "dum.sh" to clean up after
the process exits, either if the
> > > > process
> > > > > exits normally, or if the process exits abnormally
because of MPI_ABORT. If the process
> > > > > group is killed by MPI_ABORT, this clean up will not
be performed. If exec is used to launch
> > > > > the executable from dum.sh, then dum.sh is terminated
by the exec, so dum.sh cannot
> > > > > perform any clean up.
> > > > >
> > > > > I suppose that other user applications might work
similarly, so it would be good to have an
> > > > > MCA parameter to control the behavior of MPI_ABORT.
> > > > >
> > > > > We could rewrite our shell script that invokes mpirun,
so that the cleanup that is now done
> > > > > by
> > > > > dum.sh is done by the invoking shell script after
mpirun exits. Perhaps this technique is the
> > > > > preferred way to clean up after mpirun is invoked.
> > > > >
> > > > > By the way, I have also tested with Open MPI 1.10.7,
and Open MPI 1.10.7 has different
> > > > > behavior than either Open MPI 1.4.3 or Open MPI 2.1.1.
In this explanation, it is important to
> > > > > know that the aborttest executable sleeps for 20 sec.
> > > > >
> > > > > When running example 2:
> > > > >
> > > > > 1.4.3: process 1 immediately aborts
> > > > > 1.10.7: process 1 doesn't abort and never stops.
> > > > > 2.1.1 process 1 doesn't abort, but stops after it is
finished sleeping
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Ted Sussman
> > > > >
> > > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > > >
> > > > > Here is how the system is working:
> > > > >
> > > > > Master: each process is put into its own process group
upon launch. When we issue a
> > > > > "kill", however, we only issue it to the individual
process (instead of the process group
> > > > > that is headed by that child process). This is
probably a bug as I don´t believe that is
> > > > > what we intended, but set that aside for now.
> > > > >
> > > > > 2.x: each process is put into its own process group
upon launch. When we issue a
> > > > > "kill", we issue it to the process group. Thus, every
child proc of that child proc will
> > > > > receive it. IIRC, this was the intended behavior.
> > > > >
> > > > > It is rather trivial to make the change (it only
involves 3 lines of code), but I´m not sure
> > > > > of what our intended behavior is supposed to be. Once
we clarify that, it is also trivial
> > > > > to add another MCA param (you can never have too many!)
to allow you to select the
> > > > > other behavior.
> > > > >
> > > > >
> > > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.sussman@
adina.com > wrote:
> > > > >
> > > > > Hello Gilles,
> > > > >
> > > > > Thank you for your quick answer. I confirm that if
exec is used, both processes
> > > > > immediately
> > > > > abort.
> > > > >
> > > > > Now suppose that the line
> > > > >
> > > > > echo "After aborttest:
> > > > > OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > > >
> > > > > is added to the end of dum.sh.
> > > > >
> > > > > If Example 2 is run with Open MPI 1.4.3, the output is
> > > > >
> > > > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > > > >
> > > > > which shows that the shell script for the process with
rank 0 continues after the
> > > > > abort,
> > > > > but that the shell script for the process with rank 1
does not continue after the
> > > > > abort.
> > > > >
> > > > > If Example 2 is run with Open MPI 2.1.1, with exec
used to invoke
> > > > > aborttest02.exe, then
> > > > > there is no such output, which shows that both shell
scripts do not continue after
> > > > > the abort.
> > > > >
> > > > > I prefer the Open MPI 1.4.3 behavior because our
original application depends
> > > > > upon the
> > > > > Open MPI 1.4.3 behavior. (Our original application
will also work if both
> > > > > executables are
> > > > > aborted, and if both shell scripts continue after the
abort.)
> > > > >
> > > > > It might be too much to expect, but is there a way to
recover the Open MPI 1.4.3
> > > > > behavior
> > > > > using Open MPI 2.1.1?
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Ted Sussman
> > > > >
> > > > >
> > > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > > >
> > > > > Ted,
> > > > >
> > > > >
> > > > > fwiw, the 'master' branch has the behavior you expect.
> > > > >
> > > > >
> > > > > meanwhile, you can simple edit your 'dum.sh' script
and replace
> > > > >
> > > > > /home/buildadina/src/aborttest02/aborttest02.exe
> > > > >
> > > > > with
> > > > >
> > > > > exec /home/buildadina/src/aborttest02/aborttest02.exe
> > > > >
> > > > >
> > > > > Cheers,
> > > > >
> > > > >
> > > > > Gilles
> > > > >
> > > > >
> > > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > > > Hello,
> > > > >
> > > > > My question concerns MPI_ABORT, indirect execution of
> > > > > executables by mpirun and Open
> > > > > MPI 2.1.1. When mpirun runs executables directly, MPI
_ABORT
> > > > > works as expected, but
> > > > > when mpirun runs executables indirectly, MPI_ABORT
does not
> > > > > work as expected.
> > > > >
> > > > > If Open MPI 1.4.3 is used instead of Open MPI 2.1.1,
MPI_ABORT
> > > > > works as expected in all
> > > > > cases.
> > > > >
> > > > > The examples given below have been simplified as far
as possible
> > > > > to show the issues.
> > > > >
> > > > > ---
> > > > >
> > > > > Example 1
> > > > >
> > > > > Consider an MPI job run in the following way:
> > > > >
> > > > > mpirun ... -app addmpw1
> > > > >
> > > > > where the appfile addmpw1 lists two executables:
> > > > >
> > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > >
> > > > > The two executables are executed on the local node
gulftown.
> > > > > aborttest02 calls MPI_ABORT
> > > > > for rank 0, then sleeps.
> > > > >
> > > > > The above MPI job runs as expected. Both processes
immediately
> > > > > abort when rank 0 calls
> > > > > MPI_ABORT.
> > > > >
> > > > > ---
> > > > >
> > > > > Example 2
> > > > >
> > > > > Now change the above example as follows:
> > > > >
> > > > > mpirun ... -app addmpw2
> > > > >
> > > > > where the appfile addmpw2 lists shell scripts:
> > > > >
> > > > > -n 1 -host gulftown ... dum.sh
> > > > > -n 1 -host gulftown ... dum.sh
> > > > >
> > > > > dum.sh invokes aborttest02.exe. So aborttest02.exe is
executed
> > > > > indirectly by mpirun.
> > > > >
> > > > > In this case, the MPI job only aborts process 0 when
rank 0 calls
> > > > > MPI_ABORT. Process 1
> > > > > continues to run. This behavior is unexpected.
> > > > >
> > > > > ----
> > > > >
> > > > > I have attached all files to this E-mail. Since there
are absolute
> > > > > pathnames in the files, to
> > > > > reproduce my findings, you will need to update the
pathnames in the
> > > > > appfiles and shell
> > > > > scripts. To run example 1,
> > > > >
> > > > > sh run1.sh
> > > > >
> > > > > and to run example 2,
> > > > >
> > > > > sh run2.sh
> > > > >
> > > > > ---
> > > > >
> > > > > I have tested these examples with Open MPI 1.4.3 and 2.
0.3. In
> > > > > Open MPI 1.4.3, both
> > > > > examples work as expected. Open MPI 2.0.3 has the
same behavior
> > > > > as Open MPI 2.1.1.
> > > > >
> > > > > ---
> > > > >
> > > > > I would prefer that Open MPI 2.1.1 aborts both
processes, even
> > > > > when the executables are
> > > > > invoked indirectly by mpirun. If there is an MCA
setting that is
> > > > > needed to make Open MPI
> > > > > 2.1.1 abort both processes, please let me know.
> > > > >
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Theodore Sussman
> > > > >
> > > > >
> > > > > The following section of this message contains a file
attachment
> > > > > prepared for transmission using the Internet MIME
message format.
> > > > > If you are using Pegasus Mail, or any other MIME-
compliant system,
> > > > > you should be able to save it or view it from within
your mailer.
> > > > > If you cannot, please ask your system administrator
for assistance.
> > > > >
> > > > > ---- File information -----------
> > > > > File: config.log.bz2
> > > > > Date: 14 Jun 2017, 13:35
> > > > > Size: 146548 bytes.
> > > > > Type: Binary
> > > > >
> > > > >
> > > > > The following section of this message contains a file
attachment
> > > > > prepared for transmission using the Internet MIME
message format.
> > > > > If you are using Pegasus Mail, or any other MIME-
compliant system,
> > > > > you should be able to save it or view it from within
your mailer.
> > > > > If you cannot, please ask your system administrator
for assistance.
> > > > >
> > > > > ---- File information -----------
> > > > > File: ompi_info.bz2
> > > > > Date: 14 Jun 2017, 13:35
> > > > > Size: 24088 bytes.
> > > > > Type: Binary
> > > > >
> > > > >
> > > > > The following section of this message contains a file
attachment
> > > > > prepared for transmission using the Internet MIME
message format.
> > > > > If you are using Pegasus Mail, or any other MIME-
compliant system,
> > > > > you should be able to save it or view it from within
your mailer.
> > > > > If you cannot, please ask your system administrator
for assistance.
> > > > >
> > > > > ---- File information -----------
> > > > > File: aborttest02.tgz
> > > > > Date: 14 Jun 2017, 13:52
> > > > > Size: 4285 bytes.
> > > > > Type: Binary
> > > > >
> > > > >
> > --
> > Jeff Squyres
> > ***@cisco.com
> >
> > _______________________________________________
> > users mailing list
> > ***@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
>
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
g***@rist.or.jp
2017-06-17 07:02:13 UTC
Permalink
Ted,

I do not observe the same behavior you describe with Open MPI 2.1.1

# mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh

abort.sh 31361 launching abort
abort.sh 31362 launching abort
I am rank 0 with pid 31363
I am rank 1 with pid 31364
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
[linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
[linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
[linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361 SUCCESS
[linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
[linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
[linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362 SUCCESS
[linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
[linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361 SUCCESS
[linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
[linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362 SUCCESS
[linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
[linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361 SUCCESS
[linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
[linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362 SUCCESS
[linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
[linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
[linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is not alive
[linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
[linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is not alive


Open MPI did kill both shells, and they were indeed killed as evidenced
by ps

#ps -fu gilles --forest
UID PID PPID C STIME TTY TIME CMD
gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort


so trapping SIGTERM in your shell and manually killing the MPI task
should work
(as Jeff explained, as long as the shell script is fast enough to do
that between SIGTERM and SIGKILL)
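
For illustration, a minimal /bin/sh wrapper along those lines might look like this (a sketch with hypothetical paths; not the attached dum.sh):

#!/bin/sh
# Sketch: trap the SIGTERM sent by mpirun, do the cleanup, then stop the MPI binary.
cleanup () {
    # ... application-specific cleanup goes here ...
    [ -n "$child" ] && kill -TERM "$child" 2>/dev/null
    exit 1
}
trap cleanup TERM INT

./aborttest02.exe &     # the real MPI executable, started in the background
child=$!
wait "$child"           # a trapped signal interrupts this wait and runs cleanup
status=$?
# cleanup for the normal-termination case goes here
exit $status

Whether the cleanup wins the race depends on the delay between the SIGTERM and the SIGKILL, as Jeff noted.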


if you observe a different behavior, please double check your Open MPI
version and post the outputs of the same commands.

btw, are you running from a batch manager? If yes, which one?

Cheers,

Gilles

----- Original Message -----
> Ted,
>
> if you
>
> mpirun --mca odls_base_verbose 10 ...
>
> you will see which processes get killed and how
>
> Best regards,
>
>
> Gilles
>
> ----- Original Message -----
> > Hello Jeff,
> >
> > Thanks for your comments.
> >
> > I am not seeing behavior #4, on the two computers that I have tested
> on, using Open MPI
> > 2.1.1.
> >
> > I wonder if you can duplicate my results with the files that I have
> uploaded.
> >
> > Regarding what is the "correct" behavior, I am willing to modify my
> application to correspond
> > to Open MPI's behavior (whatever behavior the Open MPI developers
> decide is best) --
> > provided that Open MPI does in fact kill off both shells.
> >
> > So my highest priority now is to find out why Open MPI 2.1.1 does
not
> kill off both shells on
> > my computer.
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> >
> > > Ted --
> > >
> > > Sorry for jumping in late. Here's my $0.02...
> > >
> > > In the runtime, we can do 4 things:
> > >
> > > 1. Kill just the process that we forked.
> > > 2. Kill just the process(es) that call back and identify
themselves
> as MPI processes (we don't track this right now, but we could add that
> functionality).
> > > 3. Union of #1 and #2.
> > > 4. Kill all processes (to include any intermediate processes that
> are not included in #1 and #2).
> > >
> > > In Open MPI 2.x, #4 is the intended behavior. There may be a bug
or
> two that needs to get fixed (e.g., in your last mail, I don't see
> offhand why it waits until the MPI process finishes sleeping), but we
> should be killing the process group, which -- unless any of the
> descendant processes have explicitly left the process group -- should
> hit the entire process tree.
> > >
> > > Sidenote: there's actually a way to be a bit more aggressive and
do
> a better job of ensuring that we kill *all* processes (via creative
use
> of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement
/
> optimization.
> > >
> > > I think Gilles and Ralph proposed a good point to you: if you want
> to be sure to be able to do cleanup after an MPI process terminates (
> normally or abnormally), you should trap signals in your intermediate
> processes to catch what Open MPI's runtime throws and therefore know
> that it is time to cleanup.
> > >
> > > Hypothetically, this should work in all versions of Open MPI...?
> > >
> > > I think Ralph made a pull request that adds an MCA param to change
> the default behavior from #4 to #1.
> > >
> > > Note, however, that there's a little time between when Open MPI
> sends the SIGTERM and the SIGKILL, so this solution could be racy. If
> you find that you're running out of time to cleanup, we might be able
to
> make the delay between the SIGTERM and SIGKILL be configurable (e.g.,
> via MCA param).
> > >
> > >
> > >
> > >
> > > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com
>
> wrote:
> > > >
> > > > Hello Gilles and Ralph,
> > > >
> > > > Thank you for your advice so far. I appreciate the time that
you
> have spent to educate me about the details of Open MPI.
> > > >
> > > > But I think that there is something fundamental that I don't
> understand. Consider Example 2 run with Open MPI 2.1.1.
> > > >
> > > > mpirun --> shell for process 0 --> executable for process 0 -->
> MPI calls, MPI_Abort
> > > > --> shell for process 1 --> executable for process 1 -->
> MPI calls
> > > >
> > > > After the MPI_Abort is called, ps shows that both shells are
> running, and that the executable for process 1 is running (in this
case,
> process 1 is sleeping). And mpirun does not exit until process 1 is
> finished sleeping.
> > > >
> > > > I cannot reconcile this observed behavior with the statement
> > > >
> > > > > > 2.x: each process is put into its own process group
> upon launch. When we issue a
> > > > > > "kill", we issue it to the process group. Thus,
every
> child proc of that child proc will
> > > > > > receive it. IIRC, this was the intended behavior.
> > > >
> > > > I assume that, for my example, there are two process groups.
The
> process group for process 0 contains the shell for process 0 and the
> executable for process 0; and the process group for process 1 contains
> the shell for process 1 and the executable for process 1. So what
does
> MPI_ABORT do? MPI_ABORT does not kill the process group for process 0,

> since the shell for process 0 continues. And MPI_ABORT does not kill
> the process group for process 1, since both the shell and executable
for
> process 1 continue.
> > > >
> > > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > > >
> > > > mpirun: abort is already in progress.. hit ctrl-c again to
> forcibly terminate
> > > >
> > > > but I don't need to hit Ctrl-C again because mpirun immediately
> exits.
> > > >
> > > > Can you shed some light on all of this?
> > > >
> > > > Sincerely,
> > > >
> > > > Ted Sussman
> > > >
> > > >
> > > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> > > >
> > > > >
> > > > > You have to understand that we have no way of knowing who is
> making MPI calls - all we see is
> > > > > the proc that we started, and we know someone of that rank is
> running (but we have no way of
> > > > > knowing which of the procs you sub-spawned it is).
> > > > >
> > > > > So the behavior you are seeking only occurred in some earlier
> release by sheer accident. Nor will
> > > > > you find it portable as there is no specification directing
that
> behavior.
> > > > >
> > > > > The behavior I´ve provided is to either deliver the signal to
_
> all_ child processes (including
> > > > > grandchildren etc.), or _only_ the immediate child of the
daemon.
> It won´t do what you describe -
> > > > > kill the mPI proc underneath the shell, but not the shell
itself.
> > > > >
> > > > > What you can eventually do is use PMIx to ask the runtime to
> selectively deliver signals to
> > > > > pid/procs for you. We don´t have that capability implemented
> just yet, I´m afraid.
> > > > >
> > > > > Meantime, when I get a chance, I can code an option that will
> record the pid of the subproc that
> > > > > calls MPI_Init, and then let´s you deliver signals to just
that
> proc. No promises as to when that will
> > > > > be done.
> > > > >
> > > > >
> > > > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@
adina.
> com> wrote:
> > > > >
> > > > > Hello Ralph,
> > > > >
> > > > > I am just an Open MPI end user, so I will need to wait for
> the next official release.
> > > > >
> > > > > mpirun --> shell for process 0 --> executable for process
0
> --> MPI calls
> > > > > --> shell for process 1 --> executable for process
1
> --> MPI calls
> > > > > ...
> > > > >
> > > > > I guess the question is, should MPI_ABORT kill the
> executables or the shells? I naively
> > > > > thought, that, since it is the executables that make the
MPI
> calls, it is the executables that
> > > > > should be aborted by the call to MPI_ABORT. Since the
> shells don't make MPI calls, the
> > > > > shells should not be aborted.
> > > > >
> > > > > And users might have several layers of shells in between
> mpirun and the executable.
> > > > >
> > > > > So now I will look for the latest version of Open MPI that
> has the 1.4.3 behavior.
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Ted Sussman
> > > > >
> > > > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > > > >
> > > > > >
> > > > > > Yeah, things jittered a little there as we debated the "
> right" behavior. Generally, when we
> > > > > see that
> > > > > > happening it means that a param is required, but somehow
> we never reached that point.
> > > > > >
> > > > > > See if https://github.com/open-mpi/ompi/pull/3704 helps
-
> if so, I can schedule it for the next
> > > > > 2.x
> > > > > > release if the RMs agree to take it
> > > > > >
> > > > > > Ralph
> > > > > >
> > > > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
sussman
> @adina.com > wrote:
> > > > > >
> > > > > > Thank you for your comments.
> > > > > >
> > > > > > Our application relies upon "dum.sh" to clean up
after
> the process exits, either if the
> > > > > process
> > > > > > exits normally, or if the process exits abnormally
> because of MPI_ABORT. If the process
> > > > > > group is killed by MPI_ABORT, this clean up will not
> be performed. If exec is used to launch
> > > > > > the executable from dum.sh, then dum.sh is
terminated
> by the exec, so dum.sh cannot
> > > > > > perform any clean up.
> > > > > >
> > > > > > I suppose that other user applications might work
> similarly, so it would be good to have an
> > > > > > MCA parameter to control the behavior of MPI_ABORT.
> > > > > >
> > > > > > We could rewrite our shell script that invokes
mpirun,
> so that the cleanup that is now done
> > > > > > by
> > > > > > dum.sh is done by the invoking shell script after
> mpirun exits. Perhaps this technique is the
> > > > > > preferred way to clean up after mpirun is invoked.
> > > > > >
> > > > > > By the way, I have also tested with Open MPI 1.10.7,
> and Open MPI 1.10.7 has different
> > > > > > behavior than either Open MPI 1.4.3 or Open MPI 2.1.
1.
> In this explanation, it is important to
> > > > > > know that the aborttest executable sleeps for 20 sec.
> > > > > >
> > > > > > When running example 2:
> > > > > >
> > > > > > 1.4.3: process 1 immediately aborts
> > > > > > 1.10.7: process 1 doesn't abort and never stops.
> > > > > > 2.1.1 process 1 doesn't abort, but stops after it is
> finished sleeping
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Ted Sussman
> > > > > >
> > > > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > > > >
> > > > > > Here is how the system is working:
> > > > > >
> > > > > > Master: each process is put into its own process
group
> upon launch. When we issue a
> > > > > > "kill", however, we only issue it to the individual
> process (instead of the process group
> > > > > > that is headed by that child process). This is
> probably a bug as I don´t believe that is
> > > > > > what we intended, but set that aside for now.
> > > > > >
> > > > > > 2.x: each process is put into its own process group
> upon launch. When we issue a
> > > > > > "kill", we issue it to the process group. Thus,
every
> child proc of that child proc will
> > > > > > receive it. IIRC, this was the intended behavior.
> > > > > >
> > > > > > It is rather trivial to make the change (it only
> involves 3 lines of code), but I´m not sure
> > > > > > of what our intended behavior is supposed to be.
Once
> we clarify that, it is also trivial
> > > > > > to add another MCA param (you can never have too
many!)
> to allow you to select the
> > > > > > other behavior.
> > > > > >
> > > > > >
> > > > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
sussman@
> adina.com > wrote:
> > > > > >
> > > > > > Hello Gilles,
> > > > > >
> > > > > > Thank you for your quick answer. I confirm that if
> exec is used, both processes
> > > > > > immediately
> > > > > > abort.
> > > > > >
> > > > > > Now suppose that the line
> > > > > >
> > > > > > echo "After aborttest:
> > > > > > OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > > > >
> > > > > > is added to the end of dum.sh.
> > > > > >
> > > > > > If Example 2 is run with Open MPI 1.4.3, the output
is
> > > > > >
> > > > > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > > > > >
> > > > > > which shows that the shell script for the process
with
> rank 0 continues after the
> > > > > > abort,
> > > > > > but that the shell script for the process with rank
1
> does not continue after the
> > > > > > abort.
> > > > > >
> > > > > > If Example 2 is run with Open MPI 2.1.1, with exec
> used to invoke
> > > > > > aborttest02.exe, then
> > > > > > there is no such output, which shows that both shell
> scripts do not continue after
> > > > > > the abort.
> > > > > >
> > > > > > I prefer the Open MPI 1.4.3 behavior because our
> original application depends
> > > > > > upon the
> > > > > > Open MPI 1.4.3 behavior. (Our original application
> will also work if both
> > > > > > executables are
> > > > > > aborted, and if both shell scripts continue after
the
> abort.)
> > > > > >
> > > > > > It might be too much to expect, but is there a way
to
> recover the Open MPI 1.4.3
> > > > > > behavior
> > > > > > using Open MPI 2.1.1?
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Ted Sussman
> > > > > >
> > > > > >
> > > > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > > > >
> > > > > > Ted,
> > > > > >
> > > > > >
> > > > > > fwiw, the 'master' branch has the behavior you
expect.
> > > > > >
> > > > > >
> > > > > > meanwhile, you can simple edit your 'dum.sh' script
> and replace
> > > > > >
> > > > > > /home/buildadina/src/aborttest02/aborttest02.exe
> > > > > >
> > > > > > with
> > > > > >
> > > > > > exec /home/buildadina/src/aborttest02/aborttest02.
exe
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > >
> > > > > > Gilles
> > > > > >
> > > > > >
> > > > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > > > > Hello,
> > > > > >
> > > > > > My question concerns MPI_ABORT, indirect execution
of
> > > > > > executables by mpirun and Open
> > > > > > MPI 2.1.1. When mpirun runs executables directly,
MPI
> _ABORT
> > > > > > works as expected, but
> > > > > > when mpirun runs executables indirectly, MPI_ABORT
> does not
> > > > > > work as expected.
> > > > > >
> > > > > > If Open MPI 1.4.3 is used instead of Open MPI 2.1.1,
> MPI_ABORT
> > > > > > works as expected in all
> > > > > > cases.
> > > > > >
> > > > > > The examples given below have been simplified as far
> as possible
> > > > > > to show the issues.
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > Example 1
> > > > > >
> > > > > > Consider an MPI job run in the following way:
> > > > > >
> > > > > > mpirun ... -app addmpw1
> > > > > >
> > > > > > where the appfile addmpw1 lists two executables:
> > > > > >
> > > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > >
> > > > > > The two executables are executed on the local node
> gulftown.
> > > > > > aborttest02 calls MPI_ABORT
> > > > > > for rank 0, then sleeps.
> > > > > >
> > > > > > The above MPI job runs as expected. Both processes
> immediately
> > > > > > abort when rank 0 calls
> > > > > > MPI_ABORT.
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > Example 2
> > > > > >
> > > > > > Now change the above example as follows:
> > > > > >
> > > > > > mpirun ... -app addmpw2
> > > > > >
> > > > > > where the appfile addmpw2 lists shell scripts:
> > > > > >
> > > > > > -n 1 -host gulftown ... dum.sh
> > > > > > -n 1 -host gulftown ... dum.sh
> > > > > >
> > > > > > dum.sh invokes aborttest02.exe. So aborttest02.exe
is
> executed
> > > > > > indirectly by mpirun.
> > > > > >
> > > > > > In this case, the MPI job only aborts process 0 when
> rank 0 calls
> > > > > > MPI_ABORT. Process 1
> > > > > > continues to run. This behavior is unexpected.
> > > > > >
> > > > > > ----
> > > > > >
> > > > > > I have attached all files to this E-mail. Since
there
> are absolute
> > > > > > pathnames in the files, to
> > > > > > reproduce my findings, you will need to update the
> pathnames in the
> > > > > > appfiles and shell
> > > > > > scripts. To run example 1,
> > > > > >
> > > > > > sh run1.sh
> > > > > >
> > > > > > and to run example 2,
> > > > > >
> > > > > > sh run2.sh
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > I have tested these examples with Open MPI 1.4.3 and
2.
> 0.3. In
> > > > > > Open MPI 1.4.3, both
> > > > > > examples work as expected. Open MPI 2.0.3 has the
> same behavior
> > > > > > as Open MPI 2.1.1.
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > I would prefer that Open MPI 2.1.1 aborts both
> processes, even
> > > > > > when the executables are
> > > > > > invoked indirectly by mpirun. If there is an MCA
> setting that is
> > > > > > needed to make Open MPI
> > > > > > 2.1.1 abort both processes, please let me know.
> > > > > >
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Theodore Sussman
> > > > > >
> > > > > >
> > > > > > The following section of this message contains a
file
> attachment
> > > > > > prepared for transmission using the Internet MIME
> message format.
> > > > > > If you are using Pegasus Mail, or any other MIME-
> compliant system,
> > > > > > you should be able to save it or view it from within
> your mailer.
> > > > > > If you cannot, please ask your system administrator
> for assistance.
> > > > > >
> > > > > > ---- File information -----------
> > > > > > File: config.log.bz2
> > > > > > Date: 14 Jun 2017, 13:35
> > > > > > Size: 146548 bytes.
> > > > > > Type: Binary
> > > > > >
> > > > > >
> > > > > > The following section of this message contains a
file
> attachment
> > > > > > prepared for transmission using the Internet MIME
> message format.
> > > > > > If you are using Pegasus Mail, or any other MIME-
> compliant system,
> > > > > > you should be able to save it or view it from within
> your mailer.
> > > > > > If you cannot, please ask your system administrator
> for assistance.
> > > > > >
> > > > > > ---- File information -----------
> > > > > > File: ompi_info.bz2
> > > > > > Date: 14 Jun 2017, 13:35
> > > > > > Size: 24088 bytes.
> > > > > > Type: Binary
> > > > > >
> > > > > >
> > > > > > The following section of this message contains a
file
> attachment
> > > > > > prepared for transmission using the Internet MIME
> message format.
> > > > > > If you are using Pegasus Mail, or any other MIME-
> compliant system,
> > > > > > you should be able to save it or view it from within
> your mailer.
> > > > > > If you cannot, please ask your system administrator
> for assistance.
> > > > > >
> > > > > > ---- File information -----------
> > > > > > File: aborttest02.tgz
> > > > > > Date: 14 Jun 2017, 13:52
> > > > > > Size: 4285 bytes.
> > > > > > Type: Binary
> > > > > >
> > > > > >
> > > --
> > > Jeff Squyres
> > > ***@cisco.com
> > >
> > > _______________________________________________
> > > users mailing list
> > > ***@lists.open-mpi.org
> > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > ***@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
Ted Sussman
2017-06-19 13:33:34 UTC
Permalink
Hello Gilles,

I retried my example, with the same results as I observed before. The process with rank 1
does not get killed by MPI_ABORT.

I have attached to this E-mail:

config.log.bz2
ompi_info.bz2 (uses ompi_info -a)
aborttest09.tgz

This testing is done on a computer running Linux 3.10.0. This is a different computer from
the one that I previously used for testing. You can confirm that I am using Open MPI 2.1.1.
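
(For reference, the installed version can also be checked directly, e.g.

/opt/openmpi-2.1.1-GNU/bin/mpirun --version

or from the "Open MPI:" line in the attached ompi_info output.)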

tar xvzf aborttest09.tgz
cd aborttest09
sh run2.sh

run2.sh contains the command

/opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh

The output from this run is in aborttest09/run2.sh.out.

The output shows that the "default" component is selected by odls.

The only messages from odls are: odls: launch spawning child ... (two messages). There are
no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL messages.

I am not running from within any batch manager.

Sincerely,

Ted Sussman

On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:

> Ted,
>
> i do not observe the same behavior you describe with Open MPI 2.1.1
>
> # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>
> abort.sh 31361 launching abort
> abort.sh 31362 launching abort
> I am rank 0 with pid 31363
> I am rank 1 with pid 31364
> ------------------------------------------------------------------------
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> ------------------------------------------------------------------------
> --
> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],0]
> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> SUCCESS
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],1]
> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> SUCCESS
> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> not alive
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> not alive
>
>
> Open MPI did kill both shells, and they were indeed killed as evidenced
> by ps
>
> #ps -fu gilles --forest
> UID PID PPID C STIME TTY TIME CMD
> gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
> gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
> gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/
> local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
>
>
> so trapping SIGTERM in your shell and manually killing the MPI task
> should work
> (as Jeff explained, as long as the shell script is fast enough to do
> that between SIGTERM and SIGKILL)
>
>
> if you observe a different behavior, please double check your Open MPI
> version and post the outputs of the same commands.
>
> btw, are you running from a batch manager ? if yes, which one ?
>
> Cheers,
>
> Gilles
>
> ----- Original Message -----
> > Ted,
> >
> > if you
> >
> > mpirun --mca odls_base_verbose 10 ...
> >
> > you will see which processes get killed and how
> >
> > Best regards,
> >
> >
> > Gilles
> >
> > ----- Original Message -----
> > > Hello Jeff,
> > >
> > > Thanks for your comments.
> > >
> > > I am not seeing behavior #4, on the two computers that I have tested
> > on, using Open MPI
> > > 2.1.1.
> > >
> > > I wonder if you can duplicate my results with the files that I have
> > uploaded.
> > >
> > > Regarding what is the "correct" behavior, I am willing to modify my
> > application to correspond
> > > to Open MPI's behavior (whatever behavior the Open MPI developers
> > decide is best) --
> > > provided that Open MPI does in fact kill off both shells.
> > >
> > > So my highest priority now is to find out why Open MPI 2.1.1 does
> not
> > kill off both shells on
> > > my computer.
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> > >
> > > > Ted --
> > > >
> > > > Sorry for jumping in late. Here's my $0.02...
> > > >
> > > > In the runtime, we can do 4 things:
> > > >
> > > > 1. Kill just the process that we forked.
> > > > 2. Kill just the process(es) that call back and identify
> themselves
> > as MPI processes (we don't track this right now, but we could add that
> > functionality).
> > > > 3. Union of #1 and #2.
> > > > 4. Kill all processes (to include any intermediate processes that
> > are not included in #1 and #2).
> > > >
> > > > In Open MPI 2.x, #4 is the intended behavior. There may be a bug
> or
> > two that needs to get fixed (e.g., in your last mail, I don't see
> > offhand why it waits until the MPI process finishes sleeping), but we
> > should be killing the process group, which -- unless any of the
> > descendant processes have explicitly left the process group -- should
> > hit the entire process tree.
> > > >
> > > > Sidenote: there's actually a way to be a bit more aggressive and
> do
> > a better job of ensuring that we kill *all* processes (via creative
> use
> > of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement
> /
> > optimization.
> > > >
> > > > I think Gilles and Ralph proposed a good point to you: if you want
> > to be sure to be able to do cleanup after an MPI process terminates (
> > normally or abnormally), you should trap signals in your intermediate
> > processes to catch what Open MPI's runtime throws and therefore know
> > that it is time to cleanup.
> > > >
> > > > Hypothetically, this should work in all versions of Open MPI...?
> > > >
> > > > I think Ralph made a pull request that adds an MCA param to change
> > the default behavior from #4 to #1.
> > > >
> > > > Note, however, that there's a little time between when Open MPI
> > sends the SIGTERM and the SIGKILL, so this solution could be racy. If
> > you find that you're running out of time to cleanup, we might be able
> to
> > make the delay between the SIGTERM and SIGKILL be configurable (e.g.,
> > via MCA param).
> > > >
> > > >
> > > >
> > > >
> > > > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com
> >
> > wrote:
> > > > >
> > > > > Hello Gilles and Ralph,
> > > > >
> > > > > Thank you for your advice so far. I appreciate the time that
> you
> > have spent to educate me about the details of Open MPI.
> > > > >
> > > > > But I think that there is something fundamental that I don't
> > understand. Consider Example 2 run with Open MPI 2.1.1.
> > > > >
> > > > > mpirun --> shell for process 0 --> executable for process 0 -->
> > MPI calls, MPI_Abort
> > > > > --> shell for process 1 --> executable for process 1 -->
> > MPI calls
> > > > >
> > > > > After the MPI_Abort is called, ps shows that both shells are
> > running, and that the executable for process 1 is running (in this
> case,
> > process 1 is sleeping). And mpirun does not exit until process 1 is
> > finished sleeping.
> > > > >
> > > > > I cannot reconcile this observed behavior with the statement
> > > > >
> > > > > > > 2.x: each process is put into its own process group
> > upon launch. When we issue a
> > > > > > > "kill", we issue it to the process group. Thus,
> every
> > child proc of that child proc will
> > > > > > > receive it. IIRC, this was the intended behavior.
> > > > >
> > > > > I assume that, for my example, there are two process groups.
> The
> > process group for process 0 contains the shell for process 0 and the
> > executable for process 0; and the process group for process 1 contains
> > the shell for process 1 and the executable for process 1. So what
> does
> > MPI_ABORT do? MPI_ABORT does not kill the process group for process 0,
>
> > since the shell for process 0 continues. And MPI_ABORT does not kill
> > the process group for process 1, since both the shell and executable
> for
> > process 1 continue.
> > > > >
> > > > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > > > >
> > > > > mpirun: abort is already in progress.. hit ctrl-c again to
> > forcibly terminate
> > > > >
> > > > > but I don't need to hit Ctrl-C again because mpirun immediately
> > exits.
> > > > >
> > > > > Can you shed some light on all of this?
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Ted Sussman
> > > > >
> > > > >
> > > > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> > > > >
> > > > > >
> > > > > > You have to understand that we have no way of knowing who is
> > making MPI calls - all we see is
> > > > > > the proc that we started, and we know someone of that rank is
> > running (but we have no way of
> > > > > > knowing which of the procs you sub-spawned it is).
> > > > > >
> > > > > > So the behavior you are seeking only occurred in some earlier
> > release by sheer accident. Nor will
> > > > > > you find it portable as there is no specification directing
> that
> > behavior.
> > > > > >
> > > > > > The behavior IŽve provided is to either deliver the signal to
> _
> > all_ child processes (including
> > > > > > grandchildren etc.), or _only_ the immediate child of the
> daemon.
> > It wonŽt do what you describe -
> > > > > > kill the mPI proc underneath the shell, but not the shell
> itself.
> > > > > >
> > > > > > What you can eventually do is use PMIx to ask the runtime to
> > selectively deliver signals to
> > > > > > pid/procs for you. We donŽt have that capability implemented
> > just yet, IŽm afraid.
> > > > > >
> > > > > > Meantime, when I get a chance, I can code an option that will
> > record the pid of the subproc that
> > > > > > calls MPI_Init, and then letŽs you deliver signals to just
> that
> > proc. No promises as to when that will
> > > > > > be done.
> > > > > >
> > > > > >
> > > > > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@
> adina.
> > com> wrote:
> > > > > >
> > > > > > Hello Ralph,
> > > > > >
> > > > > > I am just an Open MPI end user, so I will need to wait for
> > the next official release.
> > > > > >
> > > > > > mpirun --> shell for process 0 --> executable for process
> 0
> > --> MPI calls
> > > > > > --> shell for process 1 --> executable for process
> 1
> > --> MPI calls
> > > > > > ...
> > > > > >
> > > > > > I guess the question is, should MPI_ABORT kill the
> > executables or the shells? I naively
> > > > > > thought, that, since it is the executables that make the
> MPI
> > calls, it is the executables that
> > > > > > should be aborted by the call to MPI_ABORT. Since the
> > shells don't make MPI calls, the
> > > > > > shells should not be aborted.
> > > > > >
> > > > > > And users might have several layers of shells in between
> > mpirun and the executable.
> > > > > >
> > > > > > So now I will look for the latest version of Open MPI that
> > has the 1.4.3 behavior.
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Ted Sussman
> > > > > >
> > > > > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > > > > >
> > > > > > >
> > > > > > > Yeah, things jittered a little there as we debated the "
> > right" behavior. Generally, when we
> > > > > > see that
> > > > > > > happening it means that a param is required, but somehow
> > we never reached that point.
> > > > > > >
> > > > > > > See if https://github.com/open-mpi/ompi/pull/3704 helps
> -
> > if so, I can schedule it for the next
> > > > > > 2.x
> > > > > > > release if the RMs agree to take it
> > > > > > >
> > > > > > > Ralph
> > > > > > >
> > > > > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> sussman
> > @adina.com > wrote:
> > > > > > >
> > > > > > > Thank you for your comments.
> > > > > > >
> > > > > > > Our application relies upon "dum.sh" to clean up after the process exits,
> > > > > > > either if the process exits normally, or if the process exits abnormally
> > > > > > > because of MPI_ABORT. If the process group is killed by MPI_ABORT, this
> > > > > > > clean up will not be performed. If exec is used to launch the executable
> > > > > > > from dum.sh, then dum.sh is terminated by the exec, so dum.sh cannot
> > > > > > > perform any clean up.
> > > > > > >
> > > > > > > I suppose that other user applications might work
> > similarly, so it would be good to have an
> > > > > > > MCA parameter to control the behavior of MPI_ABORT.
> > > > > > >
> > > > > > > We could rewrite our shell script that invokes mpirun, so that the cleanup
> > > > > > > that is now done by dum.sh is done by the invoking shell script after
> > > > > > > mpirun exits. Perhaps this technique is the preferred way to clean up
> > > > > > > after mpirun is invoked.
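(A minimal sketch of this alternative, assuming the run script and appfile names from the earlier examples; the cleanup command is only a placeholder, not part of the original test case.)

    #!/bin/sh
    # run2.sh (sketch): invoke mpirun, then do the cleanup that dum.sh used to do,
    # so it happens whether the job ends normally or via MPI_ABORT.
    mpirun -app addmpw2
    status=$?
    rm -f /tmp/aborttest02.scratch.*   # placeholder for the real cleanup
    exit $status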
> > > > > > >
> > > > > > > By the way, I have also tested with Open MPI 1.10.7,
> > and Open MPI 1.10.7 has different
> > > > > > > behavior than either Open MPI 1.4.3 or Open MPI 2.1.
> 1.
> > In this explanation, it is important to
> > > > > > > know that the aborttest executable sleeps for 20 sec.
> > > > > > >
> > > > > > > When running example 2:
> > > > > > >
> > > > > > > 1.4.3: process 1 immediately aborts
> > > > > > > 1.10.7: process 1 doesn't abort and never stops.
> > > > > > > 2.1.1 process 1 doesn't abort, but stops after it is
> > finished sleeping
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Ted Sussman
> > > > > > >
> > > > > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > > > > >
> > > > > > > Here is how the system is working:
> > > > > > >
> > > > > > > Master: each process is put into its own process
> group
> > upon launch. When we issue a
> > > > > > > "kill", however, we only issue it to the individual
> > process (instead of the process group
> > > > > > > that is headed by that child process). This is
> > probably a bug as I don't believe that is
> > > > > > > what we intended, but set that aside for now.
> > > > > > >
> > > > > > > 2.x: each process is put into its own process group
> > upon launch. When we issue a
> > > > > > > "kill", we issue it to the process group. Thus,
> every
> > child proc of that child proc will
> > > > > > > receive it. IIRC, this was the intended behavior.
> > > > > > >
> > > > > > > It is rather trivial to make the change (it only
> > involves 3 lines of code), but I'm not sure
> > > > > > > of what our intended behavior is supposed to be.
> Once
> > we clarify that, it is also trivial
> > > > > > > to add another MCA param (you can never have too
> many!)
> > to allow you to select the
> > > > > > > other behavior.
> > > > > > >
> > > > > > >
> > > > > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
> sussman@
> > adina.com > wrote:
> > > > > > >
> > > > > > > Hello Gilles,
> > > > > > >
> > > > > > > Thank you for your quick answer. I confirm that if
> > exec is used, both processes
> > > > > > > immediately
> > > > > > > abort.
> > > > > > >
> > > > > > > Now suppose that the line
> > > > > > >
> > > > > > > echo "After aborttest:
> > > > > > > OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > > > > >
> > > > > > > is added to the end of dum.sh.
> > > > > > >
> > > > > > > If Example 2 is run with Open MPI 1.4.3, the output
> is
> > > > > > >
> > > > > > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > > > > > >
> > > > > > > which shows that the shell script for the process
> with
> > rank 0 continues after the
> > > > > > > abort,
> > > > > > > but that the shell script for the process with rank
> 1
> > does not continue after the
> > > > > > > abort.
> > > > > > >
> > > > > > > If Example 2 is run with Open MPI 2.1.1, with exec
> > used to invoke
> > > > > > > aborttest02.exe, then
> > > > > > > there is no such output, which shows that both shell
> > scripts do not continue after
> > > > > > > the abort.
> > > > > > >
> > > > > > > I prefer the Open MPI 1.4.3 behavior because our
> > original application depends
> > > > > > > upon the
> > > > > > > Open MPI 1.4.3 behavior. (Our original application
> > will also work if both
> > > > > > > executables are
> > > > > > > aborted, and if both shell scripts continue after
> the
> > abort.)
> > > > > > >
> > > > > > > It might be too much to expect, but is there a way
> to
> > recover the Open MPI 1.4.3
> > > > > > > behavior
> > > > > > > using Open MPI 2.1.1?
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Ted Sussman
> > > > > > >
> > > > > > >
> > > > > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > > > > >
> > > > > > > Ted,
> > > > > > >
> > > > > > >
> > > > > > > fwiw, the 'master' branch has the behavior you
> expect.
> > > > > > >
> > > > > > >
> > > > > > > meanwhile, you can simple edit your 'dum.sh' script
> > and replace
> > > > > > >
> > > > > > > /home/buildadina/src/aborttest02/aborttest02.exe
> > > > > > >
> > > > > > > with
> > > > > > >
> > > > > > > exec /home/buildadina/src/aborttest02/aborttest02.
> exe
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > >
> > > > > > > Gilles
> > > > > > >
> > > > > > >
> > > > > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > My question concerns MPI_ABORT, indirect execution
> of
> > > > > > > executables by mpirun and Open
> > > > > > > MPI 2.1.1. When mpirun runs executables directly,
> MPI
> > _ABORT
> > > > > > > works as expected, but
> > > > > > > when mpirun runs executables indirectly, MPI_ABORT
> > does not
> > > > > > > work as expected.
> > > > > > >
> > > > > > > If Open MPI 1.4.3 is used instead of Open MPI 2.1.1,
> > MPI_ABORT
> > > > > > > works as expected in all
> > > > > > > cases.
> > > > > > >
> > > > > > > The examples given below have been simplified as far
> > as possible
> > > > > > > to show the issues.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Example 1
> > > > > > >
> > > > > > > Consider an MPI job run in the following way:
> > > > > > >
> > > > > > > mpirun ... -app addmpw1
> > > > > > >
> > > > > > > where the appfile addmpw1 lists two executables:
> > > > > > >
> > > > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > > >
> > > > > > > The two executables are executed on the local node
> > gulftown.
> > > > > > > aborttest02 calls MPI_ABORT
> > > > > > > for rank 0, then sleeps.
> > > > > > >
> > > > > > > The above MPI job runs as expected. Both processes
> > immediately
> > > > > > > abort when rank 0 calls
> > > > > > > MPI_ABORT.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Example 2
> > > > > > >
> > > > > > > Now change the above example as follows:
> > > > > > >
> > > > > > > mpirun ... -app addmpw2
> > > > > > >
> > > > > > > where the appfile addmpw2 lists shell scripts:
> > > > > > >
> > > > > > > -n 1 -host gulftown ... dum.sh
> > > > > > > -n 1 -host gulftown ... dum.sh
> > > > > > >
> > > > > > > dum.sh invokes aborttest02.exe. So aborttest02.exe
> is
> > executed
> > > > > > > indirectly by mpirun.
> > > > > > >
> > > > > > > In this case, the MPI job only aborts process 0 when
> > rank 0 calls
> > > > > > > MPI_ABORT. Process 1
> > > > > > > continues to run. This behavior is unexpected.
> > > > > > >
> > > > > > > ----
> > > > > > >
> > > > > > > I have attached all files to this E-mail. Since
> there
> > are absolute
> > > > > > > pathnames in the files, to
> > > > > > > reproduce my findings, you will need to update the
> > pathnames in the
> > > > > > > appfiles and shell
> > > > > > > scripts. To run example 1,
> > > > > > >
> > > > > > > sh run1.sh
> > > > > > >
> > > > > > > and to run example 2,
> > > > > > >
> > > > > > > sh run2.sh
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > I have tested these examples with Open MPI 1.4.3 and
> 2.
> > 0.3. In
> > > > > > > Open MPI 1.4.3, both
> > > > > > > examples work as expected. Open MPI 2.0.3 has the
> > same behavior
> > > > > > > as Open MPI 2.1.1.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > I would prefer that Open MPI 2.1.1 aborts both
> > processes, even
> > > > > > > when the executables are
> > > > > > > invoked indirectly by mpirun. If there is an MCA
> > setting that is
> > > > > > > needed to make Open MPI
> > > > > > > 2.1.1 abort both processes, please let me know.
> > > > > > >
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Theodore Sussman
> > > > > > >
> > > > > > >
> > > > > > > The following section of this message contains a
> file
> > attachment
> > > > > > > prepared for transmission using the Internet MIME
> > message format.
> > > > > > > If you are using Pegasus Mail, or any other MIME-
> > compliant system,
> > > > > > > you should be able to save it or view it from within
> > your mailer.
> > > > > > > If you cannot, please ask your system administrator
> > for assistance.
> > > > > > >
> > > > > > > ---- File information -----------
> > > > > > > File: config.log.bz2
> > > > > > > Date: 14 Jun 2017, 13:35
> > > > > > > Size: 146548 bytes.
> > > > > > > Type: Binary
> > > > > > >
> > > > > > >
> > > > > > > The following section of this message contains a
> file
> > attachment
> > > > > > > prepared for transmission using the Internet MIME
> > message format.
> > > > > > > If you are using Pegasus Mail, or any other MIME-
> > compliant system,
> > > > > > > you should be able to save it or view it from within
> > your mailer.
> > > > > > > If you cannot, please ask your system administrator
> > for assistance.
> > > > > > >
> > > > > > > ---- File information -----------
> > > > > > > File: ompi_info.bz2
> > > > > > > Date: 14 Jun 2017, 13:35
> > > > > > > Size: 24088 bytes.
> > > > > > > Type: Binary
> > > > > > >
> > > > > > >
> > > > > > > The following section of this message contains a
> file
> > attachment
> > > > > > > prepared for transmission using the Internet MIME
> > message format.
> > > > > > > If you are using Pegasus Mail, or any other MIME-
> > compliant system,
> > > > > > > you should be able to save it or view it from within
> > your mailer.
> > > > > > > If you cannot, please ask your system administrator
> > for assistance.
> > > > > > >
> > > > > > > ---- File information -----------
> > > > > > > File: aborttest02.tgz
> > > > > > > Date: 14 Jun 2017, 13:52
> > > > > > > Size: 4285 bytes.
> > > > > > > Type: Binary
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
> >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
> >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
> >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
> >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
> >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > users mailing list
> > > > > > ***@lists.open-mpi.org
> > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > ***@lists.open-mpi.org
> > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > > >
> > > >
> > > > --
> > > > Jeff Squyres
> > > > ***@cisco.com
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > ***@lists.open-mpi.org
> > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > >
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > ***@lists.open-mpi.org
> > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > >
> > _______________________________________________
> > users mailing list
> > ***@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
g***@rist.or.jp
2017-06-19 14:06:04 UTC
Permalink
Ted,

some traces are missing because you did not configure with --enable-debug

i am afraid you have to do it (and you probably want to install that
debug version in another location, since its performance is not good
for production) in order to get all the logs.
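(A sketch of the debug build Gilles describes, assuming a separate install prefix such as /opt/openmpi-2.1.1-debug; adjust the paths to your site.)

    # configure and install a debug copy of Open MPI 2.1.1 next to the production one
    ./configure --prefix=/opt/openmpi-2.1.1-debug --enable-debug
    make -j 4 all && make install
    # rerun the failing case with the debug mpirun to get the full odls traces
    /opt/openmpi-2.1.1-debug/bin/mpirun -np 2 -mca btl tcp,self \
        --mca odls_base_verbose 10 ./dum.sh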

Cheers,

Gilles

----- Original Message -----

Hello Gilles,

I retried my example, with the same results as I observed before.
The process with rank 1 does not get killed by MPI_ABORT.

I have attached to this E-mail:

config.log.bz2
ompi_info.bz2 (uses ompi_info -a)
aborttest09.tgz

This testing is done on a computer running Linux 3.10.0. This is a
different computer than the computer that I previously used for testing.
You can confirm that I am using Open MPI 2.1.1.

tar xvzf aborttest09.tgz
cd aborttest09
./sh run2.sh

run2.sh contains the command

/opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh

The output from this run is in aborttest09/run2.sh.out.

The output shows that the "default" component is selected by odls.

The only messages from odls are: odls: launch spawning child ... (two messages).
There are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL messages.

I am not running from within any batch manager.

Sincerely,

Ted Sussman

On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:

> Ted,
>
> i do not observe the same behavior you describe with Open MPI 2.1.1
>
> # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>
> abort.sh 31361 launching abort
> abort.sh 31362 launching abort
> I am rank 0 with pid 31363
> I am rank 1 with pid 31364
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361 SUCCESS
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362 SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361 SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362 SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361 SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362 SUCCESS
> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is not alive
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is not alive
>
>
> Open MPI did kill both shells, and they were indeed killed as evidenced by ps
>
> #ps -fu gilles --forest
> UID        PID  PPID  C STIME TTY          TIME CMD
> gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
> gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
> gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
>
>
> so trapping SIGTERM in your shell and manually killing the MPI task should work
> (as Jeff explained, as long as the shell script is fast enough to do that between SIGTERM and SIGKILL)
>
>
> if you observe a different behavior, please double check your Open MPI version and post the outputs of the same commands.
>
> btw, are you running from a batch manager ? if yes, which one ?
>
> Cheers,
>
> Gilles
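(A minimal sketch of the trap approach Gilles describes above, not taken from the thread: the executable path is the one from the earlier examples, and the final echo is the marker line Ted added to dum.sh. As noted, this only works if the script gets through it between mpirun's SIGTERM and SIGKILL.)

    #!/bin/sh
    # dum.sh (sketch): run the MPI executable in the background so the shell can
    # catch the SIGTERM that mpirun sends on MPI_ABORT and still do its cleanup.
    /home/buildadina/src/aborttest02/aborttest02.exe &
    child=$!
    trap 'kill -TERM $child 2>/dev/null' TERM INT
    wait $child
    # cleanup runs both after a normal exit and after an abort
    echo "After aborttest: OMPI_COMM_WORLD_RANK=$OMPI_COMM_WORLD_RANK"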
>
> ----- Original Message -----
> > Ted,
> >
> > if you
> >
> > mpirun --mca odls_base_verbose 10 ...
> >
> > you will see which processes get killed and how
> >
> > Best regards,
> >
> >
> > Gilles
> >
> > ----- Original Message -----
> > > Hello Jeff,
> > >
> > > Thanks for your comments.
> > >
> > > I am not seeing behavior #4, on the two computers that I have
tested
> > on, using Open MPI
> > > 2.1.1.
> > >
> > > I wonder if you can duplicate my results with the files that I
have
> > uploaded.
> > >
> > > Regarding what is the "correct" behavior, I am willing to
modify my
> > application to correspond
> > > to Open MPI's behavior (whatever behavior the Open MPI
developers
> > decide is best) --
> > > provided that Open MPI does in fact kill off both shells.
> > >
> > > So my highest priority now is to find out why Open MPI 2.1.1
does
> not
> > kill off both shells on
> > > my computer.
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> > >
> > > > Ted --
> > > >
> > > > Sorry for jumping in late. Here's my $0.02...
> > > >
> > > > In the runtime, we can do 4 things:
> > > >
> > > > 1. Kill just the process that we forked.
> > > > 2. Kill just the process(es) that call back and identify
> themselves
> > as MPI processes (we don't track this right now, but we could
add that
> > functionality).
> > > > 3. Union of #1 and #2.
> > > > 4. Kill all processes (to include any intermediate processes
that
> > are not included in #1 and #2).
> > > >
> > > > In Open MPI 2.x, #4 is the intended behavior. There may be
a bug
> or
> > two that needs to get fixed (e.g., in your last mail, I don't
see
> > offhand why it waits until the MPI process finishes sleeping),
but we
> > should be killing the process group, which -- unless any of the
> > descendant processes have explicitly left the process group --
should
> > hit the entire process tree.
> > > >
> > > > Sidenote: there's actually a way to be a bit more aggressive
and
> do
> > a better job of ensuring that we kill *all* processes (via
creative
> use
> > of PR_SET_CHILD_SUBREAPER), but that's basically a future
enhancement
> /
> > optimization.
> > > >
> > > > I think Gilles and Ralph proposed a good point to you: if
you want
> > to be sure to be able to do cleanup after an MPI process
terminates (
> > normally or abnormally), you should trap signals in your
intermediate
> > processes to catch what Open MPI's runtime throws and therefore
know
> > that it is time to cleanup.
> > > >
> > > > Hypothetically, this should work in all versions of Open MPI.
..?
> > > >
> > > > I think Ralph made a pull request that adds an MCA param to
change
> > the default behavior from #4 to #1.
> > > >
> > > > Note, however, that there's a little time between when Open
MPI
> > sends the SIGTERM and the SIGKILL, so this solution could be
racy. If
> > you find that you're running out of time to cleanup, we might be
able
> to
> > make the delay between the SIGTERM and SIGKILL be configurable (
e.g.,
> > via MCA param).
> > > >
> > > >
> > > >
> > > >
> > > > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <ted.sussman@
adina.com
> >
> > wrote:
> > > > >
> > > > > Hello Gilles and Ralph,
> > > > >
> > > > > Thank you for your advice so far. I appreciate the time
that
> you
> > have spent to educate me about the details of Open MPI.
> > > > >
> > > > > But I think that there is something fundamental that I don
't
> > understand. Consider Example 2 run with Open MPI 2.1.1.
> > > > >
> > > > > mpirun --> shell for process 0 --> executable for process
0 -->
> > MPI calls, MPI_Abort
> > > > > --> shell for process 1 --> executable for process
1 -->
> > MPI calls
> > > > >
> > > > > After the MPI_Abort is called, ps shows that both shells
are
> > running, and that the executable for process 1 is running (in
this
> case,
> > process 1 is sleeping). And mpirun does not exit until process
1 is
> > finished sleeping.
> > > > >
> > > > > I cannot reconcile this observed behavior with the
statement
> > > > >
> > > > > > > 2.x: each process is put into its own process
group
> > upon launch. When we issue a
> > > > > > > "kill", we issue it to the process group. Thus,

> every
> > child proc of that child proc will
> > > > > > > receive it. IIRC, this was the intended
behavior.
> > > > >
> > > > > I assume that, for my example, there are two process
groups.
> The
> > process group for process 0 contains the shell for process 0 and
the
> > executable for process 0; and the process group for process 1
contains
> > the shell for process 1 and the executable for process 1. So
what
> does
> > MPI_ABORT do? MPI_ABORT does not kill the process group for
process 0,
>
> > since the shell for process 0 continues. And MPI_ABORT does not
kill
> > the process group for process 1, since both the shell and
executable
> for
> > process 1 continue.
> > > > >
> > > > > If I hit Ctrl-C after MPI_Abort is called, I get the
message
> > > > >
> > > > > mpirun: abort is already in progress.. hit ctrl-c again to
> > forcibly terminate
> > > > >
> > > > > but I don't need to hit Ctrl-C again because mpirun
immediately
> > exits.
> > > > >
> > > > > Can you shed some light on all of this?
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Ted Sussman
> > > > >
> > > > >
> > > > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> > > > >
> > > > > >
> > > > > > You have to understand that we have no way of knowing
who is
> > making MPI calls - all we see is
> > > > > > the proc that we started, and we know someone of that
rank is
> > running (but we have no way of
> > > > > > knowing which of the procs you sub-spawned it is).
> > > > > >
> > > > > > So the behavior you are seeking only occurred in some
earlier
> > release by sheer accident. Nor will
> > > > > > you find it portable as there is no specification
directing
> that
> > behavior.
> > > > > >
> > > > > > The behavior I've provided is to either deliver the
signal to
> _
> > all_ child processes (including
> > > > > > grandchildren etc.), or _only_ the immediate child of
the
> daemon.
> >  It won't do what you describe -
> > > > > > kill the MPI proc underneath the shell, but not the
shell
> itself.
> > > > > >
> > > > > > What you can eventually do is use PMIx to ask the
runtime to
> > selectively deliver signals to
> > > > > > pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
> > > > > >
> > > > > > Meantime, when I get a chance, I can code an option that
will
> > record the pid of the subproc that
> > > > > > calls MPI_Init, and then lets you deliver signals to
just
> that
> > proc. No promises as to when that will
> > > > > > be done.
> > > > > >
> > > > > >
> > > > > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.
sussman@
> adina.
> > com> wrote:
> > > > > >
> > > > > > Hello Ralph,
> > > > > >
> > > > > > I am just an Open MPI end user, so I will need to
wait for
> > the next official release.
> > > > > >
> > > > > > mpirun --> shell for process 0 --> executable for
process
> 0
> > --> MPI calls
> > > > > > --> shell for process 1 --> executable for
process
> 1
> > --> MPI calls
> > > > > > ...
> > > > > >
> > > > > > I guess the question is, should MPI_ABORT kill the
> > executables or the shells? I naively
> > > > > > thought, that, since it is the executables that make
the
> MPI
> > calls, it is the executables that
> > > > > > should be aborted by the call to MPI_ABORT. Since
the
> > shells don't make MPI calls, the
> > > > > > shells should not be aborted.
> > > > > >
> > > > > > And users might have several layers of shells in
between
> > mpirun and the executable.
> > > > > >
> > > > > > So now I will look for the latest version of Open
MPI that
> > has the 1.4.3 behavior.
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Ted Sussman
> > > > > >
> > > > > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > > > > >
> > > > > > >
> > > > > > > Yeah, things jittered a little there as we debated
the "
> > right" behavior. Generally, when we
> > > > > > see that
> > > > > > > happening it means that a param is required, but
somehow
> > we never reached that point.
> > > > > > >
> > > > > > > See if https://github.com/open-mpi/ompi/pull/3704
helps
> -
> > if so, I can schedule it for the next
> > > > > > 2.x
> > > > > > > release if the RMs agree to take it
> > > > > > >
> > > > > > > Ralph
> > > > > > >
> > > > > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> sussman
> > @adina.com > wrote:
> > > > > > >
> > > > > > > Thank you for your comments.
> > > > > > >
> > > > > > > Our application relies upon "dum.sh" to clean
up
> after
> > the process exits, either if the
> > > > > > process
> > > > > > > exits normally, or if the process exits
abnormally
> > because of MPI_ABORT. If the process
> > > > > > > group is killed by MPI_ABORT, this clean up
will not
> > be performed. If exec is used to launch
> > > > > > > the executable from dum.sh, then dum.sh is
> terminated
> > by the exec, so dum.sh cannot
> > > > > > > perform any clean up.
> > > > > > >
> > > > > > > I suppose that other user applications might
work
> > similarly, so it would be good to have an
> > > > > > > MCA parameter to control the behavior of MPI_
ABORT.
> > > > > > >
> > > > > > > We could rewrite our shell script that invokes
> mpirun,
> > so that the cleanup that is now done
> > > > > > > by
> > > > > > > dum.sh is done by the invoking shell script
after
> > mpirun exits. Perhaps this technique is the
> > > > > > > preferred way to clean up after mpirun is
invoked.
> > > > > > >
> > > > > > > By the way, I have also tested with Open MPI 1.
10.7,
> > and Open MPI 1.10.7 has different
> > > > > > > behavior than either Open MPI 1.4.3 or Open
MPI 2.1.
> 1.
> > In this explanation, it is important to
> > > > > > > know that the aborttest executable sleeps for
20 sec.
> > > > > > >
> > > > > > > When running example 2:
> > > > > > >
> > > > > > > 1.4.3: process 1 immediately aborts
> > > > > > > 1.10.7: process 1 doesn't abort and never
stops.
> > > > > > > 2.1.1 process 1 doesn't abort, but stops after
it is
> > finished sleeping
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Ted Sussman
> > > > > > >
> > > > > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > > > > >
> > > > > > > Here is how the system is working:
> > > > > > >
> > > > > > > Master: each process is put into its own
process
> group
> > upon launch. When we issue a
> > > > > > > "kill", however, we only issue it to the
individual
> > process (instead of the process group
> > > > > > > that is headed by that child process). This is
> > probably a bug as I don't believe that is
> > > > > > > what we intended, but set that aside for now.
> > > > > > >
> > > > > > > 2.x: each process is put into its own process
group
> > upon launch. When we issue a
> > > > > > > "kill", we issue it to the process group. Thus,

> every
> > child proc of that child proc will
> > > > > > > receive it. IIRC, this was the intended
behavior.
> > > > > > >
> > > > > > > It is rather trivial to make the change (it
only
> > involves 3 lines of code), but I'm not sure
> > > > > > > of what our intended behavior is supposed to
be.
> Once
> > we clarify that, it is also trivial
> > > > > > > to add another MCA param (you can never have
too
> many!)
> > to allow you to select the
> > > > > > > other behavior.
> > > > > > >
> > > > > > >
> > > > > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
> sussman@
> > adina.com > wrote:
> > > > > > >
> > > > > > > Hello Gilles,
> > > > > > >
> > > > > > > Thank you for your quick answer. I confirm
that if
> > exec is used, both processes
> > > > > > > immediately
> > > > > > > abort.
> > > > > > >
> > > > > > > Now suppose that the line
> > > > > > >
> > > > > > > echo "After aborttest:
> > > > > > > OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > > > > >
> > > > > > > is added to the end of dum.sh.
> > > > > > >
> > > > > > > If Example 2 is run with Open MPI 1.4.3, the
output
> is
> > > > > > >
> > > > > > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > > > > > >
> > > > > > > which shows that the shell script for the
process
> with
> > rank 0 continues after the
> > > > > > > abort,
> > > > > > > but that the shell script for the process with
rank
> 1
> > does not continue after the
> > > > > > > abort.
> > > > > > >
> > > > > > > If Example 2 is run with Open MPI 2.1.1, with
exec
> > used to invoke
> > > > > > > aborttest02.exe, then
> > > > > > > there is no such output, which shows that both
shell
> > scripts do not continue after
> > > > > > > the abort.
> > > > > > >
> > > > > > > I prefer the Open MPI 1.4.3 behavior because
our
> > original application depends
> > > > > > > upon the
> > > > > > > Open MPI 1.4.3 behavior. (Our original
application
> > will also work if both
> > > > > > > executables are
> > > > > > > aborted, and if both shell scripts continue
after
> the
> > abort.)
> > > > > > >
> > > > > > > It might be too much to expect, but is there a
way
> to
> > recover the Open MPI 1.4.3
> > > > > > > behavior
> > > > > > > using Open MPI 2.1.1?
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Ted Sussman
> > > > > > >
> > > > > > >
> > > > > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet
wrote:
> > > > > > >
> > > > > > > Ted,
> > > > > > >
> > > > > > >
> > > > > > > fwiw, the 'master' branch has the behavior you
> expect.
> > > > > > >
> > > > > > >
> > > > > > > meanwhile, you can simple edit your 'dum.sh'
script
> > and replace
> > > > > > >
> > > > > > > /home/buildadina/src/aborttest02/aborttest02.
exe
> > > > > > >
> > > > > > > with
> > > > > > >
> > > > > > > exec /home/buildadina/src/aborttest02/
aborttest02.
> exe
> > > > > > >
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > >
> > > > > > > Gilles
> > > > > > >
> > > > > > >
> > > > > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > My question concerns MPI_ABORT, indirect
execution
> of
> > > > > > > executables by mpirun and Open
> > > > > > > MPI 2.1.1. When mpirun runs executables
directly,
> MPI
> > _ABORT
> > > > > > > works as expected, but
> > > > > > > when mpirun runs executables indirectly, MPI_
ABORT
> > does not
> > > > > > > work as expected.
> > > > > > >
> > > > > > > If Open MPI 1.4.3 is used instead of Open MPI
2.1.1,
> > MPI_ABORT
> > > > > > > works as expected in all
> > > > > > > cases.
> > > > > > >
> > > > > > > The examples given below have been simplified
as far
> > as possible
> > > > > > > to show the issues.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Example 1
> > > > > > >
> > > > > > > Consider an MPI job run in the following way:
> > > > > > >
> > > > > > > mpirun ... -app addmpw1
> > > > > > >
> > > > > > > where the appfile addmpw1 lists two
executables:
> > > > > > >
> > > > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > > > -n 1 -host gulftown ... aborttest02.exe
> > > > > > >
> > > > > > > The two executables are executed on the local
node
> > gulftown.
> > > > > > > aborttest02 calls MPI_ABORT
> > > > > > > for rank 0, then sleeps.
> > > > > > >
> > > > > > > The above MPI job runs as expected. Both
processes
> > immediately
> > > > > > > abort when rank 0 calls
> > > > > > > MPI_ABORT.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > Example 2
> > > > > > >
> > > > > > > Now change the above example as follows:
> > > > > > >
> > > > > > > mpirun ... -app addmpw2
> > > > > > >
> > > > > > > where the appfile addmpw2 lists shell scripts:
> > > > > > >
> > > > > > > -n 1 -host gulftown ... dum.sh
> > > > > > > -n 1 -host gulftown ... dum.sh
> > > > > > >
> > > > > > > dum.sh invokes aborttest02.exe. So
aborttest02.exe
> is
> > executed
> > > > > > > indirectly by mpirun.
> > > > > > >
> > > > > > > In this case, the MPI job only aborts process
0 when
> > rank 0 calls
> > > > > > > MPI_ABORT. Process 1
> > > > > > > continues to run. This behavior is unexpected.
> > > > > > >
> > > > > > > ----
> > > > > > >
> > > > > > > I have attached all files to this E-mail.
Since
> there
> > are absolute
> > > > > > > pathnames in the files, to
> > > > > > > reproduce my findings, you will need to update
the
> > pathnames in the
> > > > > > > appfiles and shell
> > > > > > > scripts. To run example 1,
> > > > > > >
> > > > > > > sh run1.sh
> > > > > > >
> > > > > > > and to run example 2,
> > > > > > >
> > > > > > > sh run2.sh
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > I have tested these examples with Open MPI 1.4.
3 and
> 2.
> > 0.3. In
> > > > > > > Open MPI 1.4.3, both
> > > > > > > examples work as expected. Open MPI 2.0.3 has
the
> > same behavior
> > > > > > > as Open MPI 2.1.1.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > I would prefer that Open MPI 2.1.1 aborts both
> > processes, even
> > > > > > > when the executables are
> > > > > > > invoked indirectly by mpirun. If there is an
MCA
> > setting that is
> > > > > > > needed to make Open MPI
> > > > > > > 2.1.1 abort both processes, please let me know.
> > > > > > >
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Theodore Sussman
> > > > > > >
> > > > > > >
> > > > > > > The following section of this message contains
a
> file
> > attachment
> > > > > > > prepared for transmission using the Internet
MIME
> > message format.
> > > > > > > If you are using Pegasus Mail, or any other
MIME-
> > compliant system,
> > > > > > > you should be able to save it or view it from
within
> > your mailer.
> > > > > > > If you cannot, please ask your system
administrator
> > for assistance.
> > > > > > >
> > > > > > > ---- File information -----------
> > > > > > > File: config.log.bz2
> > > > > > > Date: 14 Jun 2017, 13:35
> > > > > > > Size: 146548 bytes.
> > > > > > > Type: Binary
> > > > > > >
> > > > > > >
> > > > > > > The following section of this message contains
a
> file
> > attachment
> > > > > > > prepared for transmission using the Internet
MIME
> > message format.
> > > > > > > If you are using Pegasus Mail, or any other
MIME-
> > compliant system,
> > > > > > > you should be able to save it or view it from
within
> > your mailer.
> > > > > > > If you cannot, please ask your system
administrator
> > for assistance.
> > > > > > >
> > > > > > > ---- File information -----------
> > > > > > > File: ompi_info.bz2
> > > > > > > Date: 14 Jun 2017, 13:35
> > > > > > > Size: 24088 bytes.
> > > > > > > Type: Binary
> > > > > > >
> > > > > > >
> > > > > > > The following section of this message contains
a
> file
> > attachment
> > > > > > > prepared for transmission using the Internet
MIME
> > message format.
> > > > > > > If you are using Pegasus Mail, or any other
MIME-
> > compliant system,
> > > > > > > you should be able to save it or view it from
within
> > your mailer.
> > > > > > > If you cannot, please ask your system
administrator
> > for assistance.
> > > > > > >
> > > > > > > ---- File information -----------
> > > > > > > File: aborttest02.tgz
> > > > > > > Date: 14 Jun 2017, 13:52
> > > > > > > Size: 4285 bytes.
> > > > > > > Type: Binary
> > > > > > >
> > > > > > >
> > > > > > > ______________________________________________
_
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

>
> >
> > > > > > >
> > > > > > > ______________________________________________
_
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

>
> >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ______________________________________________
_
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

>
> >
> > > > > > >
> > > > > > > ______________________________________________
_
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

>
> >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ______________________________________________
_
> > > > > > > users mailing list
> > > > > > > ***@lists.open-mpi.org
> > > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

>
> >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > users mailing list
> > > > > > ***@lists.open-mpi.org
> > > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users

> > > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > ***@lists.open-mpi.org
> > > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > > >
> > > >
> > > > --
> > > > Jeff Squyres
> > > > ***@cisco.com
> > > >
> > > > _______________________________________________
> > > > users mailing list
> > > > ***@lists.open-mpi.org
> > > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > >
> > >
> > >
> > > _______________________________________________
> > > users mailing list
> > > ***@lists.open-mpi.org
> > > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > >
> > _______________________________________________
> > users mailing list
> > ***@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Ted Sussman
2017-06-19 16:58:15 UTC
Permalink
Hello,

I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.

I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
calling MPI_ABORT, so that I can check the pids using ps.

This is what happens (see run2.sh.out).

Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest10.exe.

Pid Process
-------------------
19565 dum.sh
19566 dum.sh
19567 aborttest10.exe
19568 aborttest10.exe

When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
instances of dum.sh (pids 19565 and 19566).

ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
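(As a sketch of how to see which processes the group kill can and cannot reach, not from the original thread: list the process group of each process during the 5-second sleep; pgid is a standard ps output column, and the grep pattern just matches the names used in this test.)

    # while run2.sh is still sleeping, show pid / parent / process group / command
    ps -eo pid,ppid,pgid,comm | grep -E 'mpirun|dum|aborttest10'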

Hope that this information is useful.

Sincerely,

Ted Sussman



On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:

>
>  Ted,
>  
> some traces are missing  because you did not configure with --enable-debug
> i am afraid you have to do it (and you probably want to install that debug version in another
> location, since its performance is not good for production) in order to get all the logs.
>  
> Cheers,
>  
> Gilles
>  
> ----- Original Message -----
> Hello Gilles,
>
> I retried my example, with the same results as I observed before.  The process with rank 1
> does not get killed by MPI_ABORT.
>
> I have attached to this E-mail:
>
>   config.log.bz2
>   ompi_info.bz2  (uses ompi_info -a)
>   aborttest09.tgz
>
> This testing is done on a computer running Linux 3.10.0.  This is a different computer than
> the computer that I previously used for testing.  You can confirm that I am using Open MPI
> 2.1.1.
>
> tar xvzf aborttest09.tgz
> cd aborttest09
> ./sh run2.sh
>
> run2.sh contains the command
>
> /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10
> ./dum.sh
>
> The output from this run is in aborttest09/run2.sh.out.
>
> The output shows that the the "default" component is selected by odls.
>
> The only messages from odls are: odls: launch spawning child ...  (two messages). There
> are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
> messages.
>
> I am not running from within any batch manager.
>
> Sincerely,
>
> Ted Sussman
>
> On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>
> > Ted,
> >
> > i do not observe the same behavior you describe with Open MPI 2.1.1
> >
> > # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >
> > abort.sh 31361 launching abort
> > abort.sh 31362 launching abort
> > I am rank 0 with pid 31363
> > I am rank 1 with pid 31364
> > ------------------------------------------------------------------------
> > --
> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> > ------------------------------------------------------------------------
> > --
> > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],0]
> > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],1]
> > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> > not alive
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> > not alive
> >
> >
> > Open MPI did kill both shells, and they were indeed killed as evidenced
> > by ps
> >
> > #ps -fu gilles --forest
> > UID        PID  PPID  C STIME TTY          TIME CMD
> > gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
> > gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
> > gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/
> > local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> > gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
> >
> >
> > so trapping SIGTERM in your shell and manually killing the MPI task
> > should work
> > (as Jeff explained, as long as the shell script is fast enough to do
> > that between SIGTERM and SIGKILL)
> >
> >
> > if you observe a different behavior, please double check your Open MPI
> > version and post the outputs of the same commands.
> >
> > btw, are you running from a batch manager ? if yes, which one ?
> >
> > Cheers,
> >
> > Gilles
> >
> > ----- Original Message -----
> > > Ted,
> > >
> > > if you
> > >
> > > mpirun --mca odls_base_verbose 10 ...
> > >
> > > you will see which processes get killed and how
> > >
> > > Best regards,
> > >
> > >
> > > Gilles
> > >
> > > ----- Original Message -----
> > > > Hello Jeff,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > I am not seeing behavior #4, on the two computers that I have tested
> > > on, using Open MPI
> > > > 2.1.1.
> > > >
> > > > I wonder if you can duplicate my results with the files that I have
> > > uploaded.
> > > >
> > > > Regarding what is the "correct" behavior, I am willing to modify my
> > > application to correspond
> > > > to Open MPI's behavior (whatever behavior the Open MPI developers
> > > decide is best) --
> > > > provided that Open MPI does in fact kill off both shells.
> > > >
> > > > So my highest priority now is to find out why Open MPI 2.1.1 does
> > not
> > > kill off both shells on
> > > > my computer.
> > > >
> > > > Sincerely,
> > > >
> > > > Ted Sussman
> > > >
> > > >  On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> > > >
> > > > > Ted --
> > > > >
> > > > > Sorry for jumping in late.  Here's my $0.02...
> > > > >
> > > > > In the runtime, we can do 4 things:
> > > > >
> > > > > 1. Kill just the process that we forked.
> > > > > 2. Kill just the process(es) that call back and identify
> > themselves
> > > as MPI processes (we don't track this right now, but we could add that
> > > functionality).
> > > > > 3. Union of #1 and #2.
> > > > > 4. Kill all processes (to include any intermediate processes that
> > > are not included in #1 and #2).
> > > > >
> > > > > In Open MPI 2.x, #4 is the intended behavior.  There may be a bug
> > or
> > > two that needs to get fixed (e.g., in your last mail, I don't see
> > > offhand why it waits until the MPI process finishes sleeping), but we
> > > should be killing the process group, which -- unless any of the
> > > descendant processes have explicitly left the process group -- should
> > > hit the entire process tree. 
> > > > >
> > > > > Sidenote: there's actually a way to be a bit more aggressive and
> > do
> > > a better job of ensuring that we kill *all* processes (via creative
> > use
> > > of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement
> > /
> > > optimization.
> > > > >
> > > > > I think Gilles and Ralph proposed a good point to you: if you want
> > > to be sure to be able to do cleanup after an MPI process terminates (
> > > normally or abnormally), you should trap signals in your intermediate
> > > processes to catch what Open MPI's runtime throws and therefore know
> > > that it is time to cleanup. 
> > > > >
> > > > > Hypothetically, this should work in all versions of Open MPI...?
> > > > >
> > > > > I think Ralph made a pull request that adds an MCA param to change
> > > the default behavior from #4 to #1.
> > > > >
> > > > > Note, however, that there's a little time between when Open MPI
> > > sends the SIGTERM and the SIGKILL, so this solution could be racy.  If
> > > you find that you're running out of time to cleanup, we might be able
> > to
> > > make the delay between the SIGTERM and SIGKILL be configurable (e.g.,
> > > via MCA param).
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com
> > >
> > > wrote:
> > > > > >
> > > > > > Hello Gilles and Ralph,
> > > > > >
> > > > > > Thank you for your advice so far.  I appreciate the time that
> > you
> > > have spent to educate me about the details of Open MPI.
> > > > > >
> > > > > > But I think that there is something fundamental that I don't
> > > understand.  Consider Example 2 run with Open MPI 2.1.1.
> > > > > >
> > > > > > mpirun --> shell for process 0 -->  executable for process 0 -->
> > > MPI calls, MPI_Abort
> > > > > >        --> shell for process 1 -->  executable for process 1 -->
> > > MPI calls
> > > > > >
> > > > > > After the MPI_Abort is called, ps shows that both shells are
> > > running, and that the executable for process 1 is running (in this
> > case,
> > > process 1 is sleeping).  And mpirun does not exit until process 1 is
> > > finished sleeping.
> > > > > >
> > > > > > I cannot reconcile this observed behavior with the statement
> > > > > >
> > > > > > >     >     2.x: each process is put into its own process group
> > > upon launch. When we issue a
> > > > > > >     >     "kill", we issue it to the process group. Thus,
> > every
> > > child proc of that child proc will
> > > > > > >     >     receive it. IIRC, this was the intended behavior.
> > > > > >
> > > > > > I assume that, for my example, there are two process groups. 
> > The
> > > process group for process 0 contains the shell for process 0 and the
> > > executable for process 0; and the process group for process 1 contains
> > > the shell for process 1 and the executable for process 1.  So what
> > does
> > > MPI_ABORT do?  MPI_ABORT does not kill the process group for process 0,
> > 
> > > since the shell for process 0 continues.  And MPI_ABORT does not kill
> > > the process group for process 1, since both the shell and executable
> > for
> > > process 1 continue.
> > > > > >
> > > > > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > > > > >
> > > > > > mpirun: abort is already in progress.. hit ctrl-c again to
> > > forcibly terminate
> > > > > >
> > > > > > but I don't need to hit Ctrl-C again because mpirun immediately
> > > exits.
> > > > > >
> > > > > > Can you shed some light on all of this?
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Ted Sussman
> > > > > >
> > > > > >
> > > > > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> > > > > >
> > > > > > >
> > > > > > > You have to understand that we have no way of knowing who is
> > > making MPI calls - all we see is
> > > > > > > the proc that we started, and we know someone of that rank is
> > > running (but we have no way of
> > > > > > > knowing which of the procs you sub-spawned it is).
> > > > > > >
> > > > > > > So the behavior you are seeking only occurred in some earlier
> > > release by sheer accident. Nor will
> > > > > > > you find it portable as there is no specification directing
> > that
> > > behavior.
> > > > > > >
> > > > > > > The behavior I've provided is to either deliver the signal to
> > _
> > > all_ child processes (including
> > > > > > > grandchildren etc.), or _only_ the immediate child of the
> > daemon.
> > >  It won't do what you describe -
> > > > > > > kill the MPI proc underneath the shell, but not the shell
> > itself.
> > > > > > >
> > > > > > > What you can eventually do is use PMIx to ask the runtime to
> > > selectively deliver signals to
> > > > > > > pid/procs for you. We don't have that capability implemented
> > > just yet, I'm afraid.
> > > > > > >
> > > > > > > Meantime, when I get a chance, I can code an option that will
> > > record the pid of the subproc that
> > > > > > > calls MPI_Init, and then lets you deliver signals to just
> > that
> > > proc. No promises as to when that will
> > > > > > > be done.
> > > > > > >
> > > > > > >
> > > > > > >     On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@
> > adina.
> > > com> wrote:
> > > > > > >
> > > > > > >     Hello Ralph,
> > > > > > >
> > > > > > >     I am just an Open MPI end user, so I will need to wait for
> > > the next official release.
> > > > > > >
> > > > > > >     mpirun --> shell for process 0 -->  executable for process
> > 0
> > > --> MPI calls
> > > > > > >            --> shell for process 1 -->  executable for process
> > 1
> > > --> MPI calls
> > > > > > >                                     ...
> > > > > > >
> > > > > > >     I guess the question is, should MPI_ABORT kill the
> > > executables or the shells?  I naively
> > > > > > >     thought, that, since it is the executables that make the
> > MPI
> > > calls, it is the executables that
> > > > > > >     should be aborted by the call to MPI_ABORT.  Since the
> > > shells don't make MPI calls, the
> > > > > > >     shells should not be aborted.
> > > > > > >
> > > > > > >     And users might have several layers of shells in between
> > > mpirun and the executable.
> > > > > > >
> > > > > > >     So now I will look for the latest version of Open MPI that
> > > has the 1.4.3 behavior.
> > > > > > >
> > > > > > >     Sincerely,
> > > > > > >
> > > > > > >     Ted Sussman
> > > > > > >
> > > > > > >     On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > > > > > >
> > > > > > >     >
> > > > > > >     > Yeah, things jittered a little there as we debated the "
> > > right" behavior. Generally, when we
> > > > > > >     see that
> > > > > > >     > happening it means that a param is required, but somehow
> > > we never reached that point.
> > > > > > >     >
> > > > > > >     > See if https://github.com/open-mpi/ompi/pull/3704  helps
> > -
> > > if so, I can schedule it for the next
> > > > > > >     2.x
> > > > > > >     > release if the RMs agree to take it
> > > > > > >     >
> > > > > > >     > Ralph
> > > > > > >     >
> > > > > > >     >     On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> > sussman
> > > @adina.com > wrote:
> > > > > > >     >
> > > > > > >     >     Thank you for your comments.
> > > > > > >     >   
> > > > > > >     >     Our application relies upon "dum.sh" to clean up
> > after
> > > the process exits, either if the
> > > > > > >     process
> > > > > > >     >     exits normally, or if the process exits abnormally
> > > because of MPI_ABORT.  If the process
> > > > > > >     >     group is killed by MPI_ABORT, this clean up will not
> > > be performed.  If exec is used to launch
> > > > > > >     >     the executable from dum.sh, then dum.sh is
> > terminated
> > > by the exec, so dum.sh cannot
> > > > > > >     >     perform any clean up.
> > > > > > >     >   
> > > > > > >     >     I suppose that other user applications might work
> > > similarly, so it would be good to have an
> > > > > > >     >     MCA parameter to control the behavior of MPI_ABORT.
> > > > > > >     >   
> > > > > > >     >     We could rewrite our shell script that invokes
> > mpirun,
> > > so that the cleanup that is now done
> > > > > > >     >     by
> > > > > > >     >     dum.sh is done by the invoking shell script after
> > > mpirun exits.  Perhaps this technique is the
> > > > > > >     >     preferred way to clean up after mpirun is invoked.
> > > > > > >     >   
> > > > > > >     >     By the way, I have also tested with Open MPI 1.10.7,
> > > and Open MPI 1.10.7 has different
> > > > > > >     >     behavior than either Open MPI 1.4.3 or Open MPI 2.1.
> > 1.
> > >  In this explanation, it is important to
> > > > > > >     >     know that the aborttest executable sleeps for 20 sec.
> > > > > > >     >   
> > > > > > >     >     When running example 2:
> > > > > > >     >   
> > > > > > >     >     1.4.3: process 1 immediately aborts
> > > > > > >     >     1.10.7: process 1 doesn't abort and never stops.
> > > > > > >     >     2.1.1 process 1 doesn't abort, but stops after it is
> > > finished sleeping
> > > > > > >     >   
> > > > > > >     >     Sincerely,
> > > > > > >     >   
> > > > > > >     >     Ted Sussman
> > > > > > >     >   
> > > > > > >     >     On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > > > > >     >
> > > > > > >     >     Here is how the system is working:
> > > > > > >     >   
> > > > > > >     >     Master: each process is put into its own process
> > group
> > > upon launch. When we issue a
> > > > > > >     >     "kill", however, we only issue it to the individual
> > > process (instead of the process group
> > > > > > >     >     that is headed by that child process). This is
> > > probably a bug as I donŽt believe that is
> > > > > > >     >     what we intended, but set that aside for now.
> > > > > > >     >   
> > > > > > >     >     2.x: each process is put into its own process group
> > > upon launch. When we issue a
> > > > > > >     >     "kill", we issue it to the process group. Thus,
> > every
> > > child proc of that child proc will
> > > > > > >     >     receive it. IIRC, this was the intended behavior.
> > > > > > >     >   
> > > > > > >     >     It is rather trivial to make the change (it only
> > > involves 3 lines of code), but I'm not sure
> > > > > > >     >     of what our intended behavior is supposed to be.
> > Once
> > > we clarify that, it is also trivial
> > > > > > >     >     to add another MCA param (you can never have too
> > many!)
> > >  to allow you to select the
> > > > > > >     >     other behavior.
> > > > > > >     >   
> > > > > > >     >
> > > > > > >     >     On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
> > sussman@
> > > adina.com > wrote:
> > > > > > >     >   
> > > > > > >     >     Hello Gilles,
> > > > > > >     >   
> > > > > > >     >     Thank you for your quick answer.  I confirm that if
> > > exec is used, both processes
> > > > > > >     >     immediately
> > > > > > >     >     abort.
> > > > > > >     >   
> > > > > > >     >     Now suppose that the line
> > > > > > >     >   
> > > > > > >     >     echo "After aborttest:
> > > > > > >     >     OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > > > > >     >   
> > > > > > >     >     is added to the end of dum.sh.
> > > > > > >     >   
> > > > > > >     >     If Example 2 is run with Open MPI 1.4.3, the output
> > is
> > > > > > >     >   
> > > > > > >     >     After aborttest: OMPI_COMM_WORLD_RANK=0
> > > > > > >     >   
> > > > > > >     >     which shows that the shell script for the process
> > with
> > > rank 0 continues after the
> > > > > > >     >     abort,
> > > > > > >     >     but that the shell script for the process with rank
> > 1
> > > does not continue after the
> > > > > > >     >     abort.
> > > > > > >     >   
> > > > > > >     >     If Example 2 is run with Open MPI 2.1.1, with exec
> > > used to invoke
> > > > > > >     >     aborttest02.exe, then
> > > > > > >     >     there is no such output, which shows that both shell
> > > scripts do not continue after
> > > > > > >     >     the abort.
> > > > > > >     >   
> > > > > > >     >     I prefer the Open MPI 1.4.3 behavior because our
> > > original application depends
> > > > > > >     >     upon the
> > > > > > >     >     Open MPI 1.4.3 behavior.  (Our original application
> > > will also work if both
> > > > > > >     >     executables are
> > > > > > >     >     aborted, and if both shell scripts continue after
> > the
> > > abort.)
> > > > > > >     >   
> > > > > > >     >     It might be too much to expect, but is there a way
> > to
> > > recover the Open MPI 1.4.3
> > > > > > >     >     behavior
> > > > > > >     >     using Open MPI 2.1.1? 
> > > > > > >     >   
> > > > > > >     >     Sincerely,
> > > > > > >     >   
> > > > > > >     >     Ted Sussman
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > > > > >     >
> > > > > > >     >     Ted,
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     fwiw, the 'master' branch has the behavior you
> > expect.
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     meanwhile, you can simple edit your 'dum.sh' script
> > > and replace
> > > > > > >     >   
> > > > > > >     >     /home/buildadina/src/aborttest02/aborttest02.exe
> > > > > > >     >   
> > > > > > >     >     with
> > > > > > >     >   
> > > > > > >     >     exec /home/buildadina/src/aborttest02/aborttest02.
> > exe
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     Cheers,
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     Gilles
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     On 6/15/2017 3:01 AM, Ted Sussman wrote:
> > > > > > >     >     Hello,
> > > > > > >     >   
> > > > > > >     >     My question concerns MPI_ABORT, indirect execution
> > of
> > > > > > >     >     executables by mpirun and Open
> > > > > > >     >     MPI 2.1.1.  When mpirun runs executables directly,
> > MPI
> > > _ABORT
> > > > > > >     >     works as expected, but
> > > > > > >     >     when mpirun runs executables indirectly, MPI_ABORT
> > > does not
> > > > > > >     >     work as expected.
> > > > > > >     >   
> > > > > > >     >     If Open MPI 1.4.3 is used instead of Open MPI 2.1.1,
> > > MPI_ABORT
> > > > > > >     >     works as expected in all
> > > > > > >     >     cases.
> > > > > > >     >   
> > > > > > >     >     The examples given below have been simplified as far
> > > as possible
> > > > > > >     >     to show the issues.
> > > > > > >     >   
> > > > > > >     >     ---
> > > > > > >     >   
> > > > > > >     >     Example 1
> > > > > > >     >   
> > > > > > >     >     Consider an MPI job run in the following way:
> > > > > > >     >   
> > > > > > >     >     mpirun ... -app addmpw1
> > > > > > >     >   
> > > > > > >     >     where the appfile addmpw1 lists two executables:
> > > > > > >     >   
> > > > > > >     >     -n 1 -host gulftown ... aborttest02.exe
> > > > > > >     >     -n 1 -host gulftown ... aborttest02.exe
> > > > > > >     >   
> > > > > > >     >     The two executables are executed on the local node
> > > gulftown.
> > > > > > >     >      aborttest02 calls MPI_ABORT
> > > > > > >     >     for rank 0, then sleeps.
> > > > > > >     >   
> > > > > > >     >     The above MPI job runs as expected.  Both processes
> > > immediately
> > > > > > >     >     abort when rank 0 calls
> > > > > > >     >     MPI_ABORT.
> > > > > > >     >   
> > > > > > >     >     ---
> > > > > > >     >   
> > > > > > >     >     Example 2
> > > > > > >     >   
> > > > > > >     >     Now change the above example as follows:
> > > > > > >     >   
> > > > > > >     >     mpirun ... -app addmpw2
> > > > > > >     >   
> > > > > > >     >     where the appfile addmpw2 lists shell scripts:
> > > > > > >     >   
> > > > > > >     >     -n 1 -host gulftown ... dum.sh
> > > > > > >     >     -n 1 -host gulftown ... dum.sh
> > > > > > >     >   
> > > > > > >     >     dum.sh invokes aborttest02.exe.  So aborttest02.exe
> > is
> > > executed
> > > > > > >     >     indirectly by mpirun.
> > > > > > >     >   
> > > > > > >     >     In this case, the MPI job only aborts process 0 when
> > > rank 0 calls
> > > > > > >     >     MPI_ABORT.  Process 1
> > > > > > >     >     continues to run.  This behavior is unexpected.
> > > > > > >     >   
> > > > > > >     >     ----
> > > > > > >     >   
> > > > > > >     >     I have attached all files to this E-mail.  Since
> > there
> > > are absolute
> > > > > > >     >     pathnames in the files, to
> > > > > > >     >     reproduce my findings, you will need to update the
> > > pathnames in the
> > > > > > >     >     appfiles and shell
> > > > > > >     >     scripts.  To run example 1,
> > > > > > >     >   
> > > > > > >     >     sh run1.sh
> > > > > > >     >   
> > > > > > >     >     and to run example 2,
> > > > > > >     >   
> > > > > > >     >     sh run2.sh
> > > > > > >     >   
> > > > > > >     >     ---
> > > > > > >     >   
> > > > > > >     >     I have tested these examples with Open MPI 1.4.3 and
> > 2.
> > > 0.3.  In
> > > > > > >     >     Open MPI 1.4.3, both
> > > > > > >     >     examples work as expected.  Open MPI 2.0.3 has the
> > > same behavior
> > > > > > >     >     as Open MPI 2.1.1.
> > > > > > >     >   
> > > > > > >     >     ---
> > > > > > >     >   
> > > > > > >     >     I would prefer that Open MPI 2.1.1 aborts both
> > > processes, even
> > > > > > >     >     when the executables are
> > > > > > >     >     invoked indirectly by mpirun.  If there is an MCA
> > > setting that is
> > > > > > >     >     needed to make Open MPI
> > > > > > >     >     2.1.1 abort both processes, please let me know.
> > > > > > >     >   
> > > > > > >     >   
> > > > > > >     >     Sincerely,
> > > > > > >     >   
> > > > > > >     >     Theodore Sussman
> > > > > > >     >   
> > > > > > >     >   
r***@open-mpi.org
2017-06-19 17:10:30 UTC
Permalink
That is typical behavior when you throw something into “sleep” - not much we can do about it, I think.
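For reference, here is a minimal sketch of the signal-trapping wrapper discussed in this thread: trap SIGTERM in the intermediate script, forward it to the MPI child, and run the cleanup before the follow-up SIGKILL arrives. The script name, executable path and cleanup body are placeholders, not the attached dum.sh:

    #!/bin/sh
    # Hypothetical wrapper launched by mpirun in place of the MPI executable.
    # On MPI_ABORT, Open MPI's runtime signals the process group (SIGCONT,
    # SIGTERM, SIGKILL), so this script has a short window after SIGTERM
    # in which to clean up.

    cleanup () {
        # site-specific cleanup would go here
        echo "After aborttest: OMPI_COMM_WORLD_RANK=$OMPI_COMM_WORLD_RANK"
    }

    ./aborttest.exe "$@" &      # run the real executable in the background
    child=$!

    # forward SIGTERM to the child, clean up, then exit
    trap 'kill -TERM "$child" 2>/dev/null; cleanup; exit 143' TERM INT

    wait "$child"               # returns when the child exits or a trapped signal fires
    status=$?
    cleanup
    exit "$status"

As noted elsewhere in the thread, this is inherently racy: the cleanup must finish in the short interval between the SIGTERM and the SIGKILL that follows it.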

> On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
>
> Hello,
>
> I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>
> I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
> calling MPI_ABORT, so that I can check the pids using ps.
>
> This is what happens (see run2.sh.out).
>
> Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
>
> Pid Process
> -------------------
> 19565 dum.sh
> 19566 dum.sh
> 19567 aborttest10.exe
> 19568 aborttest10.exe
>
> When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> instances of dum.sh (pids 19565 and 19566).
>
> ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
> vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
>
> Hope that this information is useful.
>
> Sincerely,
>
> Ted Sussman
>
>
>
> On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:
>
>>
>> Ted,
>>
>> some traces are missing because you did not configure with --enable-debug
>> i am afraid you have to do it (and you probably want to install that debug version in an other
>> location since its performances are not good for production) in order to get all the logs.
>>
>> Cheers,
>>
>> Gilles
>>
>> ----- Original Message -----
>> Hello Gilles,
>>
>> I retried my example, with the same results as I observed before. The process with rank 1
>> does not get killed by MPI_ABORT.
>>
>> I have attached to this E-mail:
>>
>> config.log.bz2
>> ompi_info.bz2 (uses ompi_info -a)
>> aborttest09.tgz
>>
>> This testing is done on a computer running Linux 3.10.0. This is a different computer than
>> the computer that I previously used for testing. You can confirm that I am using Open MPI
>> 2.1.1.
>>
>> tar xvzf aborttest09.tgz
>> cd aborttest09
>> ./sh run2.sh
>>
>> run2.sh contains the command
>>
>> /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10
>> ./dum.sh
>>
>> The output from this run is in aborttest09/run2.sh.out.
>>
>> The output shows that the the "default" component is selected by odls.
>>
>> The only messages from odls are: odls: launch spawning child ... (two messages). There
>> are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
>> messages.
>>
>> I am not running from within any batch manager.
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>> On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>>
>>> Ted,
>>>
>>> i do not observe the same behavior you describe with Open MPI 2.1.1
>>>
>>> # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>>>
>>> abort.sh 31361 launching abort
>>> abort.sh 31362 launching abort
>>> I am rank 0 with pid 31363
>>> I am rank 1 with pid 31364
>>> ------------------------------------------------------------------------
>>> --
>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>> with errorcode 1.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>> ------------------------------------------------------------------------
>>> --
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>>> [[18199,1],0]
>>> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
>>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
>>> SUCCESS
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>>> [[18199,1],1]
>>> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
>>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
>>> SUCCESS
>>> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
>>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
>>> SUCCESS
>>> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
>>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
>>> SUCCESS
>>> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
>>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
>>> SUCCESS
>>> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
>>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
>>> SUCCESS
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>>> [[18199,1],0]
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
>>> not alive
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>>> [[18199,1],1]
>>> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
>>> not alive
>>>
>>>
>>> Open MPI did kill both shells, and they were indeed killed as evidenced
>>> by ps
>>>
>>> #ps -fu gilles --forest
>>> UID PID PPID C STIME TTY TIME CMD
>>> gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
>>> gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
>>> gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/
>>> local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
>>> gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
>>>
>>>
>>> so trapping SIGTERM in your shell and manually killing the MPI task
>>> should work
>>> (as Jeff explained, as long as the shell script is fast enough to do
>>> that between SIGTERM and SIGKILL)
>>>
>>>
>>> if you observe a different behavior, please double check your Open MPI
>>> version and post the outputs of the same commands.
>>>
>>> btw, are you running from a batch manager ? if yes, which one ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> ----- Original Message -----
>>>> Ted,
>>>>
>>>> if you
>>>>
>>>> mpirun --mca odls_base_verbose 10 ...
>>>>
>>>> you will see which processes get killed and how
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> Gilles
>>>>
>>>> ----- Original Message -----
>>>>> Hello Jeff,
>>>>>
>>>>> Thanks for your comments.
>>>>>
>>>>> I am not seeing behavior #4, on the two computers that I have tested
>>>> on, using Open MPI
>>>>> 2.1.1.
>>>>>
>>>>> I wonder if you can duplicate my results with the files that I have
>>>> uploaded.
>>>>>
>>>>> Regarding what is the "correct" behavior, I am willing to modify my
>>>> application to correspond
>>>>> to Open MPI's behavior (whatever behavior the Open MPI developers
>>>> decide is best) --
>>>>> provided that Open MPI does in fact kill off both shells.
>>>>>
>>>>> So my highest priority now is to find out why Open MPI 2.1.1 does
>>> not
>>>> kill off both shells on
>>>>> my computer.
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Ted Sussman
>>>>>
>>>>> On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>>>>>
>>>>>> Ted --
>>>>>>
>>>>>> Sorry for jumping in late. Here's my $0.02...
>>>>>>
>>>>>> In the runtime, we can do 4 things:
>>>>>>
>>>>>> 1. Kill just the process that we forked.
>>>>>> 2. Kill just the process(es) that call back and identify
>>> themselves
>>>> as MPI processes (we don't track this right now, but we could add that
>>>> functionality).
>>>>>> 3. Union of #1 and #2.
>>>>>> 4. Kill all processes (to include any intermediate processes that
>>>> are not included in #1 and #2).
>>>>>>
>>>>>> In Open MPI 2.x, #4 is the intended behavior. There may be a bug
>>> or
>>>> two that needs to get fixed (e.g., in your last mail, I don't see
>>>> offhand why it waits until the MPI process finishes sleeping), but we
>>>> should be killing the process group, which -- unless any of the
>>>> descendant processes have explicitly left the process group -- should
>>>> hit the entire process tree.
>>>>>>
>>>>>> Sidenote: there's actually a way to be a bit more aggressive and
>>> do
>>>> a better job of ensuring that we kill *all* processes (via creative
>>> use
>>>> of PR_SET_CHILD_SUBREAPER), but that's basically a future enhancement
>>> /
>>>> optimization.
>>>>>>
>>>>>> I think Gilles and Ralph proposed a good point to you: if you want
>>>> to be sure to be able to do cleanup after an MPI process terminates (
>>>> normally or abnormally), you should trap signals in your intermediate
>>>> processes to catch what Open MPI's runtime throws and therefore know
>>>> that it is time to cleanup.
>>>>>>
>>>>>> Hypothetically, this should work in all versions of Open MPI...?
>>>>>>
>>>>>> I think Ralph made a pull request that adds an MCA param to change
>>>> the default behavior from #4 to #1.
>>>>>>
>>>>>> Note, however, that there's a little time between when Open MPI
>>>> sends the SIGTERM and the SIGKILL, so this solution could be racy. If
>>>> you find that you're running out of time to cleanup, we might be able
>>> to
>>>> make the delay between the SIGTERM and SIGKILL be configurable (e.g.,
>>>> via MCA param).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com
>>>>
>>>> wrote:
>>>>>>>
>>>>>>> Hello Gilles and Ralph,
>>>>>>>
>>>>>>> Thank you for your advice so far. I appreciate the time that
>>> you
>>>> have spent to educate me about the details of Open MPI.
>>>>>>>
>>>>>>> But I think that there is something fundamental that I don't
>>>> understand. Consider Example 2 run with Open MPI 2.1.1.
>>>>>>>
>>>>>>> mpirun --> shell for process 0 --> executable for process 0 -->
>>>> MPI calls, MPI_Abort
>>>>>>> --> shell for process 1 --> executable for process 1 -->
>>>> MPI calls
>>>>>>>
>>>>>>> After the MPI_Abort is called, ps shows that both shells are
>>>> running, and that the executable for process 1 is running (in this
>>> case,
>>>> process 1 is sleeping). And mpirun does not exit until process 1 is
>>>> finished sleeping.
>>>>>>>
>>>>>>> I cannot reconcile this observed behavior with the statement
>>>>>>>
>>>>>>>> > 2.x: each process is put into its own process group
>>>> upon launch. When we issue a
>>>>>>>> > "kill", we issue it to the process group. Thus,
>>> every
>>>> child proc of that child proc will
>>>>>>>> > receive it. IIRC, this was the intended behavior.
>>>>>>>
>>>>>>> I assume that, for my example, there are two process groups.
>>> The
>>>> process group for process 0 contains the shell for process 0 and the
>>>> executable for process 0; and the process group for process 1 contains
>>>> the shell for process 1 and the executable for process 1. So what
>>> does
>>>> MPI_ABORT do? MPI_ABORT does not kill the process group for process 0,
>>>
>>>> since the shell for process 0 continues. And MPI_ABORT does not kill
>>>> the process group for process 1, since both the shell and executable
>>> for
>>>> process 1 continue.
>>>>>>>
>>>>>>> If I hit Ctrl-C after MPI_Abort is called, I get the message
>>>>>>>
>>>>>>> mpirun: abort is already in progress.. hit ctrl-c again to
>>>> forcibly terminate
>>>>>>>
>>>>>>> but I don't need to hit Ctrl-C again because mpirun immediately
>>>> exits.
>>>>>>>
>>>>>>> Can you shed some light on all of this?
>>>>>>>
>>>>>>> Sincerely,
>>>>>>>
>>>>>>> Ted Sussman
>>>>>>>
>>>>>>>
>>>>>>> On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> You have to understand that we have no way of knowing who is
>>>> making MPI calls - all we see is
>>>>>>>> the proc that we started, and we know someone of that rank is
>>>> running (but we have no way of
>>>>>>>> knowing which of the procs you sub-spawned it is).
>>>>>>>>
>>>>>>>> So the behavior you are seeking only occurred in some earlier
>>>> release by sheer accident. Nor will
>>>>>>>> you find it portable as there is no specification directing
>>> that
>>>> behavior.
>>>>>>>>
>>>>>>>> The behavior I've provided is to either deliver the signal to _all_ child processes (including
>>>>>>>> grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe -
>>>>>>>> kill the MPI proc underneath the shell, but not the shell itself.
>>>>>>>>
>>>>>>>> What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to
>>>>>>>> pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
>>>>>>>>
>>>>>>>> Meantime, when I get a chance, I can code an option that will record the pid of the subproc that
>>>>>>>> calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will
>>>>>>>> be done.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@
>>> adina.
>>>> com> wrote:
>>>>>>>>
>>>>>>>> Hello Ralph,
>>>>>>>>
>>>>>>>> I am just an Open MPI end user, so I will need to wait for
>>>> the next official release.
>>>>>>>>
>>>>>>>> mpirun --> shell for process 0 --> executable for process
>>> 0
>>>> --> MPI calls
>>>>>>>> --> shell for process 1 --> executable for process
>>> 1
>>>> --> MPI calls
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I guess the question is, should MPI_ABORT kill the
>>>> executables or the shells? I naively
>>>>>>>> thought, that, since it is the executables that make the
>>> MPI
>>>> calls, it is the executables that
>>>>>>>> should be aborted by the call to MPI_ABORT. Since the
>>>> shells don't make MPI calls, the
>>>>>>>> shells should not be aborted.
>>>>>>>>
>>>>>>>> And users might have several layers of shells in between
>>>> mpirun and the executable.
>>>>>>>>
>>>>>>>> So now I will look for the latest version of Open MPI that
>>>> has the 1.4.3 behavior.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>>
>>>>>>>> Ted Sussman
>>>>>>>>
>>>>>>>> On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
>>>>>>>>
>>>>>>>> >
>>>>>>>> > Yeah, things jittered a little there as we debated the "
>>>> right" behavior. Generally, when we
>>>>>>>> see that
>>>>>>>> > happening it means that a param is required, but somehow
>>>> we never reached that point.
>>>>>>>> >
>>>>>>>> > See if https://github.com/open-mpi/ompi/pull/3704 helps
>>> -
>>>> if so, I can schedule it for the next
>>>>>>>> 2.x
>>>>>>>> > release if the RMs agree to take it
>>>>>>>> >
>>>>>>>> > Ralph
>>>>>>>> >
>>>>>>>> > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
>>> sussman
>>>> @adina.com > wrote:
>>>>>>>> >
>>>>>>>> > Thank you for your comments.
>>>>>>>> >
>>>>>>>> > Our application relies upon "dum.sh" to clean up
>>> after
>>>> the process exits, either if the
>>>>>>>> process
>>>>>>>> > exits normally, or if the process exits abnormally
>>>> because of MPI_ABORT. If the process
>>>>>>>> > group is killed by MPI_ABORT, this clean up will not
>>>> be performed. If exec is used to launch
>>>>>>>> > the executable from dum.sh, then dum.sh is
>>> terminated
>>>> by the exec, so dum.sh cannot
>>>>>>>> > perform any clean up.
>>>>>>>> >
>>>>>>>> > I suppose that other user applications might work
>>>> similarly, so it would be good to have an
>>>>>>>> > MCA parameter to control the behavior of MPI_ABORT.
>>>>>>>> >
>>>>>>>> > We could rewrite our shell script that invokes
>>> mpirun,
>>>> so that the cleanup that is now done
>>>>>>>> > by
>>>>>>>> > dum.sh is done by the invoking shell script after
>>>> mpirun exits. Perhaps this technique is the
>>>>>>>> > preferred way to clean up after mpirun is invoked.
>>>>>>>> >
>>>>>>>> > By the way, I have also tested with Open MPI 1.10.7,
>>>> and Open MPI 1.10.7 has different
>>>>>>>> > behavior than either Open MPI 1.4.3 or Open MPI 2.1.
>>> 1.
>>>> In this explanation, it is important to
>>>>>>>> > know that the aborttest executable sleeps for 20 sec.
>>>>>>>> >
>>>>>>>> > When running example 2:
>>>>>>>> >
>>>>>>>> > 1.4.3: process 1 immediately aborts
>>>>>>>> > 1.10.7: process 1 doesn't abort and never stops.
>>>>>>>> > 2.1.1 process 1 doesn't abort, but stops after it is
>>>> finished sleeping
>>>>>>>> >
>>>>>>>> > Sincerely,
>>>>>>>> >
>>>>>>>> > Ted Sussman
>>>>>>>> >
>>>>>>>> > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
>>>>>>>> >
>>>>>>>> > Here is how the system is working:
>>>>>>>> >
>>>>>>>> > Master: each process is put into its own process
>>> group
>>>> upon launch. When we issue a
>>>>>>>> > "kill", however, we only issue it to the individual
>>>> process (instead of the process group
>>>>>>>> > that is headed by that child process). This is
>>>> probably a bug as I don't believe that is
>>>>>>>> > what we intended, but set that aside for now.
>>>>>>>> >
>>>>>>>> > 2.x: each process is put into its own process group
>>>> upon launch. When we issue a
>>>>>>>> > "kill", we issue it to the process group. Thus,
>>> every
>>>> child proc of that child proc will
>>>>>>>> > receive it. IIRC, this was the intended behavior.
>>>>>>>> >
>>>>>>>> > It is rather trivial to make the change (it only
>>>> involves 3 lines of code), but I'm not sure
>>>>>>>> > of what our intended behavior is supposed to be.
>>> Once
>>>> we clarify that, it is also trivial
>>>>>>>> > to add another MCA param (you can never have too
>>> many!)
>>>> to allow you to select the
>>>>>>>> > other behavior.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
>>> sussman@
>>>> adina.com > wrote:
>>>>>>>> >
>>>>>>>> > Hello Gilles,
>>>>>>>> >
>>>>>>>> > Thank you for your quick answer. I confirm that if
>>>> exec is used, both processes
>>>>>>>> > immediately
>>>>>>>> > abort.
>>>>>>>> >
>>>>>>>> > Now suppose that the line
>>>>>>>> >
>>>>>>>> > echo "After aborttest:
>>>>>>>> > OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
>>>>>>>> >
>>>>>>>> > is added to the end of dum.sh.
>>>>>>>> >
>>>>>>>> > If Example 2 is run with Open MPI 1.4.3, the output
>>> is
>>>>>>>> >
>>>>>>>> > After aborttest: OMPI_COMM_WORLD_RANK=0
>>>>>>>> >
>>>>>>>> > which shows that the shell script for the process
>>> with
>>>> rank 0 continues after the
>>>>>>>> > abort,
>>>>>>>> > but that the shell script for the process with rank
>>> 1
>>>> does not continue after the
>>>>>>>> > abort.
>>>>>>>> >
>>>>>>>> > If Example 2 is run with Open MPI 2.1.1, with exec
>>>> used to invoke
>>>>>>>> > aborttest02.exe, then
>>>>>>>> > there is no such output, which shows that both shell
>>>> scripts do not continue after
>>>>>>>> > the abort.
>>>>>>>> >
>>>>>>>> > I prefer the Open MPI 1.4.3 behavior because our
>>>> original application depends
>>>>>>>> > upon the
>>>>>>>> > Open MPI 1.4.3 behavior. (Our original application
>>>> will also work if both
>>>>>>>> > executables are
>>>>>>>> > aborted, and if both shell scripts continue after
>>> the
>>>> abort.)
>>>>>>>> >
>>>>>>>> > It might be too much to expect, but is there a way
>>> to
>>>> recover the Open MPI 1.4.3
>>>>>>>> > behavior
>>>>>>>> > using Open MPI 2.1.1?
>>>>>>>> >
>>>>>>>> > Sincerely,
>>>>>>>> >
>>>>>>>> > Ted Sussman
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
>>>>>>>> >
>>>>>>>> > Ted,
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > fwiw, the 'master' branch has the behavior you
>>> expect.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > meanwhile, you can simple edit your 'dum.sh' script
>>>> and replace
>>>>>>>> >
>>>>>>>> > /home/buildadina/src/aborttest02/aborttest02.exe
>>>>>>>> >
>>>>>>>> > with
>>>>>>>> >
>>>>>>>> > exec /home/buildadina/src/aborttest02/aborttest02.
>>> exe
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Cheers,
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Gilles
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 6/15/2017 3:01 AM, Ted Sussman wrote:
>>>>>>>> > Hello,
>>>>>>>> >
>>>>>>>> > My question concerns MPI_ABORT, indirect execution
>>> of
>>>>>>>> > executables by mpirun and Open
>>>>>>>> > MPI 2.1.1. When mpirun runs executables directly,
>>> MPI
>>>> _ABORT
>>>>>>>> > works as expected, but
>>>>>>>> > when mpirun runs executables indirectly, MPI_ABORT
>>>> does not
>>>>>>>> > work as expected.
>>>>>>>> >
>>>>>>>> > If Open MPI 1.4.3 is used instead of Open MPI 2.1.1,
>>>> MPI_ABORT
>>>>>>>> > works as expected in all
>>>>>>>> > cases.
>>>>>>>> >
>>>>>>>> > The examples given below have been simplified as far
>>>> as possible
>>>>>>>> > to show the issues.
>>>>>>>> >
>>>>>>>> > ---
>>>>>>>> >
>>>>>>>> > Example 1
>>>>>>>> >
>>>>>>>> > Consider an MPI job run in the following way:
>>>>>>>> >
>>>>>>>> > mpirun ... -app addmpw1
>>>>>>>> >
>>>>>>>> > where the appfile addmpw1 lists two executables:
>>>>>>>> >
>>>>>>>> > -n 1 -host gulftown ... aborttest02.exe
>>>>>>>> > -n 1 -host gulftown ... aborttest02.exe
>>>>>>>> >
>>>>>>>> > The two executables are executed on the local node
>>>> gulftown.
>>>>>>>> > aborttest02 calls MPI_ABORT
>>>>>>>> > for rank 0, then sleeps.
>>>>>>>> >
>>>>>>>> > The above MPI job runs as expected. Both processes
>>>> immediately
>>>>>>>> > abort when rank 0 calls
>>>>>>>> > MPI_ABORT.
>>>>>>>> >
>>>>>>>> > ---
>>>>>>>> >
>>>>>>>> > Example 2
>>>>>>>> >
>>>>>>>> > Now change the above example as follows:
>>>>>>>> >
>>>>>>>> > mpirun ... -app addmpw2
>>>>>>>> >
>>>>>>>> > where the appfile addmpw2 lists shell scripts:
>>>>>>>> >
>>>>>>>> > -n 1 -host gulftown ... dum.sh
>>>>>>>> > -n 1 -host gulftown ... dum.sh
>>>>>>>> >
>>>>>>>> > dum.sh invokes aborttest02.exe. So aborttest02.exe
>>> is
>>>> executed
>>>>>>>> > indirectly by mpirun.
>>>>>>>> >
>>>>>>>> > In this case, the MPI job only aborts process 0 when
>>>> rank 0 calls
>>>>>>>> > MPI_ABORT. Process 1
>>>>>>>> > continues to run. This behavior is unexpected.
>>>>>>>> >
>>>>>>>> > ----
>>>>>>>> >
>>>>>>>> > I have attached all files to this E-mail. Since
>>> there
>>>> are absolute
>>>>>>>> > pathnames in the files, to
>>>>>>>> > reproduce my findings, you will need to update the
>>>> pathnames in the
>>>>>>>> > appfiles and shell
>>>>>>>> > scripts. To run example 1,
>>>>>>>> >
>>>>>>>> > sh run1.sh
>>>>>>>> >
>>>>>>>> > and to run example 2,
>>>>>>>> >
>>>>>>>> > sh run2.sh
>>>>>>>> >
>>>>>>>> > ---
>>>>>>>> >
>>>>>>>> > I have tested these examples with Open MPI 1.4.3 and
>>> 2.
>>>> 0.3. In
>>>>>>>> > Open MPI 1.4.3, both
>>>>>>>> > examples work as expected. Open MPI 2.0.3 has the
>>>> same behavior
>>>>>>>> > as Open MPI 2.1.1.
>>>>>>>> >
>>>>>>>> > ---
>>>>>>>> >
>>>>>>>> > I would prefer that Open MPI 2.1.1 aborts both
>>>> processes, even
>>>>>>>> > when the executables are
>>>>>>>> > invoked indirectly by mpirun. If there is an MCA
>>>> setting that is
>>>>>>>> > needed to make Open MPI
>>>>>>>> > 2.1.1 abort both processes, please let me know.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Sincerely,
>>>>>>>> >
>>>>>>>> > Theodore Sussman
>>>>>>>> >
>>>>>>>> >
>
> ---- File information -----------
> File: aborttest10.tgz
> Date: 19 Jun 2017, 12:42
> Size: 4740 bytes.
> Type: Binary
> <aborttest10.tgz>
Ted Sussman
2017-06-19 17:19:47 UTC
Permalink
If I replace the sleep with an infinite loop, I get the same behavior. One "aborttest" process
remains after all the signals are sent.
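One way to check where the survivor sits (assuming a Linux procps ps) is to dump the process tree with process-group ids while the job is hung, e.g.:

    ps -u $USER -o pid,pgid,ppid,stat,cmd --forest

If the leftover aborttest process still reports the PGID of the dum.sh instance that launched it, the SIGKILL sent to that group should have reached it; a different PGID would mean it ended up outside the group that mpirun signals.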

On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:

>
> That is typical behavior when you throw something into "sleep" - not much we can do about it, I
> think.
>
> On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
>
> Hello,
>
> I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>
> I have attached the abort test program aborttest10.tgz.  This version sleeps for 5 sec before
> calling MPI_ABORT, so that I can check the pids using ps.
>
> This is what happens (see run2.sh.out).
>
> Open MPI invokes two instances of dum.sh.  Each instance of dum.sh invokes aborttest.exe.
>
> Pid    Process
> -------------------
> 19565  dum.sh
> 19566  dum.sh
> 19567 aborttest10.exe
> 19568 aborttest10.exe
>
> When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> instances of dum.sh (pids 19565 and 19566).
>
> ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
> vanishes.  But the other aborttest10.exe remains and continues until it is finished sleeping.
>
> Hope that this information is useful.
>
> Sincerely,
>
> Ted Sussman
>
>
>
> On 19 Jun 2017 at 23:06,  ***@rist.or.jp  wrote:
>
>
>  Ted,
>  
> some traces are missing  because you did not configure with --enable-debug
> i am afraid you have to do it (and you probably want to install that debug version in an
> other
> location since its performances are not good for production) in order to get all the logs.
>  
> Cheers,
>  
> Gilles
>  
> ----- Original Message -----
>    Hello Gilles,
>
>    I retried my example, with the same results as I observed before.  The process with rank
> 1
>    does not get killed by MPI_ABORT.
>
>    I have attached to this E-mail:
>
>      config.log.bz2
>      ompi_info.bz2  (uses ompi_info -a)
>      aborttest09.tgz
>
>    This testing is done on a computer running Linux 3.10.0.  This is a different computer
> than
>    the computer that I previously used for testing.  You can confirm that I am using Open
> MPI
>    2.1.1.
>
>    tar xvzf aborttest09.tgz
>    cd aborttest09
>    ./sh run2.sh
>
>    run2.sh contains the command
>
>    /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
> 10
>    ./dum.sh
>
>    The output from this run is in aborttest09/run2.sh.out.
>
>    The output shows that the the "default" component is selected by odls.
>
>    The only messages from odls are: odls: launch spawning child ...  (two messages).
> There
>    are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
>    messages.
>
>    I am not running from within any batch manager.
>
>    Sincerely,
>
>    Ted Sussman
>
>    On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>
> Ted,
>
> i do not observe the same behavior you describe with Open MPI 2.1.1
>
> # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>
> abort.sh 31361 launching abort
> abort.sh 31362 launching abort
> I am rank 0 with pid 31363
> I am rank 1 with pid 31364
> ------------------------------------------------------------------------
> --
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> ------------------------------------------------------------------------
> --
> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],0]
> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> SUCCESS
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],1]
> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> SUCCESS
> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> SUCCESS
> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],0]
> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> not alive
> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> [[18199,1],1]
> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> not alive
>
>
> Open MPI did kill both shells, and they were indeed killed as evidenced
> by ps
>
> #ps -fu gilles --forest
> UID        PID  PPID  C STIME TTY          TIME CMD
> gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
> gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
> gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/
> local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
>
>
> so trapping SIGTERM in your shell and manually killing the MPI task
> should work
> (as Jeff explained, as long as the shell script is fast enough to do
> that between SIGTERM and SIGKILL)
>
>
> if you observe a different behavior, please double check your Open MPI
> version and post the outputs of the same commands.
>
> btw, are you running from a batch manager ? if yes, which one ?
>
> Cheers,
>
> Gilles
>
> ----- Original Message -----
> Ted,
>
> if you
>
> mpirun --mca odls_base_verbose 10 ...
>
> you will see which processes get killed and how
>
> Best regards,
>
>
> Gilles
>
> ----- Original Message -----
> Hello Jeff,
>
> Thanks for your comments.
>
> I am not seeing behavior #4, on the two computers that I have
> tested
> on, using Open MPI
> 2.1.1.
>
> I wonder if you can duplicate my results with the files that I have
> uploaded.
>
> Regarding what is the "correct" behavior, I am willing to modify my
> application to correspond
> to Open MPI's behavior (whatever behavior the Open MPI
> developers
> decide is best) --
> provided that Open MPI does in fact kill off both shells.
>
> So my highest priority now is to find out why Open MPI 2.1.1 does
> not
> kill off both shells on
> my computer.
>
> Sincerely,
>
> Ted Sussman
>
>   On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>
> Ted --
>
> Sorry for jumping in late.  Here's my $0.02...
>
> In the runtime, we can do 4 things:
>
> 1. Kill just the process that we forked.
> 2. Kill just the process(es) that call back and identify
> themselves
> as MPI processes (we don't track this right now, but we could add that
> functionality).
> 3. Union of #1 and #2.
> 4. Kill all processes (to include any intermediate processes
> that
> are not included in #1 and #2).
>
> In Open MPI 2.x, #4 is the intended behavior.  There may be a
> bug
> or
> two that needs to get fixed (e.g., in your last mail, I don't see
> offhand why it waits until the MPI process finishes sleeping), but we
> should be killing the process group, which -- unless any of the
> descendant processes have explicitly left the process group -- should
> hit the entire process tree. 
>
> Sidenote: there's actually a way to be a bit more aggressive
> and
> do
> a better job of ensuring that we kill *all* processes (via creative
> use
> of PR_SET_CHILD_SUBREAPER), but that's basically a future
> enhancement
> /
> optimization.
>
> I think Gilles and Ralph proposed a good point to you: if you
> want
> to be sure to be able to do cleanup after an MPI process terminates (
> normally or abnormally), you should trap signals in your intermediate
> processes to catch what Open MPI's runtime throws and therefore know
> that it is time to cleanup. 
>
> Hypothetically, this should work in all versions of Open MPI...?
>
> I think Ralph made a pull request that adds an MCA param to
> change
> the default behavior from #4 to #1.
>
> Note, however, that there's a little time between when Open
> MPI
> sends the SIGTERM and the SIGKILL, so this solution could be racy.  If
> you find that you're running out of time to cleanup, we might be able
> to
> make the delay between the SIGTERM and SIGKILL be configurable
> (e.g.,
> via MCA param).
>
>
>
>
> On Jun 16, 2017, at 10:08 AM, Ted Sussman
> <***@adina.com
>
> wrote:
>
> Hello Gilles and Ralph,
>
> Thank you for your advice so far.  I appreciate the time
> that
> you
> have spent to educate me about the details of Open MPI.
>
> But I think that there is something fundamental that I
> don't
> understand.  Consider Example 2 run with Open MPI 2.1.1.
>
> mpirun --> shell for process 0 -->  executable for process
> 0 -->
> MPI calls, MPI_Abort
>         --> shell for process 1 -->  executable for process 1 -->
> MPI calls
>
> After the MPI_Abort is called, ps shows that both shells
> are
> running, and that the executable for process 1 is running (in this
> case,
> process 1 is sleeping).  And mpirun does not exit until process 1 is
> finished sleeping.
>
> I cannot reconcile this observed behavior with the
> statement
>
>       >     2.x: each process is put into its own process group
> upon launch. When we issue a
>      >     "kill", we issue it to the process group. Thus,
> every
> child proc of that child proc will
>      >     receive it. IIRC, this was the intended behavior.
>
> I assume that, for my example, there are two process
> groups. 
> The
> process group for process 0 contains the shell for process 0 and the
> executable for process 0; and the process group for process 1 contains
> the shell for process 1 and the executable for process 1.  So what
> does
> MPI_ABORT do?  MPI_ABORT does not kill the process group for process
> 0,
>  
> since the shell for process 0 continues.  And MPI_ABORT does not kill
> the process group for process 1, since both the shell and executable
> for
> process 1 continue.
>
> If I hit Ctrl-C after MPI_Abort is called, I get the message
>
> mpirun: abort is already in progress.. hit ctrl-c again to
> forcibly terminate
>
> but I don't need to hit Ctrl-C again because mpirun
> immediately
> exits.
>
> Can you shed some light on all of this?
>
> Sincerely,
>
> Ted Sussman
>
>
> On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
>
>
> You have to understand that we have no way of
> knowing who is
> making MPI calls - all we see is
> the proc that we started, and we know someone of
> that rank is
> running (but we have no way of
> knowing which of the procs you sub-spawned it is).
>
> So the behavior you are seeking only occurred in
> some earlier
> release by sheer accident. Nor will
> you find it portable as there is no specification
> directing
> that
> behavior.
>
> The behavior I've provided is to either deliver the signal to _all_ child processes (including
> grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you describe -
> kill the MPI proc underneath the shell, but not the shell itself.
>
> What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to
> pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
>
> Meantime, when I get a chance, I can code an option that will record the pid of the subproc that
> calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when that will
> be done.
>
>
>       On Jun 15, 2017, at 1:37 PM, Ted Sussman
> <ted.sussman@
> adina.
> com> wrote:
>
>      Hello Ralph,
>
>       I am just an Open MPI end user, so I will need to
> wait for
> the next official release.
>
>      mpirun --> shell for process 0 -->  executable for
> process
> 0
> --> MPI calls
>              --> shell for process 1 -->  executable for process
> 1
> --> MPI calls
>                                       ...
>
>      I guess the question is, should MPI_ABORT kill the
> executables or the shells?  I naively
>      thought, that, since it is the executables that make
> the
> MPI
> calls, it is the executables that
>      should be aborted by the call to MPI_ABORT.  Since
> the
> shells don't make MPI calls, the
>       shells should not be aborted.
>
>      And users might have several layers of shells in
> between
> mpirun and the executable.
>
>      So now I will look for the latest version of Open MPI
> that
> has the 1.4.3 behavior.
>
>      Sincerely,
>
>      Ted Sussman
>
>       On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
>
>      >
>       > Yeah, things jittered a little there as we debated
> the "
> right" behavior. Generally, when we
>      see that
>      > happening it means that a param is required, but
> somehow
> we never reached that point.
>      >
>      > See if https://github.com/open-mpi/ompi/pull/3704 
> helps
> -
> if so, I can schedule it for the next
>      2.x
>       > release if the RMs agree to take it
>      >
>      > Ralph
>       >
>      >     On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> sussman
> @adina.com > wrote:
>       >
>      >     Thank you for your comments.
>       >   
>      >     Our application relies upon "dum.sh" to clean up
> after
> the process exits, either if the
>       process
>      >     exits normally, or if the process exits abnormally
> because of MPI_ABORT.  If the process
>       >     group is killed by MPI_ABORT, this clean up will not
> be performed.  If exec is used to launch
>      >     the executable from dum.sh, then dum.sh is
> terminated
> by the exec, so dum.sh cannot
>      >     perform any clean up.
>      >   
>       >     I suppose that other user applications might work
> similarly, so it would be good to have an
>      >     MCA parameter to control the behavior of
> MPI_ABORT.
>      >   
>      >     We could rewrite our shell script that invokes
> mpirun,
> so that the cleanup that is now done
>      >     by
>       >     dum.sh is done by the invoking shell script after
> mpirun exits.  Perhaps this technique is the
>      >     preferred way to clean up after mpirun is invoked.
>       >   
>      >     By the way, I have also tested with Open MPI
> 1.10.7,
> and Open MPI 1.10.7 has different
>       >     behavior than either Open MPI 1.4.3 or Open MPI
> 2.1.
> 1.
>    In this explanation, it is important to
>       >     know that the aborttest executable sleeps for 20
> sec.
>      >   
>       >     When running example 2:
>      >   
>      >     1.4.3: process 1 immediately aborts
>      >     1.10.7: process 1 doesn't abort and never stops.
>       >     2.1.1 process 1 doesn't abort, but stops after it is
> finished sleeping
>      >   
>      >     Sincerely,
>      >   
>      >     Ted Sussman
>       >   
>      >     On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
>      >
>      >     Here is how the system is working:
>      >
>      >     Master: each process is put into its own process group upon launch. When we issue a
>      >     "kill", however, we only issue it to the individual process (instead of the process group
>      >     that is headed by that child process). This is probably a bug as I don't believe that is
>      >     what we intended, but set that aside for now.
>      >
>      >     2.x: each process is put into its own process group upon launch. When we issue a
>      >     "kill", we issue it to the process group. Thus, every child proc of that child proc will
>      >     receive it. IIRC, this was the intended behavior.
>      >
>      >     It is rather trivial to make the change (it only involves 3 lines of code), but I'm not sure
>      >     of what our intended behavior is supposed to be. Once we clarify that, it is also trivial
>      >     to add another MCA param (you can never have too many!) to allow you to select the
>      >     other behavior.
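
[Editor's note: to see these process groups for the examples in this thread, something like the following sketch can be run in another terminal while run2.sh is executing; the pattern simply matches the script and executable names used above.]

    # List PID, parent PID and process-group ID for mpirun, the wrappers and the MPI executables.
    ps -eo pid,ppid,pgid,args | grep -E '[m]pirun|[d]um\.sh|[a]borttest'
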
>      >   
>      >
>      >     On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.sussman@adina.com> wrote:
>      >   
>      >     Hello Gilles,
>      >
>      >     Thank you for your quick answer.  I confirm that if exec is used, both processes
>      >     immediately abort.
>      >
>      >     Now suppose that the line
>      >
>      >     echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
>      >
>      >     is added to the end of dum.sh.
>      >
>      >     If Example 2 is run with Open MPI 1.4.3, the output is
>      >
>      >     After aborttest: OMPI_COMM_WORLD_RANK=0
>      >
>      >     which shows that the shell script for the process with rank 0 continues after the abort,
>      >     but that the shell script for the process with rank 1 does not continue after the abort.
>      >
>      >     If Example 2 is run with Open MPI 2.1.1, with exec used to invoke aborttest02.exe, then
>      >     there is no such output, which shows that both shell scripts do not continue after the abort.
>      >
>      >     I prefer the Open MPI 1.4.3 behavior because our original application depends upon the
>      >     Open MPI 1.4.3 behavior.  (Our original application will also work if both executables are
>      >     aborted, and if both shell scripts continue after the abort.)
>      >
>      >     It might be too much to expect, but is there a way to recover the Open MPI 1.4.3 behavior
>      >     using Open MPI 2.1.1?
>      >
>      >     Sincerely,
>      >
>      >     Ted Sussman
>      >   
>      >   
r***@open-mpi.org
2017-06-19 17:30:33 UTC
Permalink
When you fork that process off, do you set its process group? Or is it in the same process group as the shell script?

> On Jun 19, 2017, at 10:19 AM, Ted Sussman <***@adina.com> wrote:
>
> If I replace the sleep with an infinite loop, I get the same behavior. One "aborttest" process
> remains after all the signals are sent.
>
> On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>
>>
>> That is typical behavior when you throw something into "sleep" - not much we can do about it, I
>> think.
>>
>> On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
>>
>> Hello,
>>
>> I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>>
>> I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
>> calling MPI_ABORT, so that I can check the pids using ps.
>>
>> This is what happens (see run2.sh.out).
>>
>> Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
>>
>> Pid Process
>> -------------------
>> 19565 dum.sh
>> 19566 dum.sh
>> 19567 aborttest10.exe
>> 19568 aborttest10.exe
>>
>> When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
>> instances of dum.sh (pids 19565 and 19566).
>>
>> ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
>> vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
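>>
>> [Editor's note: a quick way to check whether that surviving executable is ignoring or blocking the
>> signals, or has ended up outside the signalled process group, is sketched below; 19568 stands in
>> for whichever aborttest10.exe PID ps reports.]
>>
>>     # Signal masks (pending/blocked/ignored/caught) of the surviving process.
>>     grep -E '^Sig(Pnd|Blk|Ign|Cgt)' /proc/19568/status
>>     # Its process group and parent, for comparison with the dum.sh PIDs above.
>>     ps -o pid,ppid,pgid,stat,args -p 19568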
>>
>> Hope that this information is useful.
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>>
>>
>> On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:
>>
>>
>> Ted,
>>
>> some traces are missing because you did not configure with --enable-debug.
>> I am afraid you have to do it (and you probably want to install that debug version in another
>> location, since its performance is not good for production) in order to get all the logs.
>>
>> Cheers,
>>
>> Gilles
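>>
>> [Editor's note: for anyone reproducing this, a debug build along the lines Gilles describes might
>> look roughly like the sketch below; the tarball name and install prefix are placeholders, and a
>> separate prefix keeps the slower debug build out of production use.]
>>
>>     tar xzf openmpi-2.1.1.tar.gz
>>     cd openmpi-2.1.1
>>     ./configure --prefix=/opt/openmpi-2.1.1-debug --enable-debug
>>     make -j 4 all
>>     make install
>>     # then rerun the test with /opt/openmpi-2.1.1-debug/bin/mpirun ... --mca odls_base_verbose 10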
>>
>> ----- Original Message -----
>> Hello Gilles,
>>
>> I retried my example, with the same results as I observed before. The process with rank 1
>> does not get killed by MPI_ABORT.
>>
>> I have attached to this E-mail:
>>
>> config.log.bz2
>> ompi_info.bz2 (uses ompi_info -a)
>> aborttest09.tgz
>>
>> This testing is done on a computer running Linux 3.10.0. This is a different computer than
>> the computer that I previously used for testing. You can confirm that I am using Open MPI
>> 2.1.1.
>>
>> tar xvzf aborttest09.tgz
>> cd aborttest09
>> ./sh run2.sh
>>
>> run2.sh contains the command
>>
>> /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh
>>
>> The output from this run is in aborttest09/run2.sh.out.
>>
>> The output shows that the "default" component is selected by odls.
>>
>> The only messages from odls are: odls: launch spawning child ... (two messages). There
>> are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
>> messages.
>>
>> I am not running from within any batch manager.
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>> On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>>
>> Ted,
>>
>> i do not observe the same behavior you describe with Open MPI 2.1.1
>>
>> # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>>
>> abort.sh 31361 launching abort
>> abort.sh 31362 launching abort
>> I am rank 0 with pid 31363
>> I am rank 1 with pid 31364
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> --------------------------------------------------------------------------
>> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361 SUCCESS
>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362 SUCCESS
>> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361 SUCCESS
>> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362 SUCCESS
>> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361 SUCCESS
>> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
>> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362 SUCCESS
>> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is not alive
>> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is not alive
>>
>>
>> Open MPI did kill both shells, and they were indeed killed as evidenced by ps
>>
>> #ps -fu gilles --forest
>> UID        PID  PPID  C STIME TTY      TIME     CMD
>> gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
>> gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
>> gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
>> gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
>>
>>
>> so trapping SIGTERM in your shell and manually killing the MPI task should work
>> (as Jeff explained, as long as the shell script is fast enough to do that between
>> SIGTERM and SIGKILL)
>>
>> if you observe a different behavior, please double check your Open MPI version
>> and post the outputs of the same commands.
>>
>> btw, are you running from a batch manager ? if yes, which one ?
>>
>> Cheers,
>>
>> Gilles
>>
>> ----- Original Message -----
>> Ted,
>>
>> if you
>>
>> mpirun --mca odls_base_verbose 10 ...
>>
>> you will see which processes get killed and how
>>
>> Best regards,
>>
>>
>> Gilles
>>
>> ----- Original Message -----
>> Hello Jeff,
>>
>> Thanks for your comments.
>>
>> I am not seeing behavior #4, on the two computers that I have tested on, using
>> Open MPI 2.1.1.
>>
>> I wonder if you can duplicate my results with the files that I have uploaded.
>>
>> Regarding what is the "correct" behavior, I am willing to modify my application to
>> correspond to Open MPI's behavior (whatever behavior the Open MPI developers
>> decide is best) -- provided that Open MPI does in fact kill off both shells.
>>
>> So my highest priority now is to find out why Open MPI 2.1.1 does not kill off both
>> shells on my computer.
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>> On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>>
>> Ted --
>>
>> Sorry for jumping in late. Here's my $0.02...
>>
>> In the runtime, we can do 4 things:
>>
>> 1. Kill just the process that we forked.
>> 2. Kill just the process(es) that call back and identify themselves as MPI processes
>>    (we don't track this right now, but we could add that functionality).
>> 3. Union of #1 and #2.
>> 4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
>>
>> In Open MPI 2.x, #4 is the intended behavior. There may be a bug or two that needs to get
>> fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes
>> sleeping), but we should be killing the process group, which -- unless any of the descendant
>> processes have explicitly left the process group -- should hit the entire process tree.
>>
>> Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring
>> that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's
>> basically a future enhancement / optimization.
>>
>> I think Gilles and Ralph proposed a good point to you: if you want to be sure to be able to do
>> cleanup after an MPI process terminates (normally or abnormally), you should trap signals in
>> your intermediate processes to catch what Open MPI's runtime throws and therefore know
>> that it is time to cleanup.
>>
>> Hypothetically, this should work in all versions of Open MPI...?
>>
>> I think Ralph made a pull request that adds an MCA param to change the default behavior
>> from #4 to #1.
>>
>> Note, however, that there's a little time between when Open MPI sends the SIGTERM and
>> the SIGKILL, so this solution could be racy. If you find that you're running out of time to
>> cleanup, we might be able to make the delay between the SIGTERM and SIGKILL be
>> configurable (e.g., via MCA param).
>>
>>
>>
>>
>> On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com> wrote:
>>
>> Hello Gilles and Ralph,
>>
>> Thank you for your advice so far. I appreciate the time that you have spent to educate me
>> about the details of Open MPI.
>>
>> But I think that there is something fundamental that I don't understand. Consider Example 2
>> run with Open MPI 2.1.1.
>>
>> mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
>>        --> shell for process 1 --> executable for process 1 --> MPI calls
>>
>> After the MPI_Abort is called, ps shows that both shells are running, and that the executable
>> for process 1 is running (in this case, process 1 is sleeping). And mpirun does not exit until
>> process 1 is finished sleeping.
>>
>> I cannot reconcile this observed behavior with the statement
>>
>> > 2.x: each process is put into its own process group upon launch. When we issue a
>> > "kill", we issue it to the process group. Thus, every child proc of that child proc will
>> > receive it. IIRC, this was the intended behavior.
>>
>> I assume that, for my example, there are two process groups. The process group for
>> process 0 contains the shell for process 0 and the executable for process 0; and the process
>> group for process 1 contains the shell for process 1 and the executable for process 1. So
>> what does MPI_ABORT do? MPI_ABORT does not kill the process group for process 0,
>> since the shell for process 0 continues. And MPI_ABORT does not kill the process group for
>> process 1, since both the shell and executable for process 1 continue.
>>
>> If I hit Ctrl-C after MPI_Abort is called, I get the message
>>
>> mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
>>
>> but I don't need to hit Ctrl-C again because mpirun immediately exits.
>>
>> Can you shed some light on all of this?
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>>
>> On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
>>
>>
>> You have to understand that we have no way of knowing who is making MPI calls - all we
>> see is the proc that we started, and we know someone of that rank is running (but we have
>> no way of knowing which of the procs you sub-spawned it is).
>>
>> So the behavior you are seeking only occurred in some earlier release by sheer accident.
>> Nor will you find it portable as there is no specification directing that behavior.
>>
>> The behavior I've provided is to either deliver the signal to _all_ child processes (including
>> grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you
>> describe - kill the MPI proc underneath the shell, but not the shell itself.
>>
>> What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to
>> pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
>>
>> Meantime, when I get a chance, I can code an option that will record the pid of the subproc
>> that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to
>> when that will be done.
Ted Sussman
2017-06-19 17:54:10 UTC
Permalink
I don't do any setting of process groups. dum.sh just invokes the executable:

/..../aborttest10.exe
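
[Editor's note: a child launched that way stays in the shell's process group, which can be checked with a sketch like the one below; setsid is shown only as the contrast case and is not what dum.sh does.]

    #!/bin/sh
    # Sketch: a plainly-invoked child inherits this script's process group.
    sleep 300 &
    ps -o pid,ppid,pgid,args -p "$$,$!"     # script and child report the same PGID
    kill $!                                 # tidy up the test child
    # setsid /..../aborttest10.exe &        # would instead start it in its own group
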


On 19 Jun 2017 at 10:30, ***@open-mpi.org wrote:

> When you fork that process off, do you set its process group? Or is it in the same process group as the shell script?
>
> > On Jun 19, 2017, at 10:19 AM, Ted Sussman <***@adina.com> wrote:
> >
> > If I replace the sleep with an infinite loop, I get the same behavior. One "aborttest" process
> > remains after all the signals are sent.
> >
> > On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
> >
> >>
> >> That is typical behavior when you throw something into "sleep" - not much we can do about it, I
> >> think.
> >>
> >> On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
> >>
> >> Hello,
> >>
> >> I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
> >>
> >> I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
> >> calling MPI_ABORT, so that I can check the pids using ps.
> >>
> >> This is what happens (see run2.sh.out).
> >>
> >> Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
> >>
> >> Pid Process
> >> -------------------
> >> 19565 dum.sh
> >> 19566 dum.sh
> >> 19567 aborttest10.exe
> >> 19568 aborttest10.exe
> >>
> >> When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> >> instances of dum.sh (pids 19565 and 19566).
> >>
> >> ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
> >> vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
> >>
> >> Hope that this information is useful.
> >>
> >> Sincerely,
> >>
> >> Ted Sussman
> >>
> >>
> >>
> >> On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:
> >>
> >>
> >> Ted,
> >>
> >> some traces are missing because you did not configure with --enable-debug
> >> i am afraid you have to do it (and you probably want to install that debug version in an
> >> other
> >> location since its performances are not good for production) in order to get all the logs.
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> ----- Original Message -----
> >> Hello Gilles,
> >>
> >> I retried my example, with the same results as I observed before. The process with rank
> >> 1
> >> does not get killed by MPI_ABORT.
> >>
> >> I have attached to this E-mail:
> >>
> >> config.log.bz2
> >> ompi_info.bz2 (uses ompi_info -a)
> >> aborttest09.tgz
> >>
> >> This testing is done on a computer running Linux 3.10.0. This is a different computer
> >> than
> >> the computer that I previously used for testing. You can confirm that I am using Open
> >> MPI
> >> 2.1.1.
> >>
> >> tar xvzf aborttest09.tgz
> >> cd aborttest09
> >> ./sh run2.sh
> >>
> >> run2.sh contains the command
> >>
> >> /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
> >> 10
> >> ./dum.sh
> >>
> >> The output from this run is in aborttest09/run2.sh.out.
> >>
> >> The output shows that the the "default" component is selected by odls.
> >>
> >> The only messages from odls are: odls: launch spawning child ... (two messages).
> >> There
> >> are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
> >> messages.
> >>
> >> I am not running from within any batch manager.
> >>
> >> Sincerely,
> >>
> >> Ted Sussman
> >>
> >> On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
> >>
> >> Ted,
> >>
> >> i do not observe the same behavior you describe with Open MPI 2.1.1
> >>
> >> # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >>
> >> abort.sh 31361 launching abort
> >> abort.sh 31362 launching abort
> >> I am rank 0 with pid 31363
> >> I am rank 1 with pid 31364
> >> ------------------------------------------------------------------------
> >> --
> >> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> >> with errorcode 1.
> >>
> >> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >> You may or may not see output from other processes, depending on
> >> exactly when Open MPI kills them.
> >> ------------------------------------------------------------------------
> >> --
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >> [[18199,1],0]
> >> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> >> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> >> SUCCESS
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >> [[18199,1],1]
> >> [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> >> [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> >> SUCCESS
> >> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> >> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> >> SUCCESS
> >> [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> >> [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> >> SUCCESS
> >> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> >> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> >> SUCCESS
> >> [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> >> [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> >> SUCCESS
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >> [[18199,1],0]
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> >> not alive
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >> [[18199,1],1]
> >> [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> >> not alive
> >>
> >>
> >> Open MPI did kill both shells, and they were indeed killed as evidenced
> >> by ps
> >>
> >> #ps -fu gilles --forest
> >> UID PID PPID C STIME TTY TIME CMD
> >> gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
> >> gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
> >> gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/
> >> local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> >> gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
> >>
> >>
> >> so trapping SIGTERM in your shell and manually killing the MPI task
> >> should work
> >> (as Jeff explained, as long as the shell script is fast enough to do
> >> that between SIGTERM and SIGKILL)
> >>
> >>
> >> if you observe a different behavior, please double check your Open MPI
> >> version and post the outputs of the same commands.
> >>
> >> btw, are you running from a batch manager ? if yes, which one ?
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> ----- Original Message -----
> >> Ted,
> >>
> >> if you
> >>
> >> mpirun --mca odls_base_verbose 10 ...
> >>
> >> you will see which processes get killed and how
> >>
> >> Best regards,
> >>
> >>
> >> Gilles
> >>
> >> ----- Original Message -----
> >> Hello Jeff,
> >>
> >> Thanks for your comments.
> >>
> >> I am not seeing behavior #4, on the two computers that I have
> >> tested
> >> on, using Open MPI
> >> 2.1.1.
> >>
> >> I wonder if you can duplicate my results with the files that I have
> >> uploaded.
> >>
> >> Regarding what is the "correct" behavior, I am willing to modify my
> >> application to correspond
> >> to Open MPI's behavior (whatever behavior the Open MPI
> >> developers
> >> decide is best) --
> >> provided that Open MPI does in fact kill off both shells.
> >>
> >> So my highest priority now is to find out why Open MPI 2.1.1 does
> >> not
> >> kill off both shells on
> >> my computer.
> >>
> >> Sincerely,
> >>
> >> Ted Sussman
> >>
> >> On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> >>
> >> Ted --
> >>
> >> Sorry for jumping in late. Here's my $0.02...
> >>
> >> In the runtime, we can do 4 things:
> >>
> >> 1. Kill just the process that we forked.
> >> 2. Kill just the process(es) that call back and identify
> >> themselves
> >> as MPI processes (we don't track this right now, but we could add that
> >> functionality).
> >> 3. Union of #1 and #2.
> >> 4. Kill all processes (to include any intermediate processes
> >> that
> >> are not included in #1 and #2).
> >>
> >> In Open MPI 2.x, #4 is the intended behavior. There may be a
> >> bug
> >> or
> >> two that needs to get fixed (e.g., in your last mail, I don't see
> >> offhand why it waits until the MPI process finishes sleeping), but we
> >> should be killing the process group, which -- unless any of the
> >> descendant processes have explicitly left the process group -- should
> >> hit the entire process tree.
> >>
> >> Sidenote: there's actually a way to be a bit more aggressive
> >> and
> >> do
> >> a better job of ensuring that we kill *all* processes (via creative
> >> use
> >> of PR_SET_CHILD_SUBREAPER), but that's basically a future
> >> enhancement
> >> /
> >> optimization.
> >>
> >> I think Gilles and Ralph proposed a good point to you: if you
> >> want
> >> to be sure to be able to do cleanup after an MPI process terminates (
> >> normally or abnormally), you should trap signals in your intermediate
> >> processes to catch what Open MPI's runtime throws and therefore know
> >> that it is time to cleanup.
> >>
> >> Hypothetically, this should work in all versions of Open MPI...?
> >>
> >> I think Ralph made a pull request that adds an MCA param to
> >> change
> >> the default behavior from #4 to #1.
> >>
> >> Note, however, that there's a little time between when Open
> >> MPI
> >> sends the SIGTERM and the SIGKILL, so this solution could be racy. If
> >> you find that you're running out of time to cleanup, we might be able
> >> to
> >> make the delay between the SIGTERM and SIGKILL be configurable
> >> (e.g.,
> >> via MCA param).
> >>
> >>
> >>
> >>
> >> On Jun 16, 2017, at 10:08 AM, Ted Sussman
> >> <***@adina.com
> >>
> >> wrote:
> >>
> >> Hello Gilles and Ralph,
> >>
> >> Thank you for your advice so far. I appreciate the time
> >> that
> >> you
> >> have spent to educate me about the details of Open MPI.
> >>
> >> But I think that there is something fundamental that I
> >> don't
> >> understand. Consider Example 2 run with Open MPI 2.1.1.
> >>
> >> mpirun --> shell for process 0 --> executable for process
> >> 0 -->
> >> MPI calls, MPI_Abort
> >> --> shell for process 1 --> executable for process 1 -->
> >> MPI calls
> >>
> >> After the MPI_Abort is called, ps shows that both shells
> >> are
> >> running, and that the executable for process 1 is running (in this
> >> case,
> >> process 1 is sleeping). And mpirun does not exit until process 1 is
> >> finished sleeping.
> >>
> >> I cannot reconcile this observed behavior with the
> >> statement
> >>
> >> > 2.x: each process is put into its own process group
> >> upon launch. When we issue a
> >> > "kill", we issue it to the process group. Thus,
> >> every
> >> child proc of that child proc will
> >> > receive it. IIRC, this was the intended behavior.
> >>
> >> I assume that, for my example, there are two process
> >> groups.
> >> The
> >> process group for process 0 contains the shell for process 0 and the
> >> executable for process 0; and the process group for process 1 contains
> >> the shell for process 1 and the executable for process 1. So what
> >> does
> >> MPI_ABORT do? MPI_ABORT does not kill the process group for process
> >> 0,
> >>
> >> since the shell for process 0 continues. And MPI_ABORT does not kill
> >> the process group for process 1, since both the shell and executable
> >> for
> >> process 1 continue.
> >>
> >> If I hit Ctrl-C after MPI_Abort is called, I get the message
> >>
> >> mpirun: abort is already in progress.. hit ctrl-c again to
> >> forcibly terminate
> >>
> >> but I don't need to hit Ctrl-C again because mpirun
> >> immediately
> >> exits.
> >>
> >> Can you shed some light on all of this?
> >>
> >> Sincerely,
> >>
> >> Ted Sussman
> >>
> >>
> >> On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> >>
> >>
> >> You have to understand that we have no way of
> >> knowing who is
> >> making MPI calls - all we see is
> >> the proc that we started, and we know someone of
> >> that rank is
> >> running (but we have no way of
> >> knowing which of the procs you sub-spawned it is).
> >>
> >> So the behavior you are seeking only occurred in
> >> some earlier
> >> release by sheer accident. Nor will
> >> you find it portable as there is no specification
> >> directing
> >> that
> >> behavior.
> >>
> >> The behavior I´ve provided is to either deliver the
> >> signal to
> >> _
> >> all_ child processes (including
> >> grandchildren etc.), or _only_ the immediate child
> >> of the
> >> daemon.
> >> It won´t do what you describe -
> >> kill the mPI proc underneath the shell, but not the
> >> shell
> >> itself.
> >>
> >> What you can eventually do is use PMIx to ask the
> >> runtime to
> >> selectively deliver signals to
> >> pid/procs for you. We don´t have that capability
> >> implemented
> >> just yet, I´m afraid.
> >>
> >> Meantime, when I get a chance, I can code an
> >> option that will
> >> record the pid of the subproc that
> >> calls MPI_Init, and then let´s you deliver signals to
> >> just
> >> that
> >> proc. No promises as to when that will
> >> be done.
> >>
> >>
> >> On Jun 15, 2017, at 1:37 PM, Ted Sussman
> >> <ted.sussman@
> >> adina.
> >> com> wrote:
> >>
> >> Hello Ralph,
> >>
> >> I am just an Open MPI end user, so I will need to
> >> wait for
> >> the next official release.
> >>
> >> mpirun --> shell for process 0 --> executable for
> >> process
> >> 0
> >> --> MPI calls
> >> --> shell for process 1 --> executable for process
> >> 1
> >> --> MPI calls
> >> ...
> >>
> >> I guess the question is, should MPI_ABORT kill the
> >> executables or the shells? I naively
> >> thought, that, since it is the executables that make
> >> the
> >> MPI
> >> calls, it is the executables that
> >> should be aborted by the call to MPI_ABORT. Since
> >> the
> >> shells don't make MPI calls, the
> >> shells should not be aborted.
> >>
> >> And users might have several layers of shells in
> >> between
> >> mpirun and the executable.
> >>
> >> So now I will look for the latest version of Open MPI
> >> that
> >> has the 1.4.3 behavior.
> >>
> >> Sincerely,
> >>
> >> Ted Sussman
> >>
> >> On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> >>
> >> >
> >> > Yeah, things jittered a little there as we debated
> >> the "
> >> right" behavior. Generally, when we
> >> see that
> >> > happening it means that a param is required, but
> >> somehow
> >> we never reached that point.
> >> >
> >> > See if https://github.com/open-mpi/ompi/pull/3704
> >> helps
> >> -
> >> if so, I can schedule it for the next
> >> 2.x
> >> > release if the RMs agree to take it
> >> >
> >> > Ralph
> >> >
> >> > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> >> sussman
> >> @adina.com > wrote:
> >> >
> >> > Thank you for your comments.
> >> >
> >> > Our application relies upon "dum.sh" to clean up
> >> after
> >> the process exits, either if the
> >> process
> >> > exits normally, or if the process exits abnormally
> >> because of MPI_ABORT. If the process
> >> > group is killed by MPI_ABORT, this clean up will not
> >> be performed. If exec is used to launch
> >> > the executable from dum.sh, then dum.sh is
> >> terminated
> >> by the exec, so dum.sh cannot
> >> > perform any clean up.
> >> >
> >> > I suppose that other user applications might work
> >> similarly, so it would be good to have an
> >> > MCA parameter to control the behavior of
> >> MPI_ABORT.
> >> >
> >> > We could rewrite our shell script that invokes
> >> mpirun,
> >> so that the cleanup that is now done
> >> > by
> >> > dum.sh is done by the invoking shell script after
> >> mpirun exits. Perhaps this technique is the
> >> > preferred way to clean up after mpirun is invoked.
> >> >
> >> > By the way, I have also tested with Open MPI
> >> 1.10.7,
> >> and Open MPI 1.10.7 has different
> >> > behavior than either Open MPI 1.4.3 or Open MPI
> >> 2.1.
> >> 1.
> >> In this explanation, it is important to
> >> > know that the aborttest executable sleeps for 20
> >> sec.
> >> >
> >> > When running example 2:
> >> >
> >> > 1.4.3: process 1 immediately aborts
> >> > 1.10.7: process 1 doesn't abort and never stops.
> >> > 2.1.1 process 1 doesn't abort, but stops after it is
> >> finished sleeping
> >> >
> >> > Sincerely,
> >> >
> >> > Ted Sussman
> >> >
> >> > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> >> >
> >> > Here is how the system is working:
> >> >
> >> > Master: each process is put into its own process
> >> group
> >> upon launch. When we issue a
> >> > "kill", however, we only issue it to the individual
> >> process (instead of the process group
> >> > that is headed by that child process). This is
> >> probably a bug as I don´t believe that is
> >> > what we intended, but set that aside for now.
> >> >
> >> > 2.x: each process is put into its own process group
> >> upon launch. When we issue a
> >> > "kill", we issue it to the process group. Thus,
> >> every
> >> child proc of that child proc will
> >> > receive it. IIRC, this was the intended behavior.
> >> >
> >> > It is rather trivial to make the change (it only
> >> involves 3 lines of code), but I´m not sure
> >> > of what our intended behavior is supposed to be.
> >> Once
> >> we clarify that, it is also trivial
> >> > to add another MCA param (you can never have too
> >> many!)
> >> to allow you to select the
> >> > other behavior.
Ted Sussman
2017-06-19 17:53:14 UTC
Permalink
For what it's worth, the problem might be related to the following:

mpirun: -np 2 ... dum.sh
dum.sh: invokes aborttest11.exe
aborttest11.exe: calls MPI_Init, then goes into an infinite loop.

Now, while mpirun is running, send signals to the processes, as follows:

1) kill -9 (pid for one of the aborttest11.exe processes)

The shell for this aborttest11.exe continues. Once this shell exits, Open MPI sends
signals to both shells, killing the other shell, but the remaining aborttest11.exe survives. The
PPID for the remaining aborttest11.exe becomes 1.

2) kill -9 (pid for one of the dum.sh processes)

Open MPI sends signals to both of the shells. Both shells are killed off, but both
aborttest11.exe processes survive, with PPID set to 1.
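[The source of aborttest11.exe is not included in the thread. A minimal stand-in along the
following lines is enough to reproduce the scenario described above: MPI_Init followed by an
infinite loop. The printed fields and the 1-second sleep are assumptions, not the actual test code.]

    /* Hypothetical stand-in for aborttest11.exe: call MPI_Init, then spin
       forever so the process can outlive its wrapper shell. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d pid %d ppid %d\n", rank, (int)getpid(), (int)getppid());
        fflush(stdout);
        for (;;)
            sleep(1);        /* spin forever; never reached below */
        MPI_Finalize();      /* not reached */
        return 0;
    }

[Built with mpicc and launched through dum.sh, "ps -o pid,ppid,cmd" then shows the surviving
aborttest11.exe reparented to PID 1 after its shell is killed, as reported above.]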


On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>
> That is typical behavior when you throw something into "sleep" - not much we can do about it, I
> think.
>
> On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
>
> Hello,
>
> I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>
> I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
> calling MPI_ABORT, so that I can check the pids using ps.
>
> This is what happens (see run2.sh.out).
>
> Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
>
> Pid    Process
> -------------------
> 19565  dum.sh
> 19566  dum.sh
> 19567 aborttest10.exe
> 19568 aborttest10.exe
>
> When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> instances of dum.sh (pids 19565 and 19566).
>
> ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
> vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
>
> Hope that this information is useful.
>
> Sincerely,
>
> Ted Sussman
r***@open-mpi.org
2017-06-19 18:19:19 UTC
Permalink
> On Jun 19, 2017, at 10:53 AM, Ted Sussman <***@adina.com> wrote:
>
> For what it's worth, the problem might be related to the following:
>
> mpirun: -np 2 ... dum.sh
> dum.sh: Invoke aborttest11.exe
> aborttest11.exe: Call MPI_Init, go into an infinite loop.
>
> Now when mpirun is running, send signals at the processes, as follows:
>
> 1) kill -9 (pid for one of the aborttest11.exe processes)
>
> The shell for this aborttest11.exe continues. Once this shell exits, then Open MPI sends signals to both shells, killing the other shell, but the remaining aborttest11.exe survives. The PPID for the remaining aborttest11.exe becomes 1.

We have no visibility into your aborttest processes since we didn’t launch them. So killing one of them is invisible to us. We can only see the shell scripts.

>
> 2) kill -9 (pid for one of the dum.sh processes).
>
> Open MPI sends signals to both of the shells. Both shells are killed off, but both aborttest11.exe processes survive, with PPID set to 1.

This again is a question of how you handle things in your program. The _only_ process we can see is your script. If you kill a script that started a process, then your process is going to have to know how to detect the script has died and “suicide” - there is nothing we can do to help.

Honestly, it sounds to me like the real problem here is that your .exe program isn’t monitoring the shell above it to know when to “suicide”. I don’t see how we can help you there.
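[Open MPI itself cannot arrange this, but the executable can. A minimal sketch of the
"detect the script has died and suicide" logic described above; this is hypothetical,
Linux-specific, and not an Open MPI feature or the poster's actual code:]

    /* Hypothetical sketch: the MPI executable notices that its parent
       (the wrapper shell) has died and terminates itself. */
    #include <mpi.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* Ask the kernel to deliver SIGTERM to this process when its parent
           exits (this also covers kill -9 on the wrapper shell). */
        prctl(PR_SET_PDEATHSIG, SIGTERM);

        MPI_Init(&argc, &argv);

        /* If the parent died before prctl() took effect, the process has
           already been reparented to init; bail out. */
        if (getppid() == 1)
            exit(1);

        /* ... real work here ... */

        MPI_Finalize();
        return 0;
    }

[A portable, non-Linux variant would simply poll getppid() from the work loop and exit, or
call MPI_Abort, when it returns 1.]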

>
>
> On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>
> >
> > That is typical behavior when you throw something into “sleep” - not much we can do about it, I
> > think.
> >
> > On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
> >
> > Hello,
> >
> > I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
> >
> > I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
> > calling MPI_ABORT, so that I can check the pids using ps.
> >
> > This is what happens (see run2.sh.out).
> >
> > Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
> >
> > Pid Process
> > -------------------
> > 19565 dum.sh
> > 19566 dum.sh
> > 19567 aborttest10.exe
> > 19568 aborttest10.exe
> >
> > When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> > instances of dum.sh (pids 19565 and 19566).
> >
> > ps shows that both the shell processes vanish, and that one of the aborttest10.exe processes
> > vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
> >
> > Hope that this information is useful.
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> >
> >
> > On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:
> >
> >
> > Ted,
> >
> > some traces are missing because you did not configure with --enable-debug
> > i am afraid you have to do it (and you probably want to install that debug version in an
> > other
> > location since its performances are not good for production) in order to get all the logs.
> >
> > Cheers,
> >
> > Gilles
> >
> > ----- Original Message -----
> > Hello Gilles,
> >
> > I retried my example, with the same results as I observed before. The process with rank
> > 1
> > does not get killed by MPI_ABORT.
> >
> > I have attached to this E-mail:
> >
> > config.log.bz2
> > ompi_info.bz2 (uses ompi_info -a)
> > aborttest09.tgz
> >
> > This testing is done on a computer running Linux 3.10.0. This is a different computer
> > than
> > the computer that I previously used for testing. You can confirm that I am using Open
> > MPI
> > 2.1.1.
> >
> > tar xvzf aborttest09.tgz
> > cd aborttest09
> > ./sh run2.sh
> >
> > run2.sh contains the command
> >
> > /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
> > 10
> > ./dum.sh
> >
> > The output from this run is in aborttest09/run2.sh.out.
> >
> > The output shows that the the "default" component is selected by odls.
> >
> > The only messages from odls are: odls: launch spawning child ... (two messages).
> > There
> > are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
> > messages.
> >
> > I am not running from within any batch manager.
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
> >
> > Ted,
> >
> > i do not observe the same behavior you describe with Open MPI 2.1.1
> >
> > # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >
> > abort.sh 31361 launching abort
> > abort.sh 31362 launching abort
> > I am rank 0 with pid 31363
> > I am rank 1 with pid 31364
> > ------------------------------------------------------------------------
> > --
> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> > ------------------------------------------------------------------------
> > --
> > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],0]
> > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],1]
> > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> > not alive
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> > not alive
> >
> >
> > Open MPI did kill both shells, and they were indeed killed as evidenced
> > by ps
> >
> > #ps -fu gilles --forest
> > UID PID PPID C STIME TTY TIME CMD
> > gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
> > gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
> > gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/
> > local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> > gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
> >
> >
> > so trapping SIGTERM in your shell and manually killing the MPI task
> > should work
> > (as Jeff explained, as long as the shell script is fast enough to do
> > that between SIGTERM and SIGKILL)
> >
> >
> > if you observe a different behavior, please double check your Open MPI
> > version and post the outputs of the same commands.
> >
> > btw, are you running from a batch manager ? if yes, which one ?
> >
> > Cheers,
> >
> > Gilles
> >
> > ----- Original Message -----
> > Ted,
> >
> > if you
> >
> > mpirun --mca odls_base_verbose 10 ...
> >
> > you will see which processes get killed and how
> >
> > Best regards,
> >
> >
> > Gilles
> >
> > ----- Original Message -----
> > Hello Jeff,
> >
> > Thanks for your comments.
> >
> > I am not seeing behavior #4, on the two computers that I have
> > tested
> > on, using Open MPI
> > 2.1.1.
> >
> > I wonder if you can duplicate my results with the files that I have
> > uploaded.
> >
> > Regarding what is the "correct" behavior, I am willing to modify my
> > application to correspond
> > to Open MPI's behavior (whatever behavior the Open MPI
> > developers
> > decide is best) --
> > provided that Open MPI does in fact kill off both shells.
> >
> > So my highest priority now is to find out why Open MPI 2.1.1 does
> > not
> > kill off both shells on
> > my computer.
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> >
> > Ted --
> >
> > Sorry for jumping in late. Here's my $0.02...
> >
> > In the runtime, we can do 4 things:
> >
> > 1. Kill just the process that we forked.
> > 2. Kill just the process(es) that call back and identify
> > themselves
> > as MPI processes (we don't track this right now, but we could add that
> > functionality).
> > 3. Union of #1 and #2.
> > 4. Kill all processes (to include any intermediate processes
> > that
> > are not included in #1 and #2).
> >
> > In Open MPI 2.x, #4 is the intended behavior. There may be a
> > bug
> > or
> > two that needs to get fixed (e.g., in your last mail, I don't see
> > offhand why it waits until the MPI process finishes sleeping), but we
> > should be killing the process group, which -- unless any of the
> > descendant processes have explicitly left the process group -- should
> > hit the entire process tree.
> >
> > Sidenote: there's actually a way to be a bit more aggressive
> > and
> > do
> > a better job of ensuring that we kill *all* processes (via creative
> > use
> > of PR_SET_CHILD_SUBREAPER), but that's basically a future
> > enhancement
> > /
> > optimization.
> >
> > I think Gilles and Ralph proposed a good point to you: if you
> > want
> > to be sure to be able to do cleanup after an MPI process terminates (
> > normally or abnormally), you should trap signals in your intermediate
> > processes to catch what Open MPI's runtime throws and therefore know
> > that it is time to cleanup.
> >
> > Hypothetically, this should work in all versions of Open MPI...?
> >
> > I think Ralph made a pull request that adds an MCA param to
> > change
> > the default behavior from #4 to #1.
> >
> > Note, however, that there's a little time between when Open
> > MPI
> > sends the SIGTERM and the SIGKILL, so this solution could be racy. If
> > you find that you're running out of time to cleanup, we might be able
> > to
> > make the delay between the SIGTERM and SIGKILL be configurable
> > (e.g.,
> > via MCA param).
> >
> >
> >
> >
> > On Jun 16, 2017, at 10:08 AM, Ted Sussman
> > <***@adina.com
> >
> > wrote:
> >
> > Hello Gilles and Ralph,
> >
> > Thank you for your advice so far. I appreciate the time
> > that
> > you
> > have spent to educate me about the details of Open MPI.
> >
> > But I think that there is something fundamental that I
> > don't
> > understand. Consider Example 2 run with Open MPI 2.1.1.
> >
> > mpirun --> shell for process 0 --> executable for process
> > 0 -->
> > MPI calls, MPI_Abort
> > --> shell for process 1 --> executable for process 1 -->
> > MPI calls
> >
> > After the MPI_Abort is called, ps shows that both shells
> > are
> > running, and that the executable for process 1 is running (in this
> > case,
> > process 1 is sleeping). And mpirun does not exit until process 1 is
> > finished sleeping.
> >
> > I cannot reconcile this observed behavior with the
> > statement
> >
> > > 2.x: each process is put into its own process group
> > upon launch. When we issue a
> > > "kill", we issue it to the process group. Thus,
> > every
> > child proc of that child proc will
> > > receive it. IIRC, this was the intended behavior.
> >
> > I assume that, for my example, there are two process
> > groups.
> > The
> > process group for process 0 contains the shell for process 0 and the
> > executable for process 0; and the process group for process 1 contains
> > the shell for process 1 and the executable for process 1. So what
> > does
> > MPI_ABORT do? MPI_ABORT does not kill the process group for process
> > 0,
> >
> > since the shell for process 0 continues. And MPI_ABORT does not kill
> > the process group for process 1, since both the shell and executable
> > for
> > process 1 continue.
> >
> > If I hit Ctrl-C after MPI_Abort is called, I get the message
> >
> > mpirun: abort is already in progress.. hit ctrl-c again to
> > forcibly terminate
> >
> > but I don't need to hit Ctrl-C again because mpirun
> > immediately
> > exits.
> >
> > Can you shed some light on all of this?
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> >
> > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> >
> >
> > You have to understand that we have no way of
> > knowing who is
> > making MPI calls - all we see is
> > the proc that we started, and we know someone of
> > that rank is
> > running (but we have no way of
> > knowing which of the procs you sub-spawned it is).
> >
> > So the behavior you are seeking only occurred in
> > some earlier
> > release by sheer accident. Nor will
> > you find it portable as there is no specification
> > directing
> > that
> > behavior.
> >
> > The behavior IÂŽve provided is to either deliver the
> > signal to
> > _
> > all_ child processes (including
> > grandchildren etc.), or _only_ the immediate child
> > of the
> > daemon.
> > It wonÂŽt do what you describe -
> > kill the mPI proc underneath the shell, but not the
> > shell
> > itself.
> >
> > What you can eventually do is use PMIx to ask the
> > runtime to
> > selectively deliver signals to
> > pid/procs for you. We donÂŽt have that capability
> > implemented
> > just yet, IÂŽm afraid.
> >
> > Meantime, when I get a chance, I can code an
> > option that will
> > record the pid of the subproc that
> > calls MPI_Init, and then letÂŽs you deliver signals to
> > just
> > that
> > proc. No promises as to when that will
> > be done.
> >
> >
> > On Jun 15, 2017, at 1:37 PM, Ted Sussman
> > <ted.sussman@
> > adina.
> > com> wrote:
> >
> > Hello Ralph,
> >
> > I am just an Open MPI end user, so I will need to
> > wait for
> > the next official release.
> >
> > mpirun --> shell for process 0 --> executable for
> > process
> > 0
> > --> MPI calls
> > --> shell for process 1 --> executable for process
> > 1
> > --> MPI calls
> > ...
> >
> > I guess the question is, should MPI_ABORT kill the
> > executables or the shells? I naively
> > thought, that, since it is the executables that make
> > the
> > MPI
> > calls, it is the executables that
> > should be aborted by the call to MPI_ABORT. Since
> > the
> > shells don't make MPI calls, the
> > shells should not be aborted.
> >
> > And users might have several layers of shells in
> > between
> > mpirun and the executable.
> >
> > So now I will look for the latest version of Open MPI
> > that
> > has the 1.4.3 behavior.
> >
> > Sincerely,
> >
> > Ted Sussman
> >
> > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> >
> > >
> > > Yeah, things jittered a little there as we debated
> > the "
> > right" behavior. Generally, when we
> > see that
> > > happening it means that a param is required, but
> > somehow
> > we never reached that point.
> > >
> > > See if https://github.com/open-mpi/ompi/pull/3704
> > helps
> > -
> > if so, I can schedule it for the next
> > 2.x
> > > release if the RMs agree to take it
> > >
> > > Ralph
> > >
> > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> > sussman
> > @adina.com > wrote:
> > >
> > > Thank you for your comments.
> > >
> > > Our application relies upon "dum.sh" to clean up
> > after
> > the process exits, either if the
> > process
> > > exits normally, or if the process exits abnormally
> > because of MPI_ABORT. If the process
> > > group is killed by MPI_ABORT, this clean up will not
> > be performed. If exec is used to launch
> > > the executable from dum.sh, then dum.sh is
> > terminated
> > by the exec, so dum.sh cannot
> > > perform any clean up.
> > >
> > > I suppose that other user applications might work
> > similarly, so it would be good to have an
> > > MCA parameter to control the behavior of
> > MPI_ABORT.
> > >
> > > We could rewrite our shell script that invokes
> > mpirun,
> > so that the cleanup that is now done
> > > by
> > > dum.sh is done by the invoking shell script after
> > mpirun exits. Perhaps this technique is the
> > > preferred way to clean up after mpirun is invoked.
> > >
> > > By the way, I have also tested with Open MPI
> > 1.10.7,
> > and Open MPI 1.10.7 has different
> > > behavior than either Open MPI 1.4.3 or Open MPI
> > 2.1.
> > 1.
> > In this explanation, it is important to
> > > know that the aborttest executable sleeps for 20
> > sec.
> > >
> > > When running example 2:
> > >
> > > 1.4.3: process 1 immediately aborts
> > > 1.10.7: process 1 doesn't abort and never stops.
> > > 2.1.1 process 1 doesn't abort, but stops after it is
> > finished sleeping
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > >
> > > Here is how the system is working:
> > >
> > > Master: each process is put into its own process
> > group
> > upon launch. When we issue a
> > > "kill", however, we only issue it to the individual
> > process (instead of the process group
> > > that is headed by that child process). This is
> > probably a bug as I donÂŽt believe that is
> > > what we intended, but set that aside for now.
> > >
> > > 2.x: each process is put into its own process group
> > upon launch. When we issue a
> > > "kill", we issue it to the process group. Thus,
> > every
> > child proc of that child proc will
> > > receive it. IIRC, this was the intended behavior.
> > >
> > > It is rather trivial to make the change (it only
> > involves 3 lines of code), but IÂŽm not sure
> > > of what our intended behavior is supposed to be.
> > Once
> > we clarify that, it is also trivial
> > > to add another MCA param (you can never have too
> > many!)
> > to allow you to select the
> > > other behavior.
> > >
> > >
> > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
> > sussman@
> > adina.com > wrote:
> > >
> > > Hello Gilles,
> > >
> > > Thank you for your quick answer. I confirm that if
> > exec is used, both processes
> > > immediately
> > > abort.
> > >
> > > Now suppose that the line
> > >
> > > echo "After aborttest:
> > >
> > OMPI_COMM_WORLD_RANK="$OMPI_COMM_
> > WORLD_RANK
> > >
> > > is added to the end of dum.sh.
> > >
> > > If Example 2 is run with Open MPI 1.4.3, the output
> > is
> > >
> > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > >
> > > which shows that the shell script for the process
> > with
> > rank 0 continues after the
> > > abort,
> > > but that the shell script for the process with rank
> > 1
> > does not continue after the
> > > abort.
> > >
> > > If Example 2 is run with Open MPI 2.1.1, with exec
> > used to invoke
> > > aborttest02.exe, then
> > > there is no such output, which shows that both shell
> > scripts do not continue after
> > > the abort.
> > >
> > > I prefer the Open MPI 1.4.3 behavior because our
> > original application depends
> > > upon the
> > > Open MPI 1.4.3 behavior. (Our original application
> > will also work if both
> > > executables are
> > > aborted, and if both shell scripts continue after
> > the
> > abort.)
> > >
> > > It might be too much to expect, but is there a way
> > to
> > recover the Open MPI 1.4.3
> > > behavior
> > > using Open MPI 2.1.1?
> > >
> > > Sincerely,
> > >
> > > Ted Sussman
> > >
> > >
Ted Sussman
2017-06-19 19:03:30 UTC
Permalink
OK. So the problem in my last E-mail is not quite related to MPI_Abort.

Let's go back to the case when MPI_Abort is called. I thought that Open MPI would send
signals to each of the process groups (not just the processes) that Open MPI creates when
mpirun is called.


>     2.x: each process is put into its own process group
>      upon launch. When we issue a  "kill", we issue it to the process group. Thus,
> every child proc of that child proc will receive it. IIRC, this was the intended behavior.


In my case there are two process groups, since mpirun is run with -np 2.

Since dum.sh and aborttest11.exe are in the same process group, both of them should be
killed when the process group for dum.sh is killed. And since there are two process groups,
all four processes should be killed by MPI_Abort.
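
For reference, here is a minimal sketch of how this grouping can be checked while the example 2
job is running (plain ps on Linux; the kill line in the comment only illustrates what I expect
MPI_Abort to do to each group, it is not part of the job):

    # list mpirun, the two dum.sh shells and the two executables, with their PGIDs
    ps -o pid,ppid,pgid,args -u "$USER" --forest | grep -E 'mpirun|dum.sh|aborttest'

    # expectation: each dum.sh is a process group leader (PID == PGID) and its
    # aborttest executable shares that PGID, so signalling the group, e.g.
    #   kill -TERM -- -<PGID of dum.sh>
    # should reach both the shell script and the executable underneath it.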

It seems like a lot of work for an executable to keep checking the state of its invoking shell,
and then to commit suicide if the invoking shell is gone. The whole point of MPI_ABORT is
that everything should be killed off, regardless of the current state of the executables.


On 19 Jun 2017 at 11:19, ***@open-mpi.org wrote:

>
>
>
> On Jun 19, 2017, at 10:53 AM, Ted Sussman <***@adina.com> wrote:
>
> For what it's worth, the problem might be related to the following:
>
> mpirun: -np 2 ... dum.sh
> dum.sh: Invoke aborttest11.exe
> aborttest11.exe: Call  MPI_Init, go into an infinite loop.
>
> Now when mpirun is running, send signals at the processes, as follows:
>
> 1) kill -9 (pid for one of the aborttest11.exe processes)
>
> The shell for this aborttest11.exe continues. Once this shell exits, then Open MPI sends
> signals to both shells, killing the other shell, but the remaining aborttest11.exe survives.  The
> PPID for the remaining aborttest11.exe becomes 1.
>
> We have no visibility into your aborttest processes since we didn't launch them. So killing one of
> them is invisible to us. We can only see the shell scripts.
>
>
> 2) kill -9 (pid for one of the dum.sh processes).
>
> Open MPI sends signals to both of the shells. Both shells are killed off, but both
> aborttest11.exe processes survive, with PPID set to 1.
>
> This again is a question of how you handle things in your program. The _only_ process we can
> see is your script. If you kill a script that started a process, then your process is going to have to
> know how to detect the script has died and "suicide" - there is nothing we can do to help.
>
> Honestly, it sounds to me like the real problem here is that your .exe program isn't monitoring the
> shell above it to know when to "suicide". I don't see how we can help you there.
>
>
>
> On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>
> >
> > That is typical behavior when you throw something into "sleep" - not much we can do
> about it, I
> > think.
> >
> >     On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com > wrote:
> >
> >     Hello,
> >    
> >     I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
> >    
> >     I have attached the abort test program aborttest10.tgz.  This version sleeps for 5 sec before
> >     calling MPI_ABORT, so that I can check the pids using ps.
> >    
> >     This is what happens (see run2.sh.out).
> >    
> >     Open MPI invokes two instances of dum.sh.  Each instance of dum.sh invokes aborttest.exe.
> >    
> >     Pid    Process
> >     -------------------
> >     19565  dum.sh
> >     19566  dum.sh
> >     19567 aborttest10.exe
> >     19568 aborttest10.exe
> >    
> >     When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> >     instances of dum.sh (pids 19565 and 19566).
> >    
> >     ps shows that both the shell processes vanish, and that one of the aborttest10.exe
> processes
> >     vanishes.  But the other aborttest10.exe remains and continues until it is finished sleeping.
> >    
> >     Hope that this information is useful.
> >    
> >     Sincerely,
> >    
> >     Ted Sussman
> >    
> >    
> >    
> >     On 19 Jun 2017 at 23:06,  ***@rist.or.jp  wrote:
> >
> >    
> >      Ted,
> >      
> >     some traces are missing  because you did not configure with --enable-debug
> >     i am afraid you have to do it (and you probably want to install that debug version in an
> >     other
> >     location since its performances are not good for production) in order to get all the logs.
> >      
> >     Cheers,
> >      
> >     Gilles
> >      
> >     ----- Original Message -----
> >        Hello Gilles,
> >    
> >        I retried my example, with the same results as I observed before.  The process with rank
> >     1
> >        does not get killed by MPI_ABORT.
> >    
> >        I have attached to this E-mail:
> >    
> >          config.log.bz2
> >          ompi_info.bz2  (uses ompi_info -a)
> >          aborttest09.tgz
> >    
> >        This testing is done on a computer running Linux 3.10.0.  This is a different computer
> >     than
> >        the computer that I previously used for testing.  You can confirm that I am using Open
> >     MPI
> >        2.1.1.
> >    
> >        tar xvzf aborttest09.tgz
> >        cd aborttest09
> >        ./sh run2.sh
> >    
> >        run2.sh contains the command
> >    
> >        /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
> >     10
> >        ./dum.sh
> >    
> >        The output from this run is in aborttest09/run2.sh.out.
> >    
> >        The output shows that the "default" component is selected by odls.
> >    
> >        The only messages from odls are: odls: launch spawning child ...  (two messages).
> >     There
> >        are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
> >        messages.
> >    
> >        I am not running from within any batch manager.
> >    
> >        Sincerely,
> >    
> >        Ted Sussman
> >    
> >        On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
> >
> >     Ted,
> >    
> >     i do not observe the same behavior you describe with Open MPI 2.1.1
> >    
> >     # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >    
> >     abort.sh 31361 launching abort
> >     abort.sh 31362 launching abort
> >     I am rank 0 with pid 31363
> >     I am rank 1 with pid 31364
> >     ------------------------------------------------------------------------
> >     --
> >     MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> >     with errorcode 1.
> >    
> >     NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >     You may or may not see output from other processes, depending on
> >     exactly when Open MPI kills them.
> >     ------------------------------------------------------------------------
> >     --
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],0]
> >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],1]
> >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> >     not alive
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> >     not alive
> >    
> >    
> >     Open MPI did kill both shells, and they were indeed killed as evidenced
> >     by ps
> >    
> >     #ps -fu gilles --forest
> >     UID        PID  PPID  C STIME TTY          TIME CMD
> >     gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
> >     gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
> >     gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/
> >     local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> >     gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
> >    
> >    
> >     so trapping SIGTERM in your shell and manually killing the MPI task
> >     should work
> >     (as Jeff explained, as long as the shell script is fast enough to do
> >     that between SIGTERM and SIGKILL)
> >    
> >    
> >     if you observe a different behavior, please double check your Open MPI
> >     version and post the outputs of the same commands.
> >    
> >     btw, are you running from a batch manager ? if yes, which one ?
> >    
> >     Cheers,
> >    
> >     Gilles
> >    
> >     ----- Original Message -----
> >     Ted,
> >    
> >     if you
> >    
> >     mpirun --mca odls_base_verbose 10 ...
> >    
> >     you will see which processes get killed and how
> >    
> >     Best regards,
> >    
> >    
> >     Gilles
> >    
> >     ----- Original Message -----
> >     Hello Jeff,
> >    
> >     Thanks for your comments.
> >    
> >     I am not seeing behavior #4, on the two computers that I have
> >     tested
> >     on, using Open MPI
> >     2.1.1.
> >    
> >     I wonder if you can duplicate my results with the files that I have
> >     uploaded.
> >    
> >     Regarding what is the "correct" behavior, I am willing to modify my
> >     application to correspond
> >     to Open MPI's behavior (whatever behavior the Open MPI
> >     developers
> >     decide is best) --
> >     provided that Open MPI does in fact kill off both shells.
> >    
> >     So my highest priority now is to find out why Open MPI 2.1.1 does
> >     not
> >     kill off both shells on
> >     my computer.
> >    
> >     Sincerely,
> >    
> >     Ted Sussman
> >    
> >       On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> >
> >     Ted --
> >    
> >     Sorry for jumping in late.  Here's my $0.02...
> >    
> >     In the runtime, we can do 4 things:
> >    
> >     1. Kill just the process that we forked.
> >     2. Kill just the process(es) that call back and identify themselves as MPI processes
> >        (we don't track this right now, but we could add that functionality).
> >     3. Union of #1 and #2.
> >     4. Kill all processes (to include any intermediate processes that are not included in #1 and #2).
> >
> >     In Open MPI 2.x, #4 is the intended behavior.  There may be a bug or two that needs to get
> >     fixed (e.g., in your last mail, I don't see offhand why it waits until the MPI process finishes
> >     sleeping), but we should be killing the process group, which -- unless any of the descendant
> >     processes have explicitly left the process group -- should hit the entire process tree.
> >
> >     Sidenote: there's actually a way to be a bit more aggressive and do a better job of ensuring
> >     that we kill *all* processes (via creative use of PR_SET_CHILD_SUBREAPER), but that's
> >     basically a future enhancement / optimization.
> >
> >     I think Gilles and Ralph proposed a good point to you: if you want to be sure to be able to do
> >     cleanup after an MPI process terminates (normally or abnormally), you should trap signals in
> >     your intermediate processes to catch what Open MPI's runtime throws and therefore know
> >     that it is time to cleanup.
> >
> >     Hypothetically, this should work in all versions of Open MPI...?
> >
> >     I think Ralph made a pull request that adds an MCA param to change the default behavior
> >     from #4 to #1.
> >
> >     Note, however, that there's a little time between when Open MPI sends the SIGTERM and the
> >     SIGKILL, so this solution could be racy.  If you find that you're running out of time to cleanup,
> >     we might be able to make the delay between the SIGTERM and SIGKILL be configurable
> >     (e.g., via MCA param).
> >    
> >    
> >    
> >
> >     On Jun 16, 2017, at 10:08 AM, Ted Sussman
> >     <***@adina.com
> >    
> >     wrote:
> >    
> >     Hello Gilles and Ralph,
> >    
> >     Thank you for your advice so far.  I appreciate the time
> >     that
> >     you
> >     have spent to educate me about the details of Open MPI.
> >    
> >     But I think that there is something fundamental that I
> >     don't
> >     understand.  Consider Example 2 run with Open MPI 2.1.1.
> >    
> >     mpirun --> shell for process 0 -->  executable for process
> >     0 -->
> >     MPI calls, MPI_Abort
> >             --> shell for process 1 -->  executable for process 1 -->
> >     MPI calls
> >    
> >     After the MPI_Abort is called, ps shows that both shells
> >     are
> >     running, and that the executable for process 1 is running (in this
> >     case,
> >     process 1 is sleeping).  And mpirun does not exit until process 1 is
> >     finished sleeping.
> >    
> >     I cannot reconcile this observed behavior with the
> >     statement
> >
> >           >     2.x: each process is put into its own process group
> >     upon launch. When we issue a
> >          >     "kill", we issue it to the process group. Thus,
> >     every
> >     child proc of that child proc will
> >          >     receive it. IIRC, this was the intended behavior.
> >    
> >     I assume that, for my example, there are two process
> >     groups. 
> >     The
> >     process group for process 0 contains the shell for process 0 and the
> >     executable for process 0; and the process group for process 1 contains
> >     the shell for process 1 and the executable for process 1.  So what
> >     does
> >     MPI_ABORT do?  MPI_ABORT does not kill the process group for process
> >     0,
> >      
> >     since the shell for process 0 continues.  And MPI_ABORT does not kill
> >     the process group for process 1, since both the shell and executable
> >     for
> >     process 1 continue.
> >    
> >     If I hit Ctrl-C after MPI_Abort is called, I get the message
> >    
> >     mpirun: abort is already in progress.. hit ctrl-c again to
> >     forcibly terminate
> >    
> >     but I don't need to hit Ctrl-C again because mpirun
> >     immediately
> >     exits.
> >    
> >     Can you shed some light on all of this?
> >    
> >     Sincerely,
> >    
> >     Ted Sussman
> >    
> >    
> >     On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> >
> >    
> >     You have to understand that we have no way of knowing who is making MPI calls - all we
> >     see is the proc that we started, and we know someone of that rank is running (but we have
> >     no way of knowing which of the procs you sub-spawned it is).
> >
> >     So the behavior you are seeking only occurred in some earlier release by sheer accident.
> >     Nor will you find it portable as there is no specification directing that behavior.
> >
> >     The behavior I've provided is to either deliver the signal to _all_ child processes (including
> >     grandchildren etc.), or _only_ the immediate child of the daemon. It won't do what you
> >     describe - kill the MPI proc underneath the shell, but not the shell itself.
> >
> >     What you can eventually do is use PMIx to ask the runtime to selectively deliver signals to
> >     pid/procs for you. We don't have that capability implemented just yet, I'm afraid.
> >
> >     Meantime, when I get a chance, I can code an option that will record the pid of the subproc
> >     that calls MPI_Init, and then lets you deliver signals to just that proc. No promises as to when
> >     that will be done.
> >
> >
> >           On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@adina.com> wrote:
> >    
> >          Hello Ralph,
> >    
> >           I am just an Open MPI end user, so I will need to
> >     wait for
> >     the next official release.
> >    
> >          mpirun --> shell for process 0 -->  executable for
> >     process
> >     0
> >     --> MPI calls
> >                  --> shell for process 1 -->  executable for process
> >     1
> >     --> MPI calls
> >                                           ...
> >    
> >          I guess the question is, should MPI_ABORT kill the
> >     executables or the shells?  I naively
> >          thought, that, since it is the executables that make
> >     the
> >     MPI
> >     calls, it is the executables that
> >          should be aborted by the call to MPI_ABORT.  Since
> >     the
> >     shells don't make MPI calls, the
> >           shells should not be aborted.
> >    
> >          And users might have several layers of shells in
> >     between
> >     mpirun and the executable.
> >    
> >          So now I will look for the latest version of Open MPI
> >     that
> >     has the 1.4.3 behavior.
> >    
> >          Sincerely,
> >    
> >          Ted Sussman
> >    
> >           On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> >    
> >          >
> >           > Yeah, things jittered a little there as we debated
> >     the "
> >     right" behavior. Generally, when we
> >          see that
> >          > happening it means that a param is required, but
> >     somehow
> >     we never reached that point.
> >          >
> >          > See if https://github.com/open-mpi/ompi/pull/3704  
> >     helps
> >     -
> >     if so, I can schedule it for the next
> >          2.x
> >           > release if the RMs agree to take it
> >          >
> >          > Ralph
> >           >
> >          >     On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> >     sussman
> >     @adina.com > wrote:
> >           >
> >          >     Thank you for your comments.
> >           >   
> >          >     Our application relies upon "dum.sh" to clean up
> >     after
> >     the process exits, either if the
> >           process
> >          >     exits normally, or if the process exits abnormally
> >     because of MPI_ABORT.  If the process
> >           >     group is killed by MPI_ABORT, this clean up will not
> >     be performed.  If exec is used to launch
> >          >     the executable from dum.sh, then dum.sh is
> >     terminated
> >     by the exec, so dum.sh cannot
> >          >     perform any clean up.
> >          >   
> >           >     I suppose that other user applications might work
> >     similarly, so it would be good to have an
> >          >     MCA parameter to control the behavior of
> >     MPI_ABORT.
> >          >   
> >          >     We could rewrite our shell script that invokes
> >     mpirun,
> >     so that the cleanup that is now done
> >          >     by
> >           >     dum.sh is done by the invoking shell script after
> >     mpirun exits.  Perhaps this technique is the
> >          >     preferred way to clean up after mpirun is invoked.
> >           >   
> >          >     By the way, I have also tested with Open MPI
> >     1.10.7,
> >     and Open MPI 1.10.7 has different
> >           >     behavior than either Open MPI 1.4.3 or Open MPI
> >     2.1.
> >     1.
> >        In this explanation, it is important to
> >           >     know that the aborttest executable sleeps for 20
> >     sec.
> >          >   
> >           >     When running example 2:
> >          >   
> >          >     1.4.3: process 1 immediately aborts
> >          >     1.10.7: process 1 doesn't abort and never stops.
> >           >     2.1.1 process 1 doesn't abort, but stops after it is
> >     finished sleeping
> >          >   
> >          >     Sincerely,
> >          >   
> >          >     Ted Sussman
> >           >   
> >          >     On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> >          >
> >          >     Here is how the system is working:
> >           >   
> >          >     Master: each process is put into its own process
> >     group
> >     upon launch. When we issue a
> >          >     "kill", however, we only issue it to the individual
> >     process (instead of the process group
> >          >     that is headed by that child process). This is
> >     probably a bug as I don't believe that is
> >          >     what we intended, but set that aside for now.
> >           >   
> >          >     2.x: each process is put into its own process group
> >     upon launch. When we issue a
> >          >     "kill", we issue it to the process group. Thus,
> >     every
> >     child proc of that child proc will
> >          >     receive it. IIRC, this was the intended behavior.
> >           >   
> >          >     It is rather trivial to make the change (it only
> >     involves 3 lines of code), but IŽm not sure
> >          >     of what our intended behavior is supposed to be.
> >     Once
> >     we clarify that, it is also trivial
> >          >     to add another MCA param (you can never have too
> >     many!)
> >       to allow you to select the
> >          >     other behavior.
> >          >   
> >          >
> >           >     On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
> >     sussman@
> >     adina.com > wrote:
> >          >   
> >          >     Hello Gilles,
> >          >   
> >           >     Thank you for your quick answer.  I confirm that if
> >     exec is used, both processes
> >          >     immediately
> >           >     abort.
> >          >   
> >           >     Now suppose that the line
> >          >   
> >          >     echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> >           >   
> >          >     is added to the end of dum.sh.
> >          >   
> >          >     If Example 2 is run with Open MPI 1.4.3, the output
> >     is
> >          >   
> >          >     After aborttest: OMPI_COMM_WORLD_RANK=0
> >          >   
> >          >     which shows that the shell script for the process
> >     with
> >     rank 0 continues after the
> >           >     abort,
> >          >     but that the shell script for the process with rank
> >     1
> >     does not continue after the
> >           >     abort.
> >          >   
> >           >     If Example 2 is run with Open MPI 2.1.1, with exec
> >     used to invoke
> >          >     aborttest02.exe, then
> >          >     there is no such output, which shows that both shell
> >     scripts do not continue after
> >          >     the abort.
> >          >   
> >           >     I prefer the Open MPI 1.4.3 behavior because our
> >     original application depends
> >          >     upon the
> >           >     Open MPI 1.4.3 behavior.  (Our original application
> >     will also work if both
> >          >     executables are
> >           >     aborted, and if both shell scripts continue after
> >     the
> >     abort.)
> >          >   
> >           >     It might be too much to expect, but is there a way
> >     to
> >     recover the Open MPI 1.4.3
> >          >     behavior
> >           >     using Open MPI 2.1.1? 
> >          >   
> >           >     Sincerely,
> >          >   
> >          >     Ted Sussman
> >          >   
> >          >   
> >           >     On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> >          >
> >          >     Ted,
> >          >   
> >           >   
> >          >     fwiw, the 'master' branch has the behavior you
> >     expect.
> >          >   
> >          >   
> >          >     meanwhile, you can simple edit your 'dum.sh' script
> >     and replace
> >           >   
> >          >     /home/buildadina/src/aborttest02/aborttest02.exe
> >           >   
> >          >     with
> >           >   
> >          >     exec /home/buildadina/src/aborttest02/aborttest02.exe
> >           >   
> >          >   
> >          >     Cheers,
> >          >   
> >          >   
> >          >     Gilles
Ted Sussman
2017-06-27 17:58:07 UTC
Permalink
Hello all,

Thank you for your help and advice. It has taken me several days to understand what you
were trying to tell me. I have now studied the problem in more detail, using a version of
Open MPI 2.1.1 built with --enable-debug.

-----

Consider the following scenario in Open MPI 2.1.1:

mpirun --> dum.sh --> aborttest.exe (rank 0)
       --> dum.sh --> aborttest.exe (rank 1)

aborttest.exe calls MPI_Bcast several times, then aborttest.exe rank 0 calls MPI_Abort.

As far as I can figure out, this is what happens after aborttest.exe rank 0 calls MPI_Abort.

1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from
MPI_Bcast).

2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both
dum.sh processes.

3) Both dum.sh processes are killed.

4) aborttest.exe for rank 1 continues to poll. mpirun never exits.

----

Now suppose that dum.sh traps SIGCONT, and that the trap handler in dum.sh sends signal
SIGINT to $PPID. This is what seems to happen after aborttest.exe rank 0 calls MPI_Abort:

1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from
MPI_Bcast).

2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both
dum.sh processes.

3) dum.sh for rank 0 catches SIGCONT and sends SIGINT to its parent. dum.sh for rank 1
appears to be killed (I don't understand this; why doesn't dum.sh for rank 1 also catch
SIGCONT?)

4) mpirun catches the SIGINT and kills aborttest.exe for rank 1, then mpirun exits.

So adding the trap handler to dum.sh solves my problem.
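
For reference, a minimal sketch of what such a modified dum.sh can look like. This is only an
illustration: the path to the executable is site specific, and running the executable in the
background with "wait" (so the shell can react to the signal even while the executable is still
running) is one way to structure it, not necessarily exactly what the real dum.sh does:

    #!/bin/sh
    # dum.sh (sketch): forward Open MPI's abort signal back to mpirun and clean up.

    CHILD=0
    on_cont() {
        # SIGCONT is the first signal in the SIGCONT/SIGTERM/SIGKILL sequence;
        # nudge our parent (mpirun) and make sure our own child does not linger.
        kill -INT "$PPID" 2>/dev/null
        [ "$CHILD" -ne 0 ] && kill -TERM "$CHILD" 2>/dev/null
    }
    trap on_cont CONT

    ./aborttest.exe &          # site-specific path in the real script
    CHILD=$!
    wait "$CHILD"
    status=$?

    # ... cleanup that must run after the executable exits goes here ...

    exit "$status"

Note that the handler has to finish before the SIGKILL that follows SIGTERM arrives, so anything
done in it should be quick.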

Is this the preferred solution to my problem? Or is there a more elegant solution?

Sincerely,

Ted Sussman
On 19 Jun 2017 at 11:19, ***@open-mpi.org wrote:

>
>
>
> On Jun 19, 2017, at 10:53 AM, Ted Sussman <***@adina.com> wrote:
>
> For what it's worth, the problem might be related to the following:
>
> mpirun: -np 2 ... dum.sh
> dum.sh: Invoke aborttest11.exe
> aborttest11.exe: Call  MPI_Init, go into an infinite loop.
>
> Now when mpirun is running, send signals at the processes, as follows:
>
> 1) kill -9 (pid for one of the aborttest11.exe processes)
>
> The shell for this aborttest11.exe continues. Once this shell exits, then Open MPI sends
> signals to both shells, killing the other shell, but the remaining aborttest11.exe survives.  The
> PPID for the remaining aborttest11.exe becomes 1.
>
> We have no visibility into your aborttest processes since we didn't launch them. So killing one of
> them is invisible to us. We can only see the shell scripts.
>
>
> 2) kill -9 (pid for one of the dum.sh processes).
>
> Open MPI sends signals to both of the shells. Both shells are killed off, but both
> aborttest11.exe processes survive, with PPID set to 1.
>
> This again is a question of how you handle things in your program. The _only_ process we can
> see is your script. If you kill a script that started a process, then your process is going to have to
> know how to detect the script has died and "suicide" - there is nothing we can do to help.
>
> Honestly, it sounds to me like the real problem here is that your .exe program isn't monitoring the
> shell above it to know when to "suicide". I don't see how we can help you there.
>
>
>
> On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>
> >
> > That is typical behavior when you throw something into "sleep" - not much we can do
> about it, I
> > think.
> >
> >     On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com > wrote:
> >
> >     Hello,
> >    
> >     I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
> >    
> >     I have attached the abort test program aborttest10.tgz.  This version sleeps for 5 sec before
> >     calling MPI_ABORT, so that I can check the pids using ps.
> >    
> >     This is what happens (see run2.sh.out).
> >    
> >     Open MPI invokes two instances of dum.sh.  Each instance of dum.sh invokes aborttest.exe.
> >    
> >     Pid    Process
> >     -------------------
> >     19565  dum.sh
> >     19566  dum.sh
> >     19567 aborttest10.exe
> >     19568 aborttest10.exe
> >    
> >     When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
> >     instances of dum.sh (pids 19565 and 19566).
> >    
> >     ps shows that both the shell processes vanish, and that one of the aborttest10.exe
> processes
> >     vanishes.  But the other aborttest10.exe remains and continues until it is finished sleeping.
> >    
> >     Hope that this information is useful.
> >    
> >     Sincerely,
> >    
> >     Ted Sussman
> >    
> >    
> >    
> >     On 19 Jun 2017 at 23:06,  ***@rist.or.jp  wrote:
> >
> >    
> >      Ted,
> >      
> >     some traces are missing  because you did not configure with --enable-debug
> >     i am afraid you have to do it (and you probably want to install that debug version in an
> >     other
> >     location since its performances are not good for production) in order to get all the logs.
> >      
> >     Cheers,
> >      
> >     Gilles
> >      
> >     ----- Original Message -----
> >        Hello Gilles,
> >    
> >        I retried my example, with the same results as I observed before.  The process with rank
> >     1
> >        does not get killed by MPI_ABORT.
> >    
> >        I have attached to this E-mail:
> >    
> >          config.log.bz2
> >          ompi_info.bz2  (uses ompi_info -a)
> >          aborttest09.tgz
> >    
> >        This testing is done on a computer running Linux 3.10.0.  This is a different computer
> >     than
> >        the computer that I previously used for testing.  You can confirm that I am using Open
> >     MPI
> >        2.1.1.
> >    
> >        tar xvzf aborttest09.tgz
> >        cd aborttest09
> >        ./sh run2.sh
> >    
> >        run2.sh contains the command
> >    
> >        /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
> >     10
> >        ./dum.sh
> >    
> >        The output from this run is in aborttest09/run2.sh.out.
> >    
> >        The output shows that the the "default" component is selected by odls.
> >    
> >        The only messages from odls are: odls: launch spawning child ...  (two messages).
> >     There
> >        are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
> >        messages.
> >    
> >        I am not running from within any batch manager.
> >    
> >        Sincerely,
> >    
> >        Ted Sussman
> >    
> >        On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
> >
> >     Ted,
> >    
> >     i do not observe the same behavior you describe with Open MPI 2.1.1
> >    
> >     # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >    
> >     abort.sh 31361 launching abort
> >     abort.sh 31362 launching abort
> >     I am rank 0 with pid 31363
> >     I am rank 1 with pid 31364
> >     ------------------------------------------------------------------------
> >     --
> >     MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> >     with errorcode 1.
> >    
> >     NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >     You may or may not see output from other processes, depending on
> >     exactly when Open MPI kills them.
> >     ------------------------------------------------------------------------
> >     --
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],0]
> >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],1]
> >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> >     SUCCESS
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],0]
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> >     not alive
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     [[18199,1],1]
> >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> >     not alive
> >    
> >    
> >     Open MPI did kill both shells, and they were indeed killed as evidenced
> >     by ps
> >    
> >     #ps -fu gilles --forest
> >     UID        PID  PPID  C STIME TTY          TIME CMD
> >     gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
> >     gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
> >     gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/
> >     local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> >     gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
> >    
> >    
> >     so trapping SIGTERM in your shell and manually killing the MPI task
> >     should work
> >     (as Jeff explained, as long as the shell script is fast enough to do
> >     that between SIGTERM and SIGKILL)
> >    
> >    
> >     if you observe a different behavior, please double check your Open MPI
> >     version and post the outputs of the same commands.
> >    
> >     btw, are you running from a batch manager ? if yes, which one ?
> >    
> >     Cheers,
> >    
> >     Gilles
> >    
> >     ----- Original Message -----
> >     Ted,
> >    
> >     if you
> >    
> >     mpirun --mca odls_base_verbose 10 ...
> >    
> >     you will see which processes get killed and how
> >    
> >     Best regards,
> >    
> >    
> >     Gilles
> >    
> >     ----- Original Message -----
> >     Hello Jeff,
> >    
> >     Thanks for your comments.
> >    
> >     I am not seeing behavior #4, on the two computers that I have
> >     tested
> >     on, using Open MPI
> >     2.1.1.
> >    
> >     I wonder if you can duplicate my results with the files that I have
> >     uploaded.
> >    
> >     Regarding what is the "correct" behavior, I am willing to modify my
> >     application to correspond
> >     to Open MPI's behavior (whatever behavior the Open MPI
> >     developers
> >     decide is best) --
> >     provided that Open MPI does in fact kill off both shells.
> >    
> >     So my highest priority now is to find out why Open MPI 2.1.1 does
> >     not
> >     kill off both shells on
> >     my computer.
> >    
> >     Sincerely,
> >    
> >     Ted Sussman
> >    
> >       On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> >
> >     Ted --
> >    
> >     Sorry for jumping in late.  Here's my $0.02...
> >    
> >     In the runtime, we can do 4 things:
> >    
> >     1. Kill just the process that we forked.
> >     2. Kill just the process(es) that call back and identify
> >     themselves
> >     as MPI processes (we don't track this right now, but we could add that
> >     functionality).
> >     3. Union of #1 and #2.
> >     4. Kill all processes (to include any intermediate processes
> >     that
> >     are not included in #1 and #2).
> >    
> >     In Open MPI 2.x, #4 is the intended behavior.  There may be a
> >     bug
> >     or
> >     two that needs to get fixed (e.g., in your last mail, I don't see
> >     offhand why it waits until the MPI process finishes sleeping), but we
> >     should be killing the process group, which -- unless any of the
> >     descendant processes have explicitly left the process group -- should
> >     hit the entire process tree. 
> >    
> >     Sidenote: there's actually a way to be a bit more aggressive
> >     and
> >     do
> >     a better job of ensuring that we kill *all* processes (via creative
> >     use
> >     of PR_SET_CHILD_SUBREAPER), but that's basically a future
> >     enhancement
> >     /
> >     optimization.
> >    
> >     I think Gilles and Ralph proposed a good point to you: if you
> >     want
> >     to be sure to be able to do cleanup after an MPI process terminates (
> >     normally or abnormally), you should trap signals in your intermediate
> >     processes to catch what Open MPI's runtime throws and therefore know
> >     that it is time to cleanup. 
> >    
> >     Hypothetically, this should work in all versions of Open MPI...?
> >    
> >     I think Ralph made a pull request that adds an MCA param to
> >     change
> >     the default behavior from #4 to #1.
> >    
> >     Note, however, that there's a little time between when Open
> >     MPI
> >     sends the SIGTERM and the SIGKILL, so this solution could be racy.  If
> >     you find that you're running out of time to cleanup, we might be able
> >     to
> >     make the delay between the SIGTERM and SIGKILL be configurable
> >     (e.g.,
> >     via MCA param).
> >    
> >    
> >    
r***@open-mpi.org
2017-06-27 18:17:28 UTC
Permalink
Ideally, we should be delivering the signal to all procs in the process group of each dum.sh. Looking at the code in the head of the 2.x branch, that does indeed appear to be what we are doing, assuming that we found setpgid in your system:

static int odls_default_kill_local(pid_t pid, int signum)
{
    pid_t pgrp;

#if HAVE_SETPGID
    pgrp = getpgid(pid);
    if (-1 != pgrp) {
        /* target the lead process of the process
         * group so we ensure that the signal is
         * seen by all members of that group. This
         * ensures that the signal is seen by any
         * child processes our child may have
         * started
         */
        pid = pgrp;
    }
#endif
    if (0 != kill(pid, signum)) {
        if (ESRCH != errno) {
            OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
                                 "%s odls:default:SENT KILL %d TO PID %d GOT ERRNO %d",
                                 ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid, errno));
            return errno;
        }
    }
    OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
                         "%s odls:default:SENT KILL %d TO PID %d SUCCESS",
                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid));
    return 0;
}

For some strange reason, it appears that you aren’t seeing this? I’m building the branch now and will see if I can reproduce it.
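
As a quick check on your side (this is just a generic Linux procps "ps" invocation, nothing Open MPI specific), something like the following shows whether each dum.sh and the aborttest executable it launches really share a process group while the job is running:

    ps -u $USER -o pid,ppid,pgid,comm --forest

If the .exe reports a different PGID than its dum.sh, that by itself would explain why it survives the kill.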

> On Jun 27, 2017, at 10:58 AM, Ted Sussman <***@adina.com> wrote:
>
> Hello all,
>
> Thank you for your help and advice. It has taken me several days to understand what you were trying to tell me. I have now studied the problem in more detail, using a version of Open MPI 2.1.1 built with --enable-debug.
>
> -----
>
> Consider the following scenario in Open MPI 2.1.1:
>
> mpirun --> dum.sh --> aborttest.exe (rank 0)
>        --> dum.sh --> aborttest.exe (rank 1)
>
> aborttest.exe calls MPI_Bcast several times, then aborttest.exe rank 0 calls MPI_Abort.
>
> As far as I can figure out, this is what happens after aborttest.exe rank 0 calls MPI_Abort.
>
> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from MPI_Bcast).
>
> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both dum.sh processes.
>
> 3) Both dum.sh processes are killed.
>
> 4) aborttest.exe for rank 1 continues to poll. mpirun never exits.
>
> ----
>
> Now suppose that dum.sh traps SIGCONT, and that the trap handler in dum.sh sends signal SIGINT to $PPID. This is what seems to happen after aborttest.exe rank 0 calls MPI_Abort:
>
> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from MPI_Bcast).
>
> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both dum.sh processes.
>
> 3) dum.sh for rank 0 catches SIGCONT and sends SIGINT to its parent. dum.sh for rank 1 appears to be killed (I don't understand this; why doesn't dum.sh for rank 1 also catch SIGCONT?)
>
> 4) mpirun catches the SIGINT and kills aborttest.exe for rank 1, then mpirun exits.
>
> So adding the trap handler to dum.sh solves my problem.
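>
> For reference, a minimal sketch of such a dum.sh (the executable name and the cleanup command here are just placeholders):
>
> #!/bin/sh
> cleanup() {
>     rm -f scratch.$OMPI_COMM_WORLD_RANK   # placeholder for the real cleanup
>     kill -INT $PPID                       # send SIGINT to the parent, as described in step 3 above
>     exit 1
> }
> # Open MPI sends SIGCONT, then SIGTERM, then SIGKILL to the shell.
> trap cleanup CONT TERM
>
> # Run the MPI executable in the background and wait for it, so the shell
> # can run the trap as soon as the first signal arrives.
> ./aborttest.exe "$@" &
> wait $!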
>
> Is this the preferred solution to my problem? Or is there a more elegant solution?
>
> Sincerely,
>
> Ted Sussman
>
r***@open-mpi.org
2017-06-27 19:13:02 UTC
Permalink
Oh my - I finally tracked it down. A simple one-character error.

Thanks for your patience. The fix is https://github.com/open-mpi/ompi/pull/3773 and will be ported to 2.x and 3.0.
Ralph

> On Jun 27, 2017, at 11:17 AM, ***@open-mpi.org wrote:
>
> Ideally, we should be delivering the signal to all procs in the process group of each dum.sh. Looking at the code in the head of the 2.x branch, that does indeed appear to be what we are doing, assuming that we found setpgid in your system:
>
> static int odls_default_kill_local(pid_t pid, int signum)
> {
> pid_t pgrp;
>
> #if HAVE_SETPGID
> pgrp = getpgid(pid);
> if (-1 != pgrp) {
> /* target the lead process of the process
> * group so we ensure that the signal is
> * seen by all members of that group. This
> * ensures that the signal is seen by any
> * child processes our child may have
> * started
> */
> pid = pgrp;
> }
> #endif
> if (0 != kill(pid, signum)) {
> if (ESRCH != errno) {
> OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
> "%s odls:default:SENT KILL %d TO PID %d GOT ERRNO %d",
> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid, errno));
> return errno;
> }
> }
> OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
> "%s odls:default:SENT KILL %d TO PID %d SUCCESS",
> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid));
> return 0;
> }
>
> For some strange reason, it appears that you aren’t see this? I’m building the branch now and will see if I can reproduce it.
>
>> On Jun 27, 2017, at 10:58 AM, Ted Sussman <***@adina.com <mailto:***@adina.com>> wrote:
>>
>> Hello all,
>>
>> Thank you for your help and advice. It has taken me several days to understand what you were trying to tell me. I have now studied the problem in more detail, using a version of Open MPI 2.1.1 built with --enable-debug.
>>
>> -----
>>
>> Consider the following scenario in Open MPI 2.1.1:
>>
>> mpirun --> dum.sh --> aborttest.exe (rank 0)
>> --> dum.sh --> aborttest.exe (rank 1)
>>
>> aborttest.exe calls MPI_Bcast several times, then aborttest.exe rank 0 calls MPI_Abort.
>>
>> As far as I can figure out, this is what happens after aborttest.exe rank 0 calls MPI_Abort.
>>
>> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from MPI_Bcast).
>>
>> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both dum.sh processes.
>>
>> 3) Both dum.sh processes are killed.
>>
>> 4) aborttest.exe for rank 1 continues to poll. mpirun never exits.
>>
>> ----
>>
>> Now suppose that dum.sh traps SIGCONT, and that the trap handler in dum.sh sends signal SIGINT to $PPID. This is what seems to happen after aborttest.exe rank 0 calls MPI_Abort:
>>
>> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for message from MPI_Bcast).
>>
>> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both dum.sh processes.
>>
>> 3) dum.sh for rank 0 catches SIGCONT and sents SIGINT to its parent. dum.sh for rank 1 appears to be killed (I don't understand this, why doesn't dum.sh for rank 1 also catch SIGCONT?)
>>
>> 4) mpirun catches the SIGINT and kills aborttest.exe for rank 1, then mpirun exits.
>>
>> So adding the trap handler to dum.sh solves my problem.
>>
>> Is this the preferred solution to my problem? Or is there a more elegant solution?
>>
>> Sincerely,
>>
>> Ted Sussman
>>
>>
>>
>>
>>
>>
>>
>>
>> On 19 Jun 2017 at 11:19, ***@open-mpi.org <mailto:***@open-mpi.org> wrote:
>>
>> >
>> >
>> >
>> > On Jun 19, 2017, at 10:53 AM, Ted Sussman <***@adina.com <mailto:***@adina.com>> wrote:
>> >
>> > For what it's worth, the problem might be related to the following:
>> >
>> > mpirun: -np 2 ... dum.sh
>> > dum.sh: Invoke aborttest11.exe
>> > aborttest11.exe: Call MPI_Init, go into an infinite loop.
>> >
>> > Now when mpirun is running, send signals at the processes, as follows:
>> >
>> > 1) kill -9 (pid for one of the aborttest11.exe processes)
>> >
>> > The shell for this aborttest11.exe continues. Once this shell exits, then Open MPI sends
>> > signals to both shells, killing the other shell, but the remaining aborttest11.exe survives. The
>> > PPID for the remaining aborttest11.exe becomes 1.
>> >
>> > We have no visibility into your aborttest processes since we didn’t launch them. So killing one of
>> > them is invisible to us. We can only see the shell scripts.
>> >
>> >
>> > 2) kill -9 (pid for one of the dum.sh processes).
>> >
>> > Open MPI sends signals to both of the shells. Both shells are killed off, but both
>> > aborttest11.exe processes survive, with PPID set to 1.
>> >
>> > This again is a question of how you handle things in your program. The _only_ process we can
>> > see is your script. If you kill a script that started a process, then your process is going to have to
>> > know how to detect the script has died and “suicide” - there is nothing we can do to help.
>> >
>> > Honestly, it sounds to me like the real problem here is that your .exe program isn’t monitoring the
>> > shell above it to know when to “suicide”. I don’t see how we can help you there.
>> >
>> >
>> >
>> > On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
>> >
>> > >
>> > > That is typical behavior when you throw something into “sleep” - not much we can do
>> > about it, I
>> > > think.
>> > >
>> > > On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
>> > >
>> > > I have attached the abort test program aborttest10.tgz. This version sleeps for 5 sec before
>> > > calling MPI_ABORT, so that I can check the pids using ps.
>> > >
>> > > This is what happens (see run2.sh.out).
>> > >
>> > > Open MPI invokes two instances of dum.sh. Each instance of dum.sh invokes aborttest.exe.
>> > >
>> > > Pid Process
>> > > -------------------
>> > > 19565 dum.sh
>> > > 19566 dum.sh
>> > > 19567 aborttest10.exe
>> > > 19568 aborttest10.exe
>> > >
>> > > When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to both
>> > > instances of dum.sh (pids 19565 and 19566).
>> > >
>> > > ps shows that both the shell processes vanish, and that one of the aborttest10.exe
>> > processes
>> > > vanishes. But the other aborttest10.exe remains and continues until it is finished sleeping.
>> > >
>> > > Hope that this information is useful.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > >
>> > >
>> > > On 19 Jun 2017 at 23:06, ***@rist.or.jp wrote:
>> > >
>> > >
>> > > Ted,
>> > >
>> > > some traces are missing because you did not configure with --enable-debug
>> > > i am afraid you have to do it (and you probably want to install that debug version in an
>> > > other
>> > > location since its performances are not good for production) in order to get all the logs.
>> > >
>> > > Cheers,
>> > >
>> > > Gilles
>> > >
>> > > ----- Original Message -----
>> > > Hello Gilles,
>> > >
>> > > I retried my example, with the same results as I observed before. The process with rank
>> > > 1
>> > > does not get killed by MPI_ABORT.
>> > >
>> > > I have attached to this E-mail:
>> > >
>> > > config.log.bz2
>> > > ompi_info.bz2 (uses ompi_info -a)
>> > > aborttest09.tgz
>> > >
>> > > This testing is done on a computer running Linux 3.10.0. This is a different computer
>> > > than
>> > > the computer that I previously used for testing. You can confirm that I am using Open
>> > > MPI
>> > > 2.1.1.
>> > >
>> > > tar xvzf aborttest09.tgz
>> > > cd aborttest09
>> > > ./sh run2.sh
>> > >
>> > > run2.sh contains the command
>> > >
>> > > /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
>> > > 10
>> > > ./dum.sh
>> > >
>> > > The output from this run is in aborttest09/run2.sh.out.
>> > >
>> > > The output shows that the "default" component is selected by odls.
>> > >
>> > > The only messages from odls are: odls: launch spawning child ... (two messages).
>> > > There
>> > > are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
>> > > messages.
>> > >
>> > > I am not running from within any batch manager.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > > On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>> > >
>> > > Ted,
>> > >
>> > > i do not observe the same behavior you describe with Open MPI 2.1.1
>> > >
>> > > # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>> > >
>> > > abort.sh 31361 launching abort
>> > > abort.sh 31362 launching abort
>> > > I am rank 0 with pid 31363
>> > > I am rank 1 with pid 31364
>> > > ------------------------------------------------------------------------
>> > > --
>> > > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> > > with errorcode 1.
>> > >
>> > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> > > You may or may not see output from other processes, depending on
>> > > exactly when Open MPI kills them.
>> > > ------------------------------------------------------------------------
>> > > --
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>> > > [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
>> > > SUCCESS
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>> > > [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
>> > > SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
>> > > SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
>> > > SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
>> > > SUCCESS
>> > > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
>> > > SUCCESS
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>> > > [[18199,1],0]
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
>> > > not alive
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
>> > > [[18199,1],1]
>> > > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
>> > > not alive
>> > >
>> > >
>> > > Open MPI did kill both shells, and they were indeed killed as evidenced
>> > > by ps
>> > >
>> > > #ps -fu gilles --forest
>> > > UID PID PPID C STIME TTY TIME CMD
>> > > gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
>> > > gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
>> > > gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/
>> > > local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
>> > > gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
>> > >
>> > >
>> > > so trapping SIGTERM in your shell and manually killing the MPI task
>> > > should work
>> > > (as Jeff explained, as long as the shell script is fast enough to do
>> > > that between SIGTERM and SIGKILL)
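>> > >
>> > > a minimal sketch of that idea (the names and the background/wait logic are
>> > > only illustrative, not the actual dum.sh from this thread):
>> > >
>> > > #!/bin/sh
>> > > ./aborttest.exe "$@" &            # run the MPI executable in the background
>> > > child=$!
>> > > trap 'kill -TERM $child' TERM     # forward the SIGTERM from orted to it
>> > > wait $child                       # wait is interrupted by the signal, so the
>> > >                                   # trap can run before the SIGKILL lands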
>> > >
>> > >
>> > > if you observe a different behavior, please double check your Open MPI
>> > > version and post the outputs of the same commands.
>> > >
>> > > btw, are you running from a batch manager ? if yes, which one ?
>> > >
>> > > Cheers,
>> > >
>> > > Gilles
>> > >
>> > > ----- Original Message -----
>> > > Ted,
>> > >
>> > > if you
>> > >
>> > > mpirun --mca odls_base_verbose 10 ...
>> > >
>> > > you will see which processes get killed and how
>> > >
>> > > Best regards,
>> > >
>> > >
>> > > Gilles
>> > >
>> > > ----- Original Message -----
>> > > Hello Jeff,
>> > >
>> > > Thanks for your comments.
>> > >
>> > > I am not seeing behavior #4, on the two computers that I have
>> > > tested
>> > > on, using Open MPI
>> > > 2.1.1.
>> > >
>> > > I wonder if you can duplicate my results with the files that I have
>> > > uploaded.
>> > >
>> > > Regarding what is the "correct" behavior, I am willing to modify my
>> > > application to correspond
>> > > to Open MPI's behavior (whatever behavior the Open MPI
>> > > developers
>> > > decide is best) --
>> > > provided that Open MPI does in fact kill off both shells.
>> > >
>> > > So my highest priority now is to find out why Open MPI 2.1.1 does
>> > > not
>> > > kill off both shells on
>> > > my computer.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>> > >
>> > > Ted --
>> > >
>> > > Sorry for jumping in late. Here's my $0.02...
>> > >
>> > > In the runtime, we can do 4 things:
>> > >
>> > > 1. Kill just the process that we forked.
>> > > 2. Kill just the process(es) that call back and identify
>> > > themselves
>> > > as MPI processes (we don't track this right now, but we could add that
>> > > functionality).
>> > > 3. Union of #1 and #2.
>> > > 4. Kill all processes (to include any intermediate processes
>> > > that
>> > > are not included in #1 and #2).
>> > >
>> > > In Open MPI 2.x, #4 is the intended behavior. There may be a
>> > > bug
>> > > or
>> > > two that needs to get fixed (e.g., in your last mail, I don't see
>> > > offhand why it waits until the MPI process finishes sleeping), but we
>> > > should be killing the process group, which -- unless any of the
>> > > descendant processes have explicitly left the process group -- should
>> > > hit the entire process tree.
>> > >
>> > > Sidenote: there's actually a way to be a bit more aggressive
>> > > and
>> > > do
>> > > a better job of ensuring that we kill *all* processes (via creative
>> > > use
>> > > of PR_SET_CHILD_SUBREAPER), but that's basically a future
>> > > enhancement
>> > > /
>> > > optimization.
>> > >
>> > > I think Gilles and Ralph proposed a good point to you: if you
>> > > want
>> > > to be sure to be able to do cleanup after an MPI process terminates (
>> > > normally or abnormally), you should trap signals in your intermediate
>> > > processes to catch what Open MPI's runtime throws and therefore know
>> > > that it is time to cleanup.
>> > >
>> > > Hypothetically, this should work in all versions of Open MPI...?
>> > >
>> > > I think Ralph made a pull request that adds an MCA param to
>> > > change
>> > > the default behavior from #4 to #1.
>> > >
>> > > Note, however, that there's a little time between when Open
>> > > MPI
>> > > sends the SIGTERM and the SIGKILL, so this solution could be racy. If
>> > > you find that you're running out of time to cleanup, we might be able
>> > > to
>> > > make the delay between the SIGTERM and SIGKILL be configurable
>> > > (e.g.,
>> > > via MCA param).
>> > >
>> > >
>> > >
>> > >
>> > > On Jun 16, 2017, at 10:08 AM, Ted Sussman
>> > > <***@adina.com>
>> > >
>> > > wrote:
>> > >
>> > > Hello Gilles and Ralph,
>> > >
>> > > Thank you for your advice so far. I appreciate the time
>> > > that
>> > > you
>> > > have spent to educate me about the details of Open MPI.
>> > >
>> > > But I think that there is something fundamental that I
>> > > don't
>> > > understand. Consider Example 2 run with Open MPI 2.1.1.
>> > >
>> > > mpirun --> shell for process 0 --> executable for process
>> > > 0 -->
>> > > MPI calls, MPI_Abort
>> > > --> shell for process 1 --> executable for process 1 -->
>> > > MPI calls
>> > >
>> > > After the MPI_Abort is called, ps shows that both shells
>> > > are
>> > > running, and that the executable for process 1 is running (in this
>> > > case,
>> > > process 1 is sleeping). And mpirun does not exit until process 1 is
>> > > finished sleeping.
>> > >
>> > > I cannot reconcile this observed behavior with the
>> > > statement
>> > >
>> > > > 2.x: each process is put into its own process group
>> > > upon launch. When we issue a
>> > > > "kill", we issue it to the process group. Thus,
>> > > every
>> > > child proc of that child proc will
>> > > > receive it. IIRC, this was the intended behavior.
>> > >
>> > > I assume that, for my example, there are two process
>> > > groups.
>> > > The
>> > > process group for process 0 contains the shell for process 0 and the
>> > > executable for process 0; and the process group for process 1 contains
>> > > the shell for process 1 and the executable for process 1. So what
>> > > does
>> > > MPI_ABORT do? MPI_ABORT does not kill the process group for process
>> > > 0,
>> > >
>> > > since the shell for process 0 continues. And MPI_ABORT does not kill
>> > > the process group for process 1, since both the shell and executable
>> > > for
>> > > process 1 continue.
>> > >
>> > > If I hit Ctrl-C after MPI_Abort is called, I get the message
>> > >
>> > > mpirun: abort is already in progress.. hit ctrl-c again to
>> > > forcibly terminate
>> > >
>> > > but I don't need to hit Ctrl-C again because mpirun
>> > > immediately
>> > > exits.
>> > >
>> > > Can you shed some light on all of this?
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > >
>> > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
>> > >
>> > >
>> > > You have to understand that we have no way of
>> > > knowing who is
>> > > making MPI calls - all we see is
>> > > the proc that we started, and we know someone of
>> > > that rank is
>> > > running (but we have no way of
>> > > knowing which of the procs you sub-spawned it is).
>> > >
>> > > So the behavior you are seeking only occurred in
>> > > some earlier
>> > > release by sheer accident. Nor will
>> > > you find it portable as there is no specification
>> > > directing
>> > > that
>> > > behavior.
>> > >
>> > > The behavior I've provided is to either deliver the signal to
>> > > _all_ child processes (including grandchildren etc.), or _only_ the
>> > > immediate child of the daemon. It won't do what you describe -
>> > > kill the MPI proc underneath the shell, but not the shell itself.
>> > >
>> > > What you can eventually do is use PMIx to ask the
>> > > runtime to
>> > > selectively deliver signals to
>> > > pid/procs for you. We don't have that capability implemented
>> > > just yet, I'm afraid.
>> > >
>> > > Meantime, when I get a chance, I can code an
>> > > option that will
>> > > record the pid of the subproc that
>> > > calls MPI_Init, and then lets you deliver signals to just that
>> > > proc. No promises as to when that will
>> > > be done.
>> > >
>> > >
>> > > On Jun 15, 2017, at 1:37 PM, Ted Sussman
>> > > <ted.sussman@
>> > > adina.
>> > > com> wrote:
>> > >
>> > > Hello Ralph,
>> > >
>> > > I am just an Open MPI end user, so I will need to
>> > > wait for
>> > > the next official release.
>> > >
>> > > mpirun --> shell for process 0 --> executable for
>> > > process
>> > > 0
>> > > --> MPI calls
>> > > --> shell for process 1 --> executable for process
>> > > 1
>> > > --> MPI calls
>> > > ...
>> > >
>> > > I guess the question is, should MPI_ABORT kill the
>> > > executables or the shells? I naively
>> > > thought, that, since it is the executables that make
>> > > the
>> > > MPI
>> > > calls, it is the executables that
>> > > should be aborted by the call to MPI_ABORT. Since
>> > > the
>> > > shells don't make MPI calls, the
>> > > shells should not be aborted.
>> > >
>> > > And users might have several layers of shells in
>> > > between
>> > > mpirun and the executable.
>> > >
>> > > So now I will look for the latest version of Open MPI
>> > > that
>> > > has the 1.4.3 behavior.
>> > >
>> > > Sincerely,
>> > >
>> > > Ted Sussman
>> > >
>> > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
>> > >
>> > > >
>> > > > Yeah, things jittered a little there as we debated
>> > > the "
>> > > right" behavior. Generally, when we
>> > > see that
>> > > > happening it means that a param is required, but
>> > > somehow
>> > > we never reached that point.
>> > > >
>> > > > See if https://github.com/open-mpi/ompi/pull/3704
>> > > helps
>> > > -
>> > > if so, I can schedule it for the next
>> > > 2.x
>> > > > release if the RMs agree to take it
>> > > >
>> > > > Ralph
>> > > >
>> > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
>> > > sussman
>> > > @adina.com > wrote:
>> > > >
>> > > > Thank you for your comments.
>> > > >
>> > > > Our application relies upon "dum.sh" to clean up
>> > > after
>> > > the process exits, either if the
>> > > process
>> > > > exits normally, or if the process exits abnormally
>> > > because of MPI_ABORT. If the process
>> > > > group is killed by MPI_ABORT, this clean up will not
>> > > be performed. If exec is used to launch
>> > > > the executable from dum.sh, then dum.sh is
>> > > terminated
>> > > by the exec, so dum.sh cannot
>> > > > perform any clean up.
>> > > >
>> > > > I suppose that other user applications might work
>> > > similarly, so it would be good to have an
>> > > > MCA parameter to control the behavior of
>> > > MPI_ABORT.
>> > > >
>> > > > We could rewrite our shell script that invokes
>> > > mpirun,
>> > > so that the cleanup that is now done
>> > > > by
>> > > > dum.sh is done by the invoking shell script after
>> > > mpirun exits. Perhaps this technique is the
>> > > > preferred way to clean up after mpirun is invoked.
>> > > >
>> > > > By the way, I have also tested with Open MPI
>> > > 1.10.7,
>> > > and Open MPI 1.10.7 has different
>> > > > behavior than either Open MPI 1.4.3 or Open MPI
>> > > 2.1.
>> > > 1.
>> > > In this explanation, it is important to
>> > > > know that the aborttest executable sleeps for 20
>> > > sec.
>> > > >
>> > > > When running example 2:
>> > > >
>> > > > 1.4.3: process 1 immediately aborts
>> > > > 1.10.7: process 1 doesn't abort and never stops.
>> > > > 2.1.1 process 1 doesn't abort, but stops after it is
>> > > finished sleeping
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > Ted Sussman
>> > > >
>> > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
>> > > >
>> > > > Here is how the system is working:
>> > > >
>> > > > Master: each process is put into its own process
>> > > group
>> > > upon launch. When we issue a
>> > > > "kill", however, we only issue it to the individual
>> > > process (instead of the process group
>> > > > that is headed by that child process). This is
>> > > > probably a bug as I don't believe that is
>> > > > what we intended, but set that aside for now.
>> > > >
>> > > > 2.x: each process is put into its own process group
>> > > upon launch. When we issue a
>> > > > "kill", we issue it to the process group. Thus,
>> > > every
>> > > child proc of that child proc will
>> > > > receive it. IIRC, this was the intended behavior.
>> > > >
>> > > > It is rather trivial to make the change (it only
>> > > > involves 3 lines of code), but I'm not sure
>> > > > of what our intended behavior is supposed to be.
>> > > Once
>> > > we clarify that, it is also trivial
>> > > > to add another MCA param (you can never have too
>> > > many!)
>> > > to allow you to select the
>> > > > other behavior.
>> > > >
>> > > >
>> > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
>> > > sussman@
>> > > adina.com <http://adina.com/> > wrote:
>> > > >
>> > > > Hello Gilles,
>> > > >
>> > > > Thank you for your quick answer. I confirm that if
>> > > exec is used, both processes
>> > > > immediately
>> > > > abort.
>> > > >
>> > > > Now suppose that the line
>> > > >
>> > > > echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
>> > > >
>> > > > is added to the end of dum.sh.
>> > > >
>> > > > If Example 2 is run with Open MPI 1.4.3, the output
>> > > is
>> > > >
>> > > > After aborttest: OMPI_COMM_WORLD_RANK=0
>> > > >
>> > > > which shows that the shell script for the process
>> > > with
>> > > rank 0 continues after the
>> > > > abort,
>> > > > but that the shell script for the process with rank
>> > > 1
>> > > does not continue after the
>> > > > abort.
>> > > >
>> > > > If Example 2 is run with Open MPI 2.1.1, with exec
>> > > used to invoke
>> > > > aborttest02.exe, then
>> > > > there is no such output, which shows that both shell
>> > > scripts do not continue after
>> > > > the abort.
>> > > >
>> > > > I prefer the Open MPI 1.4.3 behavior because our
>> > > original application depends
>> > > > upon the
>> > > > Open MPI 1.4.3 behavior. (Our original application
>> > > will also work if both
>> > > > executables are
>> > > > aborted, and if both shell scripts continue after
>> > > the
>> > > abort.)
>> > > >
>> > > > It might be too much to expect, but is there a way
>> > > to
>> > > recover the Open MPI 1.4.3
>> > > > behavior
>> > > > using Open MPI 2.1.1?
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > Ted Sussman
>> > > >
>> > > >
>> > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
>> > > >
>> > > > Ted,
>> > > >
>> > > >
>> > > > fwiw, the 'master' branch has the behavior you
>> > > expect.
>> > > >
>> > > >
>> > > > meanwhile, you can simple edit your 'dum.sh' script
>> > > and replace
>> > > >
>> > > > /home/buildadina/src/aborttest02/aborttest02.exe
>> > > >
>> > > > with
>> > > >
>> > > > exec /home/buildadina/src/aborttest02/aborttest02.
>> > > exe
>> > > >
>> > > >
>> > > > Cheers,
>> > > >
>> > > >
>> > > > Gilles
>> > > >
>> > > >
>> > > > On 6/15/2017 3:01 AM, Ted Sussman wrote:
>> > > > Hello,
>> > > >
>> > > > My question concerns MPI_ABORT, indirect
>> > > execution
>> > > of
>> > > > executables by mpirun and Open
>> > > > MPI 2.1.1. When mpirun runs executables directly,
>> > > MPI
>> > > _ABORT
>> > > > works as expected, but
>> > > > when mpirun runs executables indirectly,
>> > > MPI_ABORT
>> > > does not
>> > > > work as expected.
>> > > >
>> > > > If Open MPI 1.4.3 is used instead of Open MPI
>> > > 2.1.1,
>> > > MPI_ABORT
>> > > > works as expected in all
>> > > > cases.
>> > > >
>> > > > The examples given below have been simplified as
>> > > far
>> > > as possible
>> > > > to show the issues.
>> > > >
>> > > > ---
>> > > >
>> > > > Example 1
>> > > >
>> > > > Consider an MPI job run in the following way:
>> > > >
>> > > > mpirun ... -app addmpw1
>> > > >
>> > > > where the appfile addmpw1 lists two executables:
>> > > >
>> > > > -n 1 -host gulftown ... aborttest02.exe
>> > > > -n 1 -host gulftown ... aborttest02.exe
>> > > >
>> > > > The two executables are executed on the local node
>> > > gulftown.
>> > > > aborttest02 calls MPI_ABORT
>> > > > for rank 0, then sleeps.
>> > > >
>> > > > The above MPI job runs as expected. Both
>> > > processes
>> > > immediately
>> > > > abort when rank 0 calls
>> > > > MPI_ABORT.
>> > > >
>> > > > ---
>> > > >
>> > > > Example 2
>> > > >
>> > > > Now change the above example as follows:
>> > > >
>> > > > mpirun ... -app addmpw2
>> > > >
>> > > > where the appfile addmpw2 lists shell scripts:
>> > > >
>> > > > -n 1 -host gulftown ... dum.sh
>> > > > -n 1 -host gulftown ... dum.sh
>> > > >
>> > > > dum.sh invokes aborttest02.exe. So aborttest02.exe
>> > > is
>> > > executed
>> > > > indirectly by mpirun.
>> > > >
>> > > > In this case, the MPI job only aborts process 0 when
>> > > rank 0 calls
>> > > > MPI_ABORT. Process 1
>> > > > continues to run. This behavior is unexpected.
>> > > >
>> > > > ----
>> > > >
>> > > > I have attached all files to this E-mail. Since
>> > > there
>> > > are absolute
>> > > > pathnames in the files, to
>> > > > reproduce my findings, you will need to update the
>> > > pathnames in the
>> > > > appfiles and shell
>> > > > scripts. To run example 1,
>> > > >
>> > > > sh run1.sh
>> > > >
>> > > > and to run example 2,
>> > > >
>> > > > sh run2.sh
>> > > >
>> > > > ---
>> > > >
>> > > > I have tested these examples with Open MPI 1.4.3
>> > > and
>> > > 2.
>> > > 0.3. In
>> > > > Open MPI 1.4.3, both
>> > > > examples work as expected. Open MPI 2.0.3 has
>> > > the
>> > > same behavior
>> > > > as Open MPI 2.1.1.
>> > > >
>> > > > ---
>> > > >
>> > > > I would prefer that Open MPI 2.1.1 aborts both
>> > > processes, even
>> > > > when the executables are
>> > > > invoked indirectly by mpirun. If there is an MCA
>> > > setting that is
>> > > > needed to make Open MPI
>> > > > 2.1.1 abort both processes, please let me know.
>> > > >
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > Theodore Sussman
>> > > >
>> > > >
>> > > > [Attachments: config.log.bz2, ompi_info.bz2, aborttest02.tgz]
>> > > >
>> > > >
>> > > >
>> > > --
>> > > Jeff Squyres
>> > > ***@cisco.com
>> > >
>> > > [Attachment: aborttest10.tgz]
>> > >
>> >
>> >
>> > [Attachment: aborttest11.tgz]
>> >
>>
>>
Ted Sussman
2017-06-27 20:00:15 UTC
Permalink
Hello Ralph,

Thanks for your quick reply and bug fix. I have obtained the update and tried it in my simple
example, and also in the original program from which the simple example was extracted.
The update works as expected :)

Sincerely,

Ted Sussman

On 27 Jun 2017 at 12:13, ***@open-mpi.org wrote:

>
> Oh my - I finally tracked it down. A simple one character error.
>
> Thanks for your patience. Fix is https://github.com/open-mpi/ompi/pull/3773 and will be ported to 2.x
> and 3.0
> Ralph
>
> On Jun 27, 2017, at 11:17 AM, ***@open-mpi.org wrote:
>
> Ideally, we should be delivering the signal to all procs in the process group of each dum.sh.
> Looking at the code in the head of the 2.x branch, that does indeed appear to be what we
> are doing, assuming that we found setpgid in your system:
>
> static int odls_default_kill_local(pid_t pid, int signum)
> {
>     pid_t pgrp;
>
> #if HAVE_SETPGID
>     pgrp = getpgid(pid);
>     if (-1 != pgrp) {
>         /* target the lead process of the process
>          * group so we ensure that the signal is
>          * seen by all members of that group. This
>          * ensures that the signal is seen by any
>          * child processes our child may have
>          * started
>          */
>         pid = pgrp;
>     }
> #endif
>     if (0 != kill(pid, signum)) {
>         if (ESRCH != errno) {
>             OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
>                                  "%s odls:default:SENT KILL %d TO PID %d GOT ERRNO %d",
>                                  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid, errno));
>             return errno;
>         }
>     }
>     OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
>                          "%s odls:default:SENT KILL %d TO PID %d SUCCESS",
>                          ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid));
>     return 0;
> }
>
> For some strange reason, it appears that you aren't seeing this? I'm building the branch now
> and will see if I can reproduce it.
>
> On Jun 27, 2017, at 10:58 AM, Ted Sussman <***@adina.com > wrote:
>
> Hello all,
>
> Thank you for your help and advice.  It has taken me several days to understand what
> you were trying to tell me.  I have now studied the problem in more detail, using a
> version of Open MPI 2.1.1 built with --enable-debug.
>
> -----
>
> Consider the following scenario in Open MPI 2.1.1:
>
> mpirun --> dum.sh --> aborttest.exe  (rank 0)
>        --> dum.sh --> aborttest.exe  (rank 1)
>
> aborttest.exe calls MPI_Bcast several times, then aborttest.exe rank 0 calls
> MPI_Abort.
>
> As far as I can figure out, this is what happens after aborttest.exe rank 0 calls
> MPI_Abort.
>
> 1) aborttest.exe for rank 0 exits.  aborttest.exe for rank 1 is polling (waiting for
> message from MPI_Bcast).
>
> 2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both
> dum.sh processes.
>
> 3) Both dum.sh processes are killed.
>
> 4) aborttest.exe for rank 1 continues to poll. mpirun never exits.
>
> ----
>
> Now suppose that dum.sh traps SIGCONT, and that the trap handler in dum.sh sends
> signal SIGINT to $PPID.  This is what seems to happen after aborttest.exe rank 0 calls
> MPI_Abort:
>
> 1) aborttest.exe for rank 0 exits. aborttest.exe for rank 1 is polling (waiting for
> message from MPI_Bcast).
>
> 2) mpirun  (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL to both
> dum.sh processes.
>
> 3) dum.sh for rank 0 catches SIGCONT and sends SIGINT to its parent.  dum.sh for
> rank 1 appears to be killed (I don't understand this, why doesn't dum.sh for rank 1 also
> catch SIGCONT?)
>
> 4) mpirun catches the SIGINT and kills aborttest.exe for rank 1, then mpirun exits.
>
> So adding the trap handler to dum.sh solves my problem.
>
> Is this the preferred solution to my problem?  Or is there a more elegant solution?
>
> Sincerely,
>
> Ted Sussman
>
>
>
>
>
>
>
>
> On 19 Jun 2017 at 11:19, ***@open-mpi.org wrote:
>
> >
> >
> >
> >     On Jun 19, 2017, at 10:53 AM, Ted Sussman <***@adina.com > wrote:
> >
> >     For what it's worth, the problem might be related to the following:
> >
> >     mpirun: -np 2 ... dum.sh
> >     dum.sh: Invoke aborttest11.exe
> >     aborttest11.exe: Call  MPI_Init, go into an infinite loop.
> >
> >     Now when mpirun is running, send signals at the processes, as follows:
> >
> >     1) kill -9 (pid for one of the aborttest11.exe processes)
> >
> >     The shell for this aborttest11.exe continues. Once this shell exits, then Open MPI
> sends
> >     signals to both shells, killing the other shell, but the remaining aborttest11.exe
> survives.  The
> >     PPID for the remaining aborttest11.exe becomes 1.
> >
> > We have no visibility into your aborttest processes since we didn't launch them. So
> killing one of
> > them is invisible to us. We can only see the shell scripts.
> >
> >
> >     2) kill -9 (pid for one of the dum.sh processes).
> >
> >     Open MPI sends signals to both of the shells. Both shells are killed off, but both
> >     aborttest11.exe processes survive, with PPID set to 1.
> >
> > This again is a question of how you handle things in your program. The _only_
> process we can
> > see is your script. If you kill a script that started a process, then your process is
> going to have to
> > know how to detect the script has died and "suicide" - there is nothing we can do to
> help.
> >
> > Honestly, it sounds to me like the real problem here is that your .exe program isn't
> monitoring the
> > shell above it to know when to "suicide". I don't see how we can help you there.
> >
> >
> >
> >     On 19 Jun 2017 at 10:10, ***@open-mpi.org wrote:
> >
> >     >
> >     > That is typical behavior when you throw something into "sleep" - not much we can
> do
> >     about it, I
> >     > think.
> >     >
> >     >     On Jun 19, 2017, at 9:58 AM, Ted Sussman <***@adina.com > wrote:
> >     >
> >     >     Hello,
> >     >    
> >     >     I have rebuilt Open MPI 2.1.1 on the same computer, including --enable-debug.
> >     >    
> >     >     I have attached the abort test program aborttest10.tgz.  This version sleeps for 5 sec
> before
> >     >     calling MPI_ABORT, so that I can check the pids using ps.
> >     >    
> >     >     This is what happens (see run2.sh.out).
> >     >    
> >     >     Open MPI invokes two instances of dum.sh.  Each instance of dum.sh invokes
> aborttest.exe.
> >     >    
> >     >     Pid    Process
> >     >     -------------------
> >     >     19565  dum.sh
> >     >     19566  dum.sh
> >     >     19567 aborttest10.exe
> >     >     19568 aborttest10.exe
> >     >    
> >     >     When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and SIGKILL to
> both
> >     >     instances of dum.sh (pids 19565 and 19566).
> >     >    
> >     >     ps shows that both the shell processes vanish, and that one of the aborttest10.exe
> >     processes
> >     >     vanishes.  But the other aborttest10.exe remains and continues until it is finished
> sleeping.
> >     >    
> >     >     Hope that this information is useful.
> >     >    
> >     >     Sincerely,
> >     >    
> >     >     Ted Sussman
> >     >    
> >     >    
> >     >    
> >     >     On 19 Jun 2017 at 23:06,  ***@rist.or.jp   wrote:
> >     >
> >     >    
> >     >      Ted,
> >     >      
> >     >     some traces are missing  because you did not configure with --enable-debug
> >     >     i am afraid you have to do it (and you probably want to install that debug version in an
> >     >     other
> >     >     location since its performances are not good for production) in order to get all the
> logs.
> >     >      
> >     >     Cheers,
> >     >      
> >     >     Gilles
> >     >      
> >     >     ----- Original Message -----
> >     >        Hello Gilles,
> >     >    
> >     >        I retried my example, with the same results as I observed before.  The process with
> rank
> >     >     1
> >     >        does not get killed by MPI_ABORT.
> >     >    
> >     >        I have attached to this E-mail:
> >     >    
> >     >          config.log.bz2
> >     >          ompi_info.bz2  (uses ompi_info -a)
> >     >          aborttest09.tgz
> >     >    
> >     >        This testing is done on a computer running Linux 3.10.0.  This is a different computer
> >     >     than
> >     >        the computer that I previously used for testing.  You can confirm that I am using Open
> >     >     MPI
> >     >        2.1.1.
> >     >    
> >     >        tar xvzf aborttest09.tgz
> >     >        cd aborttest09
> >     >        ./sh run2.sh
> >     >    
> >     >        run2.sh contains the command
> >     >    
> >     >        /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose
> >     >     10
> >     >        ./dum.sh
> >     >    
> >     >        The output from this run is in aborttest09/run2.sh.out.
> >     >    
> >     >        The output shows that the the "default" component is selected by odls.
> >     >    
> >     >        The only messages from odls are: odls: launch spawning child ...  (two messages).
> >     >     There
> >     >        are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL
> >     >        messages.
> >     >    
> >     >        I am not running from within any batch manager.
> >     >    
> >     >        Sincerely,
> >     >    
> >     >        Ted Sussman
> >     >    
> >     >        On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
> >     >
> >     >     Ted,
> >     >    
> >     >     i do not observe the same behavior you describe with Open MPI 2.1.1
> >     >    
> >     >     # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >     >    
> >     >     abort.sh 31361 launching abort
> >     >     abort.sh 31362 launching abort
> >     >     I am rank 0 with pid 31363
> >     >     I am rank 1 with pid 31364
> >     >     ------------------------------------------------------------------------
> >     >     --
> >     >     MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> >     >     with errorcode 1.
> >     >    
> >     >     NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> >     >     You may or may not see output from other processes, depending on
> >     >     exactly when Open MPI kills them.
> >     >     ------------------------------------------------------------------------
> >     >     --
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     >     [[18199,1],0]
> >     >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> >     >     SUCCESS
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     >     [[18199,1],1]
> >     >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> >     >     SUCCESS
> >     >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> >     >     SUCCESS
> >     >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> >     >     SUCCESS
> >     >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> >     >     SUCCESS
> >     >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> >     >     SUCCESS
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     >     [[18199,1],0]
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> >     >     not alive
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> >     >     [[18199,1],1]
> >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> >     >     not alive
> >     >    
> >     >    
> >     >     Open MPI did kill both shells, and they were indeed killed as evidenced
> >     >     by ps
> >     >    
> >     >     #ps -fu gilles --forest
> >     >     UID        PID  PPID  C STIME TTY          TIME CMD
> >     >     gilles    1564  1561  0 15:39 ?        00:00:01 sshd: ***@pts/1
> >     >     gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
> >     >     gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/
> >     >     local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> >     >     gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
> >     >    
> >     >    
> >     >     so trapping SIGTERM in your shell and manually killing the MPI task
> >     >     should work
> >     >     (as Jeff explained, as long as the shell script is fast enough to do
> >     >     that between SIGTERM and SIGKILL)
> >     >    
> >     >    
> >     >     if you observe a different behavior, please double check your Open MPI
> >     >     version and post the outputs of the same commands.
> >     >    
> >     >     btw, are you running from a batch manager ? if yes, which one ?
> >     >    
> >     >     Cheers,
> >     >    
> >     >     Gilles
> >     >    
> >     >     ----- Original Message -----
> >     >     Ted,
> >     >    
> >     >     if you
> >     >    
> >     >     mpirun --mca odls_base_verbose 10 ...
> >     >    
> >     >     you will see which processes get killed and how
> >     >    
> >     >     Best regards,
> >     >    
> >     >    
> >     >     Gilles
> >     >    
> >     >     ----- Original Message -----
> >     >     Hello Jeff,
> >     >    
> >     >     Thanks for your comments.
> >     >    
> >     >     I am not seeing behavior #4, on the two computers that I have
> >     >     tested
> >     >     on, using Open MPI
> >     >     2.1.1.
> >     >    
> >     >     I wonder if you can duplicate my results with the files that I have
> >     >     uploaded.
> >     >    
> >     >     Regarding what is the "correct" behavior, I am willing to modify my
> >     >     application to correspond
> >     >     to Open MPI's behavior (whatever behavior the Open MPI
> >     >     developers
> >     >     decide is best) --
> >     >     provided that Open MPI does in fact kill off both shells.
> >     >    
> >     >     So my highest priority now is to find out why Open MPI 2.1.1 does
> >     >     not
> >     >     kill off both shells on
> >     >     my computer.
> >     >    
> >     >     Sincerely,
> >     >    
> >     >     Ted Sussman
> >     >    
> >     >       On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> >     >
> >     >     Ted --
> >     >    
> >     >     Sorry for jumping in late.  Here's my $0.02...
> >     >    
> >     >     In the runtime, we can do 4 things:
> >     >    
> >     >     1. Kill just the process that we forked.
> >     >     2. Kill just the process(es) that call back and identify
> >     >     themselves
> >     >     as MPI processes (we don't track this right now, but we could add that
> >     >     functionality).
> >     >     3. Union of #1 and #2.
> >     >     4. Kill all processes (to include any intermediate processes
> >     >     that
> >     >     are not included in #1 and #2).
> >     >    
> >     >     In Open MPI 2.x, #4 is the intended behavior.  There may be a
> >     >     bug
> >     >     or
> >     >     two that needs to get fixed (e.g., in your last mail, I don't see
> >     >     offhand why it waits until the MPI process finishes sleeping), but we
> >     >     should be killing the process group, which -- unless any of the
> >     >     descendant processes have explicitly left the process group -- should
> >     >     hit the entire process tree. 
> >     >    
> >     >     Sidenote: there's actually a way to be a bit more aggressive
> >     >     and
> >     >     do
> >     >     a better job of ensuring that we kill *all* processes (via creative
> >     >     use
> >     >     of PR_SET_CHILD_SUBREAPER), but that's basically a future
> >     >     enhancement
> >     >     /
> >     >     optimization.
> >     >    
> >     >     I think Gilles and Ralph proposed a good point to you: if you
> >     >     want
> >     >     to be sure to be able to do cleanup after an MPI process terminates (
> >     >     normally or abnormally), you should trap signals in your intermediate
> >     >     processes to catch what Open MPI's runtime throws and therefore know
> >     >     that it is time to cleanup. 
> >     >    
> >     >     Hypothetically, this should work in all versions of Open MPI...?
> >     >    
> >     >     I think Ralph made a pull request that adds an MCA param to
> >     >     change
> >     >     the default behavior from #4 to #1.
> >     >    
> >     >     Note, however, that there's a little time between when Open
> >     >     MPI
> >     >     sends the SIGTERM and the SIGKILL, so this solution could be racy.  If
> >     >     you find that you're running out of time to cleanup, we might be able
> >     >     to
> >     >     make the delay between the SIGTERM and SIGKILL be configurable
> >     >     (e.g.,
> >     >     via MCA param).
> >     >    
> >     >    
> >     >    
> >     >
> >     >     On Jun 16, 2017, at 10:08 AM, Ted Sussman
> >     >     <***@adina.com
> >     >    
> >     >     wrote:
> >     >    
> >     >     Hello Gilles and Ralph,
> >     >    
> >     >     Thank you for your advice so far.  I appreciate the time
> >     >     that
> >     >     you
> >     >     have spent to educate me about the details of Open MPI.
> >     >    
> >     >     But I think that there is something fundamental that I
> >     >     don't
> >     >     understand.  Consider Example 2 run with Open MPI 2.1.1.
> >     >    
> >     >     mpirun --> shell for process 0 -->  executable for process
> >     >     0 -->
> >     >     MPI calls, MPI_Abort
> >     >             --> shell for process 1 -->  executable for process 1 -->
> >     >     MPI calls
> >     >    
> >     >     After the MPI_Abort is called, ps shows that both shells
> >     >     are
> >     >     running, and that the executable for process 1 is running (in this
> >     >     case,
> >     >     process 1 is sleeping).  And mpirun does not exit until process 1 is
> >     >     finished sleeping.
> >     >    
> >     >     I cannot reconcile this observed behavior with the
> >     >     statement
> >     >
> >     >           >     2.x: each process is put into its own process group
> >     >     upon launch. When we issue a
> >     >          >     "kill", we issue it to the process group. Thus,
> >     >     every
> >     >     child proc of that child proc will
> >     >          >     receive it. IIRC, this was the intended behavior.
> >     >    
> >     >     I assume that, for my example, there are two process
> >     >     groups. 
> >     >     The
> >     >     process group for process 0 contains the shell for process 0 and the
> >     >     executable for process 0; and the process group for process 1 contains
> >     >     the shell for process 1 and the executable for process 1.  So what
> >     >     does
> >     >     MPI_ABORT do?  MPI_ABORT does not kill the process group for process
> >     >     0,
> >     >      
> >     >     since the shell for process 0 continues.  And MPI_ABORT does not kill
> >     >     the process group for process 1, since both the shell and executable
> >     >     for
> >     >     process 1 continue.
> >     >    
> >     >     If I hit Ctrl-C after MPI_Abort is called, I get the message
> >     >    
> >     >     mpirun: abort is already in progress.. hit ctrl-c again to
> >     >     forcibly terminate
> >     >    
> >     >     but I don't need to hit Ctrl-C again because mpirun
> >     >     immediately
> >     >     exits.
> >     >    
> >     >     Can you shed some light on all of this?
> >     >    
> >     >     Sincerely,
> >     >    
> >     >     Ted Sussman
> >     >    
> >     >    
> >     >     On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> >     >
> >     >    
> >     >     You have to understand that we have no way of
> >     >     knowing who is
> >     >     making MPI calls - all we see is
> >     >     the proc that we started, and we know someone of
> >     >     that rank is
> >     >     running (but we have no way of
> >     >     knowing which of the procs you sub-spawned it is).
> >     >    
> >     >     So the behavior you are seeking only occurred in
> >     >     some earlier
> >     >     release by sheer accident. Nor will
> >     >     you find it portable as there is no specification
> >     >     directing
> >     >     that
> >     >     behavior.
> >     >    
>     >     The behavior I've provided is to either deliver the signal to
>     >     _all_ child processes (including grandchildren etc.), or _only_ the
>     >     immediate child of the daemon. It won't do what you describe -
>     >     kill the MPI proc underneath the shell, but not the shell itself.
> >     >    
> >     >     What you can eventually do is use PMIx to ask the
> >     >     runtime to
> >     >     selectively deliver signals to
>     >     We don't have that capability implemented just yet, I'm afraid.
> >     >    
> >     >     Meantime, when I get a chance, I can code an
> >     >     option that will
> >     >     record the pid of the subproc that
>     >     calls MPI_Init, and then lets you deliver signals to just that
>     >     proc. No promises as to when that will
> >     >     be done.
> >     >    
> >     >    
> >     >           On Jun 15, 2017, at 1:37 PM, Ted Sussman
> >     >     <ted.sussman@
> >     >     adina.
> >     >     com> wrote:
> >     >    
> >     >          Hello Ralph,
> >     >    
> >     >           I am just an Open MPI end user, so I will need to
> >     >     wait for
> >     >     the next official release.
> >     >    
> >     >          mpirun --> shell for process 0 -->  executable for
> >     >     process
> >     >     0
> >     >     --> MPI calls
> >     >                  --> shell for process 1 -->  executable for process
> >     >     1
> >     >     --> MPI calls
> >     >                                           ...
> >     >    
> >     >          I guess the question is, should MPI_ABORT kill the
> >     >     executables or the shells?  I naively
> >     >          thought, that, since it is the executables that make
> >     >     the
> >     >     MPI
> >     >     calls, it is the executables that
> >     >          should be aborted by the call to MPI_ABORT.  Since
> >     >     the
> >     >     shells don't make MPI calls, the
> >     >           shells should not be aborted.
> >     >    
> >     >          And users might have several layers of shells in
> >     >     between
> >     >     mpirun and the executable.
> >     >    
> >     >          So now I will look for the latest version of Open MPI
> >     >     that
> >     >     has the 1.4.3 behavior.
> >     >    
> >     >          Sincerely,
> >     >    
> >     >          Ted Sussman
> >     >    
> >     >           On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> >     >    
> >     >          >
> >     >           > Yeah, things jittered a little there as we debated
> >     >     the "
> >     >     right" behavior. Generally, when we
> >     >          see that
> >     >          > happening it means that a param is required, but
> >     >     somehow
> >     >     we never reached that point.
> >     >          >
> >     >          > See if https://github.com/open-mpi/ompi/pull/3704  
> >     >     helps
> >     >     -
> >     >     if so, I can schedule it for the next
> >     >          2.x
> >     >           > release if the RMs agree to take it
> >     >          >
> >     >          > Ralph
> >     >           >
> >     >          >     On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.
> >     >     sussman
> >     >     @adina.com > wrote:
> >     >           >
> >     >          >     Thank you for your comments.
> >     >           >   
> >     >          >     Our application relies upon "dum.sh" to clean up
> >     >     after
> >     >     the process exits, either if the
> >     >           process
> >     >          >     exits normally, or if the process exits abnormally
> >     >     because of MPI_ABORT.  If the process
> >     >           >     group is killed by MPI_ABORT, this clean up will not
> >     >     be performed.  If exec is used to launch
> >     >          >     the executable from dum.sh, then dum.sh is
> >     >     terminated
> >     >     by the exec, so dum.sh cannot
> >     >          >     perform any clean up.
> >     >          >   
> >     >           >     I suppose that other user applications might work
> >     >     similarly, so it would be good to have an
> >     >          >     MCA parameter to control the behavior of
> >     >     MPI_ABORT.
> >     >          >   
> >     >          >     We could rewrite our shell script that invokes
> >     >     mpirun,
> >     >     so that the cleanup that is now done
> >     >          >     by
> >     >           >     dum.sh is done by the invoking shell script after
> >     >     mpirun exits.  Perhaps this technique is the
> >     >          >     preferred way to clean up after mpirun is invoked.
> >     >           >   
> >     >          >     By the way, I have also tested with Open MPI
> >     >     1.10.7,
> >     >     and Open MPI 1.10.7 has different
> >     >           >     behavior than either Open MPI 1.4.3 or Open MPI
> >     >     2.1.
> >     >     1.
> >     >        In this explanation, it is important to
> >     >           >     know that the aborttest executable sleeps for 20
> >     >     sec.
> >     >          >   
> >     >           >     When running example 2:
> >     >          >   
> >     >          >     1.4.3: process 1 immediately aborts
> >     >          >     1.10.7: process 1 doesn't abort and never stops.
> >     >           >     2.1.1 process 1 doesn't abort, but stops after it is
> >     >     finished sleeping
> >     >          >   
> >     >          >     Sincerely,
> >     >          >   
> >     >          >     Ted Sussman
> >     >           >   
> >     >          >     On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> >     >          >
> >     >          >     Here is how the system is working:
> >     >           >   
> >     >          >     Master: each process is put into its own process
> >     >     group
> >     >     upon launch. When we issue a
> >     >          >     "kill", however, we only issue it to the individual
> >     >     process (instead of the process group
> >     >          >     that is headed by that child process). This is
> >     >     probably a bug as I don´t believe that is
> >     >          >     what we intended, but set that aside for now.
> >     >           >   
> >     >          >     2.x: each process is put into its own process group
> >     >     upon launch. When we issue a
> >     >          >     "kill", we issue it to the process group. Thus,
> >     >     every
> >     >     child proc of that child proc will
> >     >          >     receive it. IIRC, this was the intended behavior.
> >     >           >   
> >     >          >     It is rather trivial to make the change (it only
> >     >     involves 3 lines of code), but I´m not sure
> >     >          >     of what our intended behavior is supposed to be.
> >     >     Once
> >     >     we clarify that, it is also trivial
> >     >          >     to add another MCA param (you can never have too
> >     >     many!)
> >     >       to allow you to select the
> >     >          >     other behavior.
> >     >          >   
> >     >          >
> >     >           >     On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.
> >     >     sussman@
> >     >     adina.com > wrote:
> >     >          >   
> >     >          >     Hello Gilles,
> >     >          >   
> >     >           >     Thank you for your quick answer.  I confirm that if
> >     >     exec is used, both processes
> >     >          >     immediately
> >     >           >     abort.
> >     >          >   
> >     >           >     Now suppose that the line
> >     >          >   
> >     >          >     echo "After aborttest:
> >     >          >    
> >     >     OMPI_COMM_WORLD_RANK="$OMPI_COMM_
> >     >     WORLD_RANK
> >     >           >   
> >     >          >     is added to the end of dum.sh.
> >     >          >   
> >     >          >     If Example 2 is run with Open MPI 1.4.3, the output
> >     >     is
> >     >          >   
> >     >          >     After aborttest: OMPI_COMM_WORLD_RANK=0
> >     >          >   
> >     >          >     which shows that the shell script for the process
> >     >     with
> >     >     rank 0 continues after the
> >     >           >     abort,
> >     >          >     but that the shell script for the process with rank
> >     >     1
> >     >     does not continue after the
> >     >           >     abort.
> >     >          >   
> >     >           >     If Example 2 is run with Open MPI 2.1.1, with exec
> >     >     used to invoke
> >     >          >     aborttest02.exe, then
> >     >          >     there is no such output, which shows that both shell
> >     >     scripts do not continue after
> >     >          >     the abort.
> >     >          >   
> >     >           >     I prefer the Open MPI 1.4.3 behavior because our
> >     >     original application depends
> >     >          >     upon the
> >     >           >     Open MPI 1.4.3 behavior.  (Our original application
> >     >     will also work if both
> >     >          >     executables are
> >     >           >     aborted, and if both shell scripts continue after
> >     >     the
> >     >     abort.)
> >     >          >   
> >     >           >     It might be too much to expect, but is there a way
> >     >     to
> >     >     recover the Open MPI 1.4.3
> >     >          >     behavior
> >     >           >     using Open MPI 2.1.1? 
> >     >          >   
> >     >           >     Sincerely,
> >     >          >   
> >     >          >     Ted Sussman
> >     >          >   
> >     >          >   
> >     >           >     On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> >     >          >
> >     >          >     Ted,
> >     >          >   
> >     >           >   
> >     >          >     fwiw, the 'master' branch has the behavior you
> >     >     expect.
> >     >          >   
> >     >          >   
> >     >          >     meanwhile, you can simple edit your 'dum.sh' script
> >     >     and replace
> >     >           >   
> >     >          >     /home/buildadina/src/aborttest02/aborttest02.exe
> >     >           >   
> >     >          >     with
> >     >           >   
> >     >          >     exec /home/buildadina/src/aborttest02/aborttest02.
> >     >     exe
> >     >           >   
> >     >          >   
> >     >          >     Cheers,
> >     >          >   
> >     >          >   
> >     >          >     Gilles
> >     >          >   
> >     >           >   
> >     >          >     On 6/15/2017 3:01 AM, Ted Sussman wrote:
> >     >           >     Hello,
> >     >          >   
> >     >          >     My question concerns MPI_ABORT, indirect
> >     >     execution
> >     >     of
> >     >          >     executables by mpirun and Open
> >     >          >     MPI 2.1.1.  When mpirun runs executables directly,
> >     >     MPI
> >     >     _ABORT
> >     >          >     works as expected, but
> >     >           >     when mpirun runs executables indirectly,
> >     >     MPI_ABORT
> >     >     does not
> >     >          >     work as expected.
> >     >          >   
> >     >          >     If Open MPI 1.4.3 is used instead of Open MPI
> >     >     2.1.1,
> >     >     MPI_ABORT
> >     >          >     works as expected in all
> >     >           >     cases.
> >     >          >   
> >     >           >     The examples given below have been simplified as
> >     >     far
> >     >     as possible
> >     >          >     to show the issues.
> >     >          >   
> >     >          >     ---
> >     >          >   
> >     >           >     Example 1
> >     >          >   
> >     >           >     Consider an MPI job run in the following way:
> >     >          >   
> >     >           >     mpirun ... -app addmpw1
> >     >          >   
> >     >          >     where the appfile addmpw1 lists two executables:
> >     >          >   
> >     >          >     -n 1 -host gulftown ... aborttest02.exe
> >     >          >     -n 1 -host gulftown ... aborttest02.exe
> >     >           >   
> >     >          >     The two executables are executed on the local node
> >     >     gulftown.
> >     >          >      aborttest02 calls MPI_ABORT
> >     >          >     for rank 0, then sleeps.
> >     >          >   
> >     >          >     The above MPI job runs as expected.  Both
> >     >     processes
> >     >     immediately
> >     >          >     abort when rank 0 calls
> >     >          >     MPI_ABORT.
> >     >          >   
> >     >           >     ---
> >     >          >   
> >     >           >     Example 2
> >     >          >   
> >     >          >     Now change the above example as follows:
> >     >          >   
> >     >          >     mpirun ... -app addmpw2
> >     >          >   
> >     >          >     where the appfile addmpw2 lists shell scripts:
> >     >          >   
> >     >          >     -n 1 -host gulftown ... dum.sh
> >     >          >     -n 1 -host gulftown ... dum.sh
> >     >          >   
> >     >          >     dum.sh invokes aborttest02.exe.  So aborttest02.exe
> >     >     is
> >     >     executed
> >     >          >     indirectly by mpirun.
> >     >          >   
> >     >          >     In this case, the MPI job only aborts process 0 when
> >     >     rank 0 calls
> >     >           >     MPI_ABORT.  Process 1
> >     >          >     continues to run.  This behavior is unexpected.
> >     >          >   
> >     >          >     ----
> >     >           >   
> >     >          >     I have attached all files to this E-mail.  Since
> >     >     there
> >     >     are absolute
> >     >           >     pathnames in the files, to
> >     >          >     reproduce my findings, you will need to update the
> >     >     pathnames in the
> >     >           >     appfiles and shell
> >     >          >     scripts.  To run example 1,
> >     >           >   
> >     >          >     sh run1.sh
> >     >           >   
> >     >          >     and to run example 2,
> >     >          >   
> >     >          >     sh run2.sh
> >     >          >   
> >     >           >     ---
> >     >          >   
> >     >           >     I have tested these examples with Open MPI 1.4.3
> >     >     and
> >     >     2.
> >     >     0.3.  In
> >     >          >     Open MPI 1.4.3, both
> >     >           >     examples work as expected.  Open MPI 2.0.3 has
> >     >     the
> >     >     same behavior
> >     >          >     as Open MPI 2.1.1.
> >     >          >   
> >     >          >     ---
> >     >           >   
> >     >          >     I would prefer that Open MPI 2.1.1 aborts both
> >     >     processes, even
> >     >          >     when the executables are
> >     >          >     invoked indirectly by mpirun.  If there is an MCA
> >     >     setting that is
> >     >          >     needed to make Open MPI
> >     >          >     2.1.1 abort both processes, please let me know.
> >     >           >   
> >     >          >   
> >     >          >     Sincerely,
> >     >          >   
> >     >          >     Theodore Sussman
> >     >           >   
> >     >          >   
> >     >           >     The following section of this message contains a
> >     >     file
> >     >     attachment
> >     >          >     prepared for transmission using the Internet MIME
> >     >     message format.
> >     >           >     If you are using Pegasus Mail, or any other MIME-
> >     >     compliant system,
> >     >          >     you should be able to save it or view it from within
> >     >     your mailer.
> >     >          >     If you cannot, please ask your system administrator
> >     >     for assistance.
> >     >          >   
> >     >          >       ---- File information -----------
> >     >          >         File:  config.log.bz2
> >     >          >         Date:  14 Jun 2017, 13:35
> >     >          >         Size:  146548 bytes.
> >     >           >         Type:  Binary
> >     >          >   
> >     >           >   
> >     >          >     The following section of this message contains a
> >     >     file
> >     >     attachment
> >     >           >     prepared for transmission using the Internet MIME
> >     >     message format.
> >     >          >     If you are using Pegasus Mail, or any other MIME-
> >     >     compliant system,
> >     >          >     you should be able to save it or view it from within
> >     >     your mailer.
> >     >          >     If you cannot, please ask your system administrator
> >     >     for assistance.
> >     >          >   
> >     >          >       ---- File information -----------
> >     >          >         File:  ompi_info.bz2
> >     >          >         Date:  14 Jun 2017, 13:35
> >     >           >         Size:  24088 bytes.
> >     >          >         Type:  Binary
> >     >           >   
> >     >          >   
> >     >           >     The following section of this message contains a
> >     >     file
> >     >     attachment
> >     >          >     prepared for transmission using the Internet MIME
> >     >     message format.
> >     >           >     If you are using Pegasus Mail, or any other MIME-
> >     >     compliant system,
> >     >          >     you should be able to save it or view it from within
> >     >     your mailer.
> >     >          >     If you cannot, please ask your system administrator
> >     >     for assistance.
> >     >          >   
> >     >          >       ---- File information -----------
> >     >          >         File:  aborttest02.tgz
> >     >          >         Date:  14 Jun 2017, 13:52
> >     >          >         Size:  4285 bytes.
> >     >           >         Type:  Binary
> >     >          >   
> >     >           >   
> >     >          >    
> >     >     ________________________________________
> >     >     _______
> >     >           >     users mailing list
> >     >          >     ***@lists.open-mpi.org
> >     >           >    
> >     >     https://rfd.newmexicoconsortium.org/mailman/listin
> >     >     fo/users
> >     >
> >     >
> >     >          >   
> >     >          >    
> >     >     ________________________________________
> >     >     _______
> >     >           >     users mailing list
> >     >          >     ***@lists.open-mpi.org
> >     >          >    
> >     >     https://rfd.newmexicoconsortium.org/mailman/listin
> >     >     fo/users
> >     >
> >     >
> >     >          >   
> >     >          >   
> >     >           >   
> >     >          >    
> >     >     ________________________________________
> >     >     _______
> >     >           >     users mailing list
> >     >          >     ***@lists.open-mpi.org
> >     >           >    
> >     >     https://rfd.newmexicoconsortium.org/mailman/listin
> >     >     fo/users
> >     >
> >     >
> >     >          >   
> >     >          >    
> >     >     ________________________________________
> >     >     _______
> >     >           >     users mailing list
> >     >          >     ***@lists.open-mpi.org
> >     >          >    
> >     >     https://rfd.newmexicoconsortium.org/mailman/listin
> >     >     fo/users
> >     >
> >     >
> >     >          >   
> >     >          >   
> >     >           >   
> >     >          >    
> >     >     ________________________________________
> >     >     _______
> >     >           >     users mailing list
> >     >          >     ***@lists.open-mpi.org
> >     >           >    
> >     >     https://rfd.newmexicoconsortium.org/mailman/listin
> >     >     fo/users
> >     >
> >     >
> >     >          >
> >     >    
> >     >           
> >     >          __________________________________________
> >     >     _____
> >     >           users mailing list
> >     >          ***@lists.open-mpi.org
> >     >         
> >     >      https://rfd.newmexicoconsortium.org/mailman/listin
> >     >     fo/users
> >     >
> >     >    
> >     >       
> >     >     _____________________________________________
> >     >     __
> >     >     users mailing list
> >     >     ***@lists.open-mpi.org
> >     >     https://rfd.newmexicoconsortium.org/mailman/listinfo/us
> >     >     ers
> >     >    
> >     >    
> >     >     --
> >     >     Jeff Squyres
> >     >     ***@cisco.com
> >     >    
> >     >     _______________________________________________
> >     >     users mailing list
> >     >     ***@lists.open-mpi.org
> >     >     https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >     >    
> >     >    
> >     >    
> >     >     _______________________________________________
> >     >     users mailing list
> >     >     ***@lists.open-mpi.org
> >     >     https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >     >
> >     >     _______________________________________________
> >     >     users mailing list
> >     >     ***@lists.open-mpi.org
> >     >     https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >     >
> >     >     _______________________________________________
> >     >     users mailing list
> >     >     ***@lists.open-mpi.org
> >     >     https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >     >    
> >     >          
> >     >    
> >     >    
> >     >     The following section of this message contains a file attachment
> >     >     prepared for transmission using the Internet MIME message format.
> >     >     If you are using Pegasus Mail, or any other MIME-compliant system,
> >     >     you should be able to save it or view it from within your mailer.
> >     >     If you cannot, please ask your system administrator for assistance.
> >     >    
> >     >       ---- File information -----------
> >     >         File:  aborttest10.tgz
> >     >         Date:  19 Jun 2017, 12:42
> >     >         Size:  4740 bytes.
> >     >         Type:  Binary
> >     >     <aborttest10.tgz>_______________________________________________
> >     >     users mailing list
> >     >     ***@lists.open-mpi.org
> >     >     https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >     >
> >
> >       
> >     The following section of this message contains a file attachment
> >     prepared for transmission using the Internet MIME message format.
> >     If you are using Pegasus Mail, or any other MIME-compliant system,
> >     you should be able to save it or view it from within your mailer.
> >     If you cannot, please ask your system administrator for assistance.
> >    
> >       ---- File information -----------
> >         File:  aborttest11.tgz
> >         Date:  19 Jun 2017, 13:48
> >         Size:  3800 bytes.
> >         Type:  Unknown
> >     <aborttest11.tgz> _______________________________________________
> >     users mailing list
> >     ***@lists.open-mpi.org
> >     https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> >
>
>   
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
> _______________________________________________
> users mailing list
> ***@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
r***@open-mpi.org
2017-06-19 14:05:37 UTC
Permalink
Did you configure OMPI with --enable-debug? Many of the messages are turned “off” in non-debug builds
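Ralph's point is that much of this verbosity only shows up in a debug build; a typical rebuild, reusing the install prefix mentioned later in the thread, would look roughly like:

    ./configure --prefix=/opt/openmpi-2.1.1-GNU --enable-debug
    make all install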

> On Jun 19, 2017, at 6:33 AM, Ted Sussman <***@adina.com> wrote:
>
> Hello Gilles,
>
> I retried my example, with the same results as I observed before. The process with rank 1 does not get killed by MPI_ABORT.
>
> I have attached to this E-mail:
>
> config.log.bz2
> ompi_info.bz2 (uses ompi_info -a)
> aborttest09.tgz
>
> This testing is done on a computer running Linux 3.10.0. This is a different computer than the computer that I previously used for testing. You can confirm that I am using Open MPI 2.1.1.
>
> tar xvzf aborttest09.tgz
> cd aborttest09
> sh run2.sh
>
> run2.sh contains the command
>
> /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh
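dum.sh itself is not reproduced in the thread; a minimal sketch of such a wrapper, assuming it simply runs the test executable and then reports its rank (the echo line Ted mentions adding later in the thread), would be:

    #!/bin/sh
    # hypothetical dum.sh: run the MPI executable, then report the rank
    /home/buildadina/src/aborttest02/aborttest02.exe
    echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK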
>
> The output from this run is in aborttest09/run2.sh.out.
>
> The output shows that the "default" component is selected by odls.
>
> The only messages from odls are: odls: launch spawning child ... (two messages). There are no messages from odls with "kill" and I see no SENDING SIGCONT / SIGKILL messages.
>
> I am not running from within any batch manager.
>
> Sincerely,
>
> Ted Sussman
>
> On 17 Jun 2017 at 16:02, ***@rist.or.jp wrote:
>
> > Ted,
> >
> > I do not observe the same behavior you describe with Open MPI 2.1.1
> >
> > # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
> >
> > abort.sh 31361 launching abort
> > abort.sh 31362 launching abort
> > I am rank 0 with pid 31363
> > I am rank 1 with pid 31364
> > --------------------------------------------------------------------------
> > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> > with errorcode 1.
> >
> > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> > You may or may not see output from other processes, depending on
> > exactly when Open MPI kills them.
> > --------------------------------------------------------------------------
> > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],0]
> > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],1]
> > [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361
> > SUCCESS
> > [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362
> > SUCCESS
> > [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],0]
> > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is
> > not alive
> > [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process
> > [[18199,1],1]
> > [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is
> > not alive
> >
> >
> > Open MPI did kill both shells, and they were indeed killed as evidenced
> > by ps
> >
> > #ps -fu gilles --forest
> > UID PID PPID C STIME TTY TIME CMD
> > gilles 1564 1561 0 15:39 ? 00:00:01 sshd: ***@pts/1
> > gilles 1565 1564 0 15:39 pts/1 00:00:00 \_ -bash
> > gilles 31356 1565 3 15:57 pts/1 00:00:00 \_ /home/gilles/
> > local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
> > gilles 31364 1 1 15:57 pts/1 00:00:00 ./abort
> >
> >
> > so trapping SIGTERM in your shell and manually killing the MPI task
> > should work
> > (as Jeff explained, as long as the shell script is fast enough to do
> > that between SIGTERM and SIGKILL)
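A minimal sketch of that trap-based approach, assuming a wrapper in the style of dum.sh that starts the MPI executable in the background and keeps its cleanup short enough to fit in the SIGTERM-to-SIGKILL window, might be:

    #!/bin/sh
    # hypothetical wrapper: start the MPI executable in the background,
    # forward SIGTERM from the Open MPI daemon to it, then clean up
    cleanup() {
        kill "$child" 2>/dev/null
        echo "cleanup for rank $OMPI_COMM_WORLD_RANK"
        exit 1
    }
    trap cleanup TERM
    /home/buildadina/src/aborttest02/aborttest02.exe &
    child=$!
    wait "$child"
    echo "cleanup for rank $OMPI_COMM_WORLD_RANK"    # normal-exit cleanup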
> >
> >
> > if you observe a different behavior, please double check your Open MPI
> > version and post the outputs of the same commands.
> >
> > btw, are you running from a batch manager ? if yes, which one ?
> >
> > Cheers,
> >
> > Gilles
> >
> > ----- Original Message -----
> > > Ted,
> > >
> > > if you
> > >
> > > mpirun --mca odls_base_verbose 10 ...
> > >
> > > you will see which processes get killed and how
> > >
> > > Best regards,
> > >
> > >
> > > Gilles
> > >
> > > ----- Original Message -----
> > > > Hello Jeff,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > I am not seeing behavior #4, on the two computers that I have tested
> > > > on, using Open MPI 2.1.1.
> > > >
> > > > I wonder if you can duplicate my results with the files that I have
> > > > uploaded.
> > > >
> > > > Regarding what is the "correct" behavior, I am willing to modify my
> > > > application to correspond to Open MPI's behavior (whatever behavior
> > > > the Open MPI developers decide is best) -- provided that Open MPI
> > > > does in fact kill off both shells.
> > > >
> > > > So my highest priority now is to find out why Open MPI 2.1.1 does
> > > > not kill off both shells on my computer.
> > > >
> > > > Sincerely,
> > > >
> > > > Ted Sussman
> > > >
> > > > On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
> > > >
> > > > > Ted --
> > > > >
> > > > > Sorry for jumping in late. Here's my $0.02...
> > > > >
> > > > > In the runtime, we can do 4 things:
> > > > >
> > > > > 1. Kill just the process that we forked.
> > > > > 2. Kill just the process(es) that call back and identify themselves
> > > > > as MPI processes (we don't track this right now, but we could add
> > > > > that functionality).
> > > > > 3. Union of #1 and #2.
> > > > > 4. Kill all processes (to include any intermediate processes that
> > > > > are not included in #1 and #2).
> > > > >
> > > > > In Open MPI 2.x, #4 is the intended behavior. There may be a bug or
> > > > > two that needs to get fixed (e.g., in your last mail, I don't see
> > > > > offhand why it waits until the MPI process finishes sleeping), but we
> > > > > should be killing the process group, which -- unless any of the
> > > > > descendant processes have explicitly left the process group -- should
> > > > > hit the entire process tree.
> > > > >
> > > > > Sidenote: there's actually a way to be a bit more aggressive and do
> > > > > a better job of ensuring that we kill *all* processes (via creative
> > > > > use of PR_SET_CHILD_SUBREAPER), but that's basically a future
> > > > > enhancement / optimization.
> > > > >
> > > > > I think Gilles and Ralph proposed a good point to you: if you want
> > > > > to be sure to be able to do cleanup after an MPI process terminates
> > > > > (normally or abnormally), you should trap signals in your intermediate
> > > > > processes to catch what Open MPI's runtime throws and therefore know
> > > > > that it is time to cleanup.
> > > > >
> > > > > Hypothetically, this should work in all versions of Open MPI...?
> > > > >
> > > > > I think Ralph made a pull request that adds an MCA param to change
> > > > > the default behavior from #4 to #1.
> > > > >
> > > > > Note, however, that there's a little time between when Open MPI
> > > > > sends the SIGTERM and the SIGKILL, so this solution could be racy. If
> > > > > you find that you're running out of time to cleanup, we might be able
> > > > > to make the delay between the SIGTERM and SIGKILL be configurable
> > > > > (e.g., via MCA param).
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Jun 16, 2017, at 10:08 AM, Ted Sussman <***@adina.com> wrote:
> > > > > >
> > > > > > Hello Gilles and Ralph,
> > > > > >
> > > > > > Thank you for your advice so far. I appreciate the time that you
> > > > > > have spent to educate me about the details of Open MPI.
> > > > > >
> > > > > > But I think that there is something fundamental that I don't
> > > > > > understand. Consider Example 2 run with Open MPI 2.1.1.
> > > > > >
> > > > > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
> > > > > >        --> shell for process 1 --> executable for process 1 --> MPI calls
> > > > > >
> > > > > > After the MPI_Abort is called, ps shows that both shells are
> > > > > > running, and that the executable for process 1 is running (in this
> > > > > > case, process 1 is sleeping). And mpirun does not exit until
> > > > > > process 1 is finished sleeping.
> > > > > >
> > > > > > I cannot reconcile this observed behavior with the statement
> > > > > >
> > > > > > > > 2.x: each process is put into its own process group upon
> > > > > > > > launch. When we issue a "kill", we issue it to the process
> > > > > > > > group. Thus, every child proc of that child proc will
> > > > > > > > receive it. IIRC, this was the intended behavior.
> > > > > >
> > > > > > I assume that, for my example, there are two process groups. The
> > > > > > process group for process 0 contains the shell for process 0 and the
> > > > > > executable for process 0; and the process group for process 1 contains
> > > > > > the shell for process 1 and the executable for process 1. So what
> > > > > > does MPI_ABORT do? MPI_ABORT does not kill the process group for
> > > > > > process 0, since the shell for process 0 continues. And MPI_ABORT
> > > > > > does not kill the process group for process 1, since both the shell
> > > > > > and executable for process 1 continue.
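The grouping Ted describes can be checked directly with ps; these are generic invocations, not specific to this test:

    ps -fu $USER --forest               # tree view of the mpirun/shell/executable chain
    ps -o pid,pgid,ppid,cmd -u $USER    # PGID column shows which processes share a group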
> > > > > >
> > > > > > If I hit Ctrl-C after MPI_Abort is called, I get the message
> > > > > >
> > > > > > mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
> > > > > >
> > > > > > but I don't need to hit Ctrl-C again because mpirun immediately
> > > > > > exits.
> > > > > >
> > > > > > Can you shed some light on all of this?
> > > > > >
> > > > > > Sincerely,
> > > > > >
> > > > > > Ted Sussman
> > > > > >
> > > > > >
> > > > > > On 15 Jun 2017 at 14:44, ***@open-mpi.org wrote:
> > > > > >
> > > > > > >
> > > > > > > You have to understand that we have no way of knowing who is
> > > > > > > making MPI calls - all we see is the proc that we started, and
> > > > > > > we know someone of that rank is running (but we have no way of
> > > > > > > knowing which of the procs you sub-spawned it is).
> > > > > > >
> > > > > > > So the behavior you are seeking only occurred in some earlier
> > > > > > > release by sheer accident. Nor will you find it portable as
> > > > > > > there is no specification directing that behavior.
> > > > > > >
> > > > > > > The behavior I've provided is to either deliver the signal to
> > > > > > > _all_ child processes (including grandchildren etc.), or _only_
> > > > > > > the immediate child of the daemon. It won't do what you
> > > > > > > describe - kill the MPI proc underneath the shell, but not the
> > > > > > > shell itself.
> > > > > > >
> > > > > > > What you can eventually do is use PMIx to ask the runtime to
> > > > > > > selectively deliver signals to pid/procs for you. We don't have
> > > > > > > that capability implemented just yet, I'm afraid.
> > > > > > >
> > > > > > > Meantime, when I get a chance, I can code an option that will
> > > > > > > record the pid of the subproc that calls MPI_Init, and then lets
> > > > > > > you deliver signals to just that proc. No promises as to when
> > > > > > > that will be done.
> > > > > > >
> > > > > > >
> > > > > > > On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@adina.com> wrote:
> > > > > > >
> > > > > > > Hello Ralph,
> > > > > > >
> > > > > > > I am just an Open MPI end user, so I will need to wait for
> > > > > > > the next official release.
> > > > > > >
> > > > > > > mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
> > > > > > >        --> shell for process 1 --> executable for process 1 --> MPI calls
> > > > > > >        ...
> > > > > > >
> > > > > > > I guess the question is, should MPI_ABORT kill the executables
> > > > > > > or the shells? I naively thought, that, since it is the
> > > > > > > executables that make the MPI calls, it is the executables that
> > > > > > > should be aborted by the call to MPI_ABORT. Since the shells
> > > > > > > don't make MPI calls, the shells should not be aborted.
> > > > > > >
> > > > > > > And users might have several layers of shells in between
> > > > > > > mpirun and the executable.
> > > > > > >
> > > > > > > So now I will look for the latest version of Open MPI that
> > > > > > > has the 1.4.3 behavior.
> > > > > > >
> > > > > > > Sincerely,
> > > > > > >
> > > > > > > Ted Sussman
> > > > > > >
> > > > > > > On 15 Jun 2017 at 12:31, ***@open-mpi.org wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Yeah, things jittered a little there as we debated the
> > > > > > > > "right" behavior. Generally, when we see that happening it
> > > > > > > > means that a param is required, but somehow we never reached
> > > > > > > > that point.
> > > > > > > >
> > > > > > > > See if https://github.com/open-mpi/ompi/pull/3704 helps -
> > > > > > > > if so, I can schedule it for the next 2.x release if the RMs
> > > > > > > > agree to take it
> > > > > > > >
> > > > > > > > Ralph
> > > > > > > >
> > > > > > > > On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.sussman@adina.com> wrote:
> > > > > > > >
> > > > > > > > Thank you for your comments.
> > > > > > > >
> > > > > > > > Our application relies upon "dum.sh" to clean up after the
> > > > > > > > process exits, either if the process exits normally, or if
> > > > > > > > the process exits abnormally because of MPI_ABORT. If the
> > > > > > > > process group is killed by MPI_ABORT, this clean up will not
> > > > > > > > be performed. If exec is used to launch the executable from
> > > > > > > > dum.sh, then dum.sh is terminated by the exec, so dum.sh
> > > > > > > > cannot perform any clean up.
> > > > > > > >
> > > > > > > > I suppose that other user applications might work similarly,
> > > > > > > > so it would be good to have an MCA parameter to control the
> > > > > > > > behavior of MPI_ABORT.
> > > > > > > >
> > > > > > > > We could rewrite our shell script that invokes mpirun, so
> > > > > > > > that the cleanup that is now done by dum.sh is done by the
> > > > > > > > invoking shell script after mpirun exits. Perhaps this
> > > > > > > > technique is the preferred way to clean up after mpirun is
> > > > > > > > invoked.
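A sketch of that arrangement, with the cleanup command shown purely as a placeholder, could be:

    #!/bin/sh
    # hypothetical top-level script: run the job, then clean up after
    # mpirun returns, whether it ended normally or through MPI_ABORT
    mpirun -app addmpw2
    status=$?
    echo "mpirun exited with status $status, cleaning up"
    rm -f /tmp/aborttest_scratch.*      # placeholder for the real cleanup
    exit $status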
> > > > > > > >
> > > > > > > > By the way, I have also tested with Open MPI 1.10.7, and Open
> > > > > > > > MPI 1.10.7 has different behavior than either Open MPI 1.4.3
> > > > > > > > or Open MPI 2.1.1. In this explanation, it is important to
> > > > > > > > know that the aborttest executable sleeps for 20 sec.
> > > > > > > >
> > > > > > > > When running example 2:
> > > > > > > >
> > > > > > > > 1.4.3: process 1 immediately aborts
> > > > > > > > 1.10.7: process 1 doesn't abort and never stops.
> > > > > > > > 2.1.1: process 1 doesn't abort, but stops after it is finished sleeping
> > > > > > > >
> > > > > > > > Sincerely,
> > > > > > > >
> > > > > > > > Ted Sussman
> > > > > > > >
> > > > > > > > On 15 Jun 2017 at 9:18, ***@open-mpi.org wrote:
> > > > > > > >
> > > > > > > > Here is how the system is working:
> > > > > > > >
> > > > > > > > Master: each process is put into its own process group upon
> > > > > > > > launch. When we issue a "kill", however, we only issue it to
> > > > > > > > the individual process (instead of the process group that is
> > > > > > > > headed by that child process). This is probably a bug as I
> > > > > > > > don't believe that is what we intended, but set that aside
> > > > > > > > for now.
> > > > > > > >
> > > > > > > > 2.x: each process is put into its own process group upon
> > > > > > > > launch. When we issue a "kill", we issue it to the process
> > > > > > > > group. Thus, every child proc of that child proc will
> > > > > > > > receive it. IIRC, this was the intended behavior.
> > > > > > > >
> > > > > > > > It is rather trivial to make the change (it only involves 3
> > > > > > > > lines of code), but I'm not sure of what our intended behavior
> > > > > > > > is supposed to be. Once we clarify that, it is also trivial to
> > > > > > > > add another MCA param (you can never have too many!) to allow
> > > > > > > > you to select the other behavior.
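The two behaviors can be reproduced from a shell prompt; assuming $wrapper_pid holds the pid of one of the launched wrapper scripts, they correspond roughly to:

    pgid=$(ps -o pgid= -p "$wrapper_pid" | tr -d ' ')
    kill -TERM "$wrapper_pid"       # master-like: signal only that one process
    kill -TERM -- "-$pgid"          # 2.x-like: signal the entire process group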
> > > > > > > >
> > > > > > > >
> > > > > > > > On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.sussman@adina.com> wrote:
> > > > > > > >
> > > > > > > > Hello Gilles,
> > > > > > > >
> > > > > > > > Thank you for your quick answer. I confirm that if exec is
> > > > > > > > used, both processes immediately abort.
> > > > > > > >
> > > > > > > > Now suppose that the line
> > > > > > > >
> > > > > > > > echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
> > > > > > > >
> > > > > > > > is added to the end of dum.sh.
> > > > > > > >
> > > > > > > > If Example 2 is run with Open MPI 1.4.3, the output is
> > > > > > > >
> > > > > > > > After aborttest: OMPI_COMM_WORLD_RANK=0
> > > > > > > >
> > > > > > > > which shows that the shell script for the process with rank 0
> > > > > > > > continues after the abort, but that the shell script for the
> > > > > > > > process with rank 1 does not continue after the abort.
> > > > > > > >
> > > > > > > > If Example 2 is run with Open MPI 2.1.1, with exec used to
> > > > > > > > invoke aborttest02.exe, then there is no such output, which
> > > > > > > > shows that both shell scripts do not continue after the abort.
> > > > > > > >
> > > > > > > > I prefer the Open MPI 1.4.3 behavior because our original
> > > > > > > > application depends upon the Open MPI 1.4.3 behavior. (Our
> > > > > > > > original application will also work if both executables are
> > > > > > > > aborted, and if both shell scripts continue after the abort.)
> > > > > > > >
> > > > > > > > It might be too much to expect, but is there a way to recover
> > > > > > > > the Open MPI 1.4.3 behavior using Open MPI 2.1.1?
> > > > > > > >
> > > > > > > > Sincerely,
> > > > > > > >
> > > > > > > > Ted Sussman
> > > > > > > >
> > > > > > > >
> > > > > > > > On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
> > > > > > > >
> > > > > > > > Ted,
> > > > > > > >
> > > > > > > >
> > > > > > > > fwiw, the 'master' branch has the behavior you
> > expect.
> > > > > > > >
> > > > > > > >
> > > > > > > > meanwhile, you can simple edit your 'dum.sh' script
> > > and replace
> > > > > > > >
> > > > > > > > /home/buildadina/src/aborttest02/aborttest02.exe
> > > > > > > >
> > > > > > > > with
> > > > > > > >
> > > > > > > > exec /home/buildadina/src/aborttest02/aborttest02.
> > exe
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > > >
> > > > > > > > Gilles