Discussion:
[OMPI users] OpenMPI 2.1.1, --map-by socket, application context files
Ted Sussman
2017-06-29 18:24:06 UTC
Hello all,

Today I have a problem with the --map-by socket feature of Open MPI 2.1.1 when used with
application context files.

In the examples below, I am testing on a 2-socket computer, with 4 cores per socket.

---

Example 1:

.../openmpi-2.1.1/bin/mpirun --report-bindings \
-map-by socket \
-np 2 \
afftest01.exe

returns

...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
...MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]

which is what I would expect.

---

Example 2:

Create appfile as:

-np 1 afftest01.exe
-np 1 afftest01.exe

Then

.../openmpi-2.1.1/bin/mpirun --report-bindings \
-map-by socket \
-app appfile

returns

...MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
...MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]

which is not what I expect. I expect the same bindings as in Example 1.

---

Example 3:

Using the same appfile as in Example 2,

.../openmpi-1.4.3/bin/mpirun --report-bindings \
-bysocket --bind-to-core \
-app appfile

returns

... odls:default:fork binding child ... to socket 0 cpus 0002
... odls:default:fork binding child ... to socket 1 cpus 0001

which is what I would expect. Here I use --bind-to-core just to get the bindings printed.

---

The examples show that the --map-by socket feature does not work as expected when
application context files are used. However, the older -bysocket option worked as expected
in Open MPI 1.4.3 when application context files were used.

If I am using the wrong syntax in Example 2, please let me know.

Sincerely,

Ted Sussman
r***@open-mpi.org
2017-06-30 02:09:04 UTC
It’s a difficult call to make as to which is the correct behavior. In Example 1, you are executing a single app_context that has two procs in it. In Example 2, you are executing two app_contexts, each with a single proc in it.

Now some people say that the two should be treated the same, with the second app_context in Example 2 being mapped starting from the end of the first app_context. In this model, a comm_spawn would also start from the end of the earlier app_context, and thus the new proc would not be on the same node (or socket, in this case) as its parent.

Other people argue for the opposite behavior - that each app_context should start from the first available slot in the allocation. In that model, a comm_spawn would result in the first child occupying the same node (or socket) as its parent, assuming an available slot.

We’ve bounced around a bit on this behavior over the years as different groups voiced their opinions. OMPI 1.4.3 is _very_ old and fell in the former camp, while 2.1.1 was just released and is in the latter camp. I honestly don’t recall where the change occurred, or even how consistent we have been over the years. It isn’t something that people raise very often.

I’ve pretty much resolved to leave the default behavior as it currently sits, but I plan to add an option to support the alternative behavior, as there is no clear-cut consensus in the user community. Not sure when I’ll get to it - definitely not for the 2.x series, and maybe not for 3.x since that is about to be released.
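
To make the two camps concrete on the 2-socket, 4-core machine from the examples, here is an illustrative sketch only, with the bindings copied from the examples above:

  "continue after the previous app_context" camp (the layout Examples 1 and 3 show):
    rank 0 (app_context 0) -> socket 0 : [B/B/B/B][./././.]
    rank 1 (app_context 1) -> socket 1 : [./././.][B/B/B/B]

  "each app_context starts over at the first available slot" camp (the layout Example 2 shows):
    rank 0 (app_context 0) -> socket 0 : [B/B/B/B][./././.]
    rank 1 (app_context 1) -> socket 0 : [B/B/B/B][./././.]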

Ted Sussman
2017-06-30 12:03:39 UTC
Hello Ralph,

Thank you for your comments.

My understanding, from reading Jeff's blog on V1.5 processor affinity, is that the bindings in
Example 1 balance the load better than the bindings in Example 2.

Therefore I would like to obtain the bindings in Example 1, but using Open MPI 2.1.1, and
using application context files.

How can I do this?

Sincerely,

Ted Sussman

r***@open-mpi.org
2017-06-30 14:41:58 UTC
Well, yes and no. Yes, your CPU loads will balance better across nodes (balancing across sockets doesn’t do much for you). However, your overall application performance may be poorest in that arrangement if your app does a lot of communication, because that layout minimizes the use of shared memory.

Laying out an app requires a little thought about its characteristics. If it is mostly compute with a little communication, then spreading the procs out makes the most sense. If it has a lot of communication, then compressing the procs into the minimum space makes the most sense; the latter is the most commonly used layout.
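
As a rough sketch of the two layout styles on the 2-socket, 4-core box from the examples (afftest01.exe is just the test binary used above, and the exact bindings reported will depend on the build and its defaults):

  # spread the two procs across sockets (compute-heavy, little communication)
  mpirun --report-bindings --map-by socket -np 2 afftest01.exe

  # pack the two procs onto cores of the same socket first (communication-heavy,
  # keeping the ranks on shared memory within one socket)
  mpirun --report-bindings --map-by core -np 2 afftest01.exe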

I haven’t looked at app context files in ages, but I think you could try this:

-np 1 afftest01.exe; -np 1 afftest01.exe
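
For comparison, the colon-separated MPMD form directly on the mpirun command line is another way to express two app_contexts; this is a sketch only, and it would presumably be mapped the same way 2.1.1 maps the appfile:

  mpirun --report-bindings -map-by socket -np 1 afftest01.exe : -np 1 afftest01.exe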


Ted Sussman
2017-06-30 15:42:13 UTC
Hello Ralph,

I need to support several different apps, each app with entirely different MPI communication
needs, and each app either single-threaded or multi-threaded. For example, one app tends
to do very little message passing, and another app does much more message passing.

And some of our end users are very performance-conscious, so we want to give our end
users tools for controlling performance. And of course all of our end users will be running on
different hardware.

So I wanted to do some benchmarking of the affinity options, in order to give some guidelines
to our end users. My understanding is that it is necessary to actually try the different affinity
options, and that it is very difficult, if not impossible, to predict beforehand which affinity
options, if any, give a performance benefit.
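
A minimal benchmarking sketch along those lines, assuming afftest01.exe stands in for the real app, wall-clock time is an adequate metric, and the list of policies is just an example:

  for policy in "--map-by socket" "--map-by core" "--bind-to none"; do
      echo "== $policy =="
      # $policy is left unquoted on purpose so it expands into two mpirun arguments
      time mpirun $policy --report-bindings -np 2 afftest01.exe
  done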

It is quite possible that our apps would work better with

MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
MCW rank 1 bound to socket 0 ... : [B/B/B/B][./././.]

instead of

MCW rank 0 bound to socket 0 ... : [B/B/B/B][./././.]
MCW rank 1 bound to socket 1 ... : [./././.][B/B/B/B]

but again there is no way to know this beforehand. It is nice to have the option to try both,
which we could do in Open MPI 1.4.3.

Our apps all use app context files. App context files are very convenient since we can pass
different options to the executable for each rank, in particular the pathname of the working
directory that each rank uses. And the app context files are very readable, since nothing has
to go onto one long mpirun command line.

So for us it is important to have all of the affinity parameters work with the app context files.
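
A hypothetical appfile in that style (the -wdir option sets the working directory per app_context; the paths and input arguments below are made up for illustration, and this assumes mpirun accepts per-line options in the appfile):

  -np 1 -wdir /scratch/case01/rank0 afftest01.exe input0.dat
  -np 1 -wdir /scratch/case01/rank1 afftest01.exe input1.dat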

I tried an app context file of the format

> -np 1 afftest01.exe; -np 1 afftest01.exe

but it didn't work. Only rank 0 was created. Is there a different syntax that will work?

Sincerely,

Ted Sussman




r***@open-mpi.org
2017-06-30 18:40:44 UTC
Well, FWIW, it looks like master (and hence 3.0) behave the way you wanted:

$ mpirun -map-by socket --report-bindings --app ./appfile
[rhc001:48492] MCW rank 0: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../../../..]
[rhc001:48492] MCW rank 1: [../../../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
$

with an appfile of
-n 1 hostname
-n 1 hostname

Interestingly, a "--map-by node" directive still winds up with the procs filling the first node before moving to the second. Not entirely sure what’s going on there. Ditto when I add the “span” qualifier (--map-by socket:span) - all the procs stay on the first node until full, which isn’t the expected behavior.

I very much doubt we’d backport the code supporting this stuff to the v2.x series, so perhaps upgrade to 3.0 when it gets released in the very near future?

As I said, this has gone back and forth too many times, so I’m going to “freeze” it at the 3.0 behavior (perhaps after exploring why --map-by node and the span qualifier aren’t doing what’s expected for multiple app_contexts) and add the option from there.

