[...] migrate unless it was pinned by the OpenMP runtime.
[...] same core.
Post by Heinz-Ado Arnolds
Dear rhc,
to make it clearer what I am trying to achieve, I have collected examples for
several combinations of command-line options. It would be great if you could
find time to look at them below. The most promising one is example 4.
I'd like to have 4 MPI ranks, each starting 1 OpenMP job with 10 threads,
running on 2 nodes, each having 2 sockets with 10 cores & 10 hwthreads per
socket. Only the 10 cores (no hwthreads) should be used on each socket.
4 MPI -> 1 OpenMP with 10 threads each (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh"
-report-bindings ./myid
pascal-2-05...DE 20
pascal-1-03...DE 20
[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid
28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid
28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid
28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid
28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid
28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid
28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid
28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid
28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid
28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid
28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid
28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid
28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid
28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid
28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid
28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid
28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid
28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid
28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid
28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid
28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid
19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid
19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid
19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid
19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid
19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid
19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid
19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid
19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid
19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid
19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid
19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid
19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid
19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid
19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid
19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid
19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid
19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid
19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid
19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid
19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
I get a distribution to 4 sockets on 2 nodes as expected, but cores and
hwthreads get mixed up:
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP thread
#0007 runs on CPU 038; MP thread #0002 runs on CPU 014, MP thread
#0008 runs on CPU 034.
According to "lscpu -a -e", CPUs 18/38 resp. 14/34 are the same physical cores.
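That sibling relationship is easy to check mechanically. Here is a minimal sketch (mine, not from the thread) in Python, hardcoding the topology from the "lscpu -a -e" listing later in this thread, where logical CPU n and CPU n+20 are the two hwthreads of one physical core:

```python
# Sibling-hwthread layout reported by "lscpu -a -e" on these nodes:
# 2 sockets x 10 cores x 2 hwthreads = 40 logical CPUs, where
# logical CPU n and CPU n + 20 share one physical core.
CORES_PER_NODE = 20

def physical_core(cpu: int) -> int:
    """Map a logical CPU id to its physical core id on this topology."""
    return cpu % CORES_PER_NODE

def same_core(a: int, b: int) -> bool:
    """True if two logical CPUs are hwthreads of the same physical core."""
    return physical_core(a) == physical_core(b)

# The pairs observed in example 1: two OpenMP threads each landed on
# sibling hwthreads, i.e. they compete for one physical core.
assert same_core(18, 38)
assert same_core(14, 34)
assert not same_core(18, 14)
```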
2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread
--mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-05...DE 20
pascal-2-05...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-05
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then please report the problem to the hwloc developers at
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[B./../../../../../../../../..][../../../../../../../../../..]
[.B/../../../../../../../../..][../../../../../../../../../..]
[B./../../../../../../../../..][../../../../../../../../../..]
[.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid
28929), 020, Cpus_allowed_list: 20
Only two hwthreads (CPUs 0 and 20) are used on each node, and those two
hwthreads belong to the same physical core.
3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent
"qrsh" -report-bindings ./myid
pascal-1-03...DE 20
pascal-2-02...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-03
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then please report the problem to the hwloc developers at
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[B./../../../../../../../../..][../../../../../../../../../..]
[../../../../../../../../../..][B./../../../../../../../../..]
[.B/../../../../../../../../..][../../../../../../../../../..]
[../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid
19371), 021, Cpus_allowed_list: 21
Without the "--map-by ppr:2:node" option, all four ranks are scheduled onto
one machine only.
4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca
plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-00...DE 20
pascal-3-00...DE 20
[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid
05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid
05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid
05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid
05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid
05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid
05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid
05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid
05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid
05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid
05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid
05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid
05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid
05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid
05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid
05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid
05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid
05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid
05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid
05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid
05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid
07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid
07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid
07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid
07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid
07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid
07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid
07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid
07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid
07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid
07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid
07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid
07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid
07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid
07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid
07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid
07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid
07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid
07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid
07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid
07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
This distribution looks very good with the option combination
"--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking at
"MPI Instance 0002", you'll find that "MP thread #0001" runs on CPU
031 and "MP thread #0005" on CPU 011; 011/031 are the same physical core.
All the others are placed perfectly! Is this error due to a mistake on my
side, or might there be a small remaining binding problem in Open MPI?
I'd appreciate any hint very much!
Kind regards,
Ado
I'm not entirely sure I understand your reference to "real cores". When
we bind you to a core, we bind you to all the HTs that comprise that core.
So, yes, with HT enabled, the binding report will list things by HT, but
you'll always be bound to the full core if you tell us bind-to core.
The default binding directive is bind-to socket when more than 2
processes are in the job, and that's what you are showing. You can override
that by adding "-bind-to core" to your cmd line if that is what you desire.
If you want to use individual HTs as independent processors, then
"--use-hwthread-cpus --bind-to hwthread" would indeed be the right
combination.
On Apr 10, 2017, at 3:55 AM, Heinz-Ado Arnolds <
Dear OpenMPI users & developers,
I'm trying to distribute my jobs (with SGE) to a machine with a certain
number of nodes, each node having 2 sockets, each socket having 10 cores &
10 hyperthreads. I'd like to use only the real cores, no hyperthreading.
lscpu -a -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3
0 0 0 0 0:0:0:0
1 1 1 1 1:1:1:1
2 0 0 2 2:2:2:0
3 1 1 3 3:3:3:1
4 0 0 4 4:4:4:0
5 1 1 5 5:5:5:1
6 0 0 6 6:6:6:0
7 1 1 7 7:7:7:1
8 0 0 8 8:8:8:0
9 1 1 9 9:9:9:1
10 0 0 10 10:10:10:0
11 1 1 11 11:11:11:1
12 0 0 12 12:12:12:0
13 1 1 13 13:13:13:1
14 0 0 14 14:14:14:0
15 1 1 15 15:15:15:1
16 0 0 16 16:16:16:0
17 1 1 17 17:17:17:1
18 0 0 18 18:18:18:0
19 1 1 19 19:19:19:1
20 0 0 0 0:0:0:0
21 1 1 1 1:1:1:1
22 0 0 2 2:2:2:0
23 1 1 3 3:3:3:1
24 0 0 4 4:4:4:0
25 1 1 5 5:5:5:1
26 0 0 6 6:6:6:0
27 1 1 7 7:7:7:1
28 0 0 8 8:8:8:0
29 1 1 9 9:9:9:1
30 0 0 10 10:10:10:0
31 1 1 11 11:11:11:1
32 0 0 12 12:12:12:0
33 1 1 13 13:13:13:1
34 0 0 14 14:14:14:0
35 1 1 15 15:15:15:1
36 0 0 16 16:16:16:0
37 1 1 17 17:17:17:1
38 0 0 18 18:18:18:0
39 1 1 19 19:19:19:1
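If the goal is one hwthread per physical core, the table above can be reduced to an explicit CPU list (usable, e.g., with taskset, a rankfile, or an OpenMP affinity variable). A small sketch of mine, assuming the fixed 2-socket / 20-core / 40-hwthread layout shown above rather than parsing live lscpu output:

```python
# Reconstruct the "lscpu -a -e" rows above: even CPUs sit on socket 0,
# odd CPUs on socket 1, and CPU n shares its physical core with CPU n + 20.
topology = [(cpu, cpu % 2, cpu % 20) for cpu in range(40)]  # (cpu, socket, core)

# Keep only the first hwthread of every (socket, core) pair.
seen, real_cores = set(), []
for cpu, socket, core in topology:
    if (socket, core) not in seen:
        seen.add((socket, core))
        real_cores.append(cpu)

print(",".join(map(str, real_cores)))  # prints 0,1,2,...,19
```

On this layout the resulting list is simply CPUs 0-19, matching the observation that CPUs 20-39 are each core's second hwthread.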
How do I have to choose the options & parameters of mpirun to achieve
this behavior?
mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh"
-report-bindings ./myid
distributes to
[pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-
Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-
Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-
Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-
Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
i.e.: 2 nodes: ok, 2 sockets: ok, different set of cores: ok, but uses
all hwthreads
I have tried several combinations of --use-hwthread-cpus and --bind-to
hwthread, but didn't find the right combination.
It would be great to get any hints!
Thanks a lot in advance,
Heinz-Ado Arnolds
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
--
________________________________________________________________________
Dipl.-Ing. Heinz-Ado Arnolds
Max-Planck-Institut für Astrophysik
Karl-Schwarzschild-Strasse 1
D-85748 Garching
Postfach 1317
D-85741 Garching
Phone +49 89 30000-2217
FAX +49 89 30000-3240
email arnolds[at]MPA-Garching.MPG.DE
________________________________________________________________________