Discussion:
[OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)
Heinz-Ado Arnolds
2017-04-10 10:55:51 UTC
Dear OpenMPI users & developers,

I'm trying to distribute my jobs (with SGE) to a cluster with a certain number of nodes, each node having 2 sockets and each socket having 10 cores plus 10 hyperthreads. I'd like to use only the real cores, no hyperthreading.

lscpu -a -e

CPU NODE SOCKET CORE L1d:L1i:L2:L3
0 0 0 0 0:0:0:0
1 1 1 1 1:1:1:1
2 0 0 2 2:2:2:0
3 1 1 3 3:3:3:1
4 0 0 4 4:4:4:0
5 1 1 5 5:5:5:1
6 0 0 6 6:6:6:0
7 1 1 7 7:7:7:1
8 0 0 8 8:8:8:0
9 1 1 9 9:9:9:1
10 0 0 10 10:10:10:0
11 1 1 11 11:11:11:1
12 0 0 12 12:12:12:0
13 1 1 13 13:13:13:1
14 0 0 14 14:14:14:0
15 1 1 15 15:15:15:1
16 0 0 16 16:16:16:0
17 1 1 17 17:17:17:1
18 0 0 18 18:18:18:0
19 1 1 19 19:19:19:1
20 0 0 0 0:0:0:0
21 1 1 1 1:1:1:1
22 0 0 2 2:2:2:0
23 1 1 3 3:3:3:1
24 0 0 4 4:4:4:0
25 1 1 5 5:5:5:1
26 0 0 6 6:6:6:0
27 1 1 7 7:7:7:1
28 0 0 8 8:8:8:0
29 1 1 9 9:9:9:1
30 0 0 10 10:10:10:0
31 1 1 11 11:11:11:1
32 0 0 12 12:12:12:0
33 1 1 13 13:13:13:1
34 0 0 14 14:14:14:0
35 1 1 15 15:15:15:1
36 0 0 16 16:16:16:0
37 1 1 17 17:17:17:1
38 0 0 18 18:18:18:0
39 1 1 19 19:19:19:1

How do I have to choose the mpirun options and parameters to achieve this behavior?

mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid

distributes to

[pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

i.e.: 2 nodes: ok, 2 sockets: ok, different sets of cores: ok, but all hwthreads are used

I have tried several combinations of --use-hwthread-cpus, --bind-to hwthreads, but didn't find the right combination.

It would be great to get any hints!

Thanks a lot in advance,

Heinz-Ado Arnolds
r***@open-mpi.org
2017-04-10 23:36:21 UTC
I’m not entirely sure I understand your reference to “real cores”. When we bind you to a core, we bind you to all the HTs that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you’ll always be bound to the full core if you tell us bind-to core.

The default binding directive is bind-to socket when more than 2 processes are in the job, and that’s what you are showing. You can override that by adding "-bind-to core" to your cmd line if that is what you desire.

If you want to use individual HTs as independent processors, then “--use-hwthread-cpus -bind-to hwthreads” would indeed be the right combination.
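In command-line form (just a sketch based on your original invocation, not tested on your cluster; adjust as needed), those two variants would look like:

mpirun -np 4 --map-by ppr:2:node --bind-to core --mca plm_rsh_agent "qrsh" -report-bindings ./myid

mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid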
Heinz-Ado Arnolds
2017-04-12 11:26:27 UTC
Dear rhc,

To make it clearer what I am trying to achieve, I have collected some examples for several combinations of command-line options. It would be great if you could find the time to look at them below. The most promising one is example "4".

I'd like to have 4 MPI jobs, each starting 1 OpenMP job with 10 threads, running on 2 nodes, each node having 2 sockets with 10 cores & 10 hwthreads. Only the 10 cores (no hwthreads) should be used on each socket.

4 MPI -> 1 OpenMP with 10 thread (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
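If I read the mpirun man page correctly, what I am ultimately after would be expressed by something like the sketch below (I have not verified that the PE modifier can be combined with ppr in 2.1.0):

export OMP_NUM_THREADS=10
mpirun -np 4 --map-by ppr:2:node:PE=10 --bind-to core -x OMP_NUM_THREADS \
       --mca plm_rsh_agent "qrsh" -report-bindings ./myid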

1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid

Machines :
pascal-2-05...DE 20
pascal-1-03...DE 20

[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid 28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid 28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid 28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid 28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid 28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid 28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid 28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid 28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid 28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid 28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid 28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid 28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid 28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid 28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid 28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid 28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid 28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid 28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid 28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid 28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

I get a distribution to 4 sockets on 2 nodes as expected, but cores and their corresponding hwthreads are used simultaneously:
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018 and MP thread #0007 runs on CPU 038;
MP thread #0002 runs on CPU 014 and MP thread #0008 runs on CPU 034.
According to "lscpu -a -e", CPUs 18/38 and 14/34, respectively, are the same physical cores.

2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid

Machines :
pascal-1-05...DE 20
pascal-2-05...DE 20

I get this warning:

WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

Node: pascal-1-05

Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.

On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).

On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.

If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.

This is a warning only; your job will continue, though performance may
be degraded.

and these results:

[pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid 28929), 020, Cpus_allowed_list: 20

Only 2 logical CPUs are used per node, and they are the two hwthreads of the same physical core.

3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid

Machines :
pascal-1-03...DE 20
pascal-2-02...DE 20

I get a warning again:

WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

Node: pascal-1-03

Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.

On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).

On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.

If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.

This is a warning only; your job will continue, though performance may
be degraded.

and these results:

[pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]: [../../../../../../../../../..][B./../../../../../../../../..]
[pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]: [../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19371), 021, Cpus_allowed_list: 21

The jobs are scheduled to one machine only.

4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca plm_rsh_agent "qrsh" -report-bindings ./myid

Machines :
pascal-1-00...DE 20
pascal-3-00...DE 20

[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid 05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid 05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid 05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid 05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid 05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid 05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid 05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid 05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid 05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid 05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid 05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid 05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid 05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid 05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid 05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid 05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid 05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid 05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid 05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid 05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid 07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid 07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid 07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid 07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid 07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid 07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid 07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid 07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid 07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid 07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid 07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid 07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid 07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid 07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid 07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid 07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid 07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid 07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid 07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

This distribution looks very good with this combination of options ("--map-by ppr:2:node --use-hwthread-cpus"), with one exception: looking at "MPI Instance 0002", you'll find that "MP thread #0001" is executed on CPU 031 and "MP thread #0005" is executed on CPU 011; 011/031 are the same physical core.
All the others are really perfect! Is this error due to a fault of mine, or might there be a small remaining binding problem in Open MPI?

I'd appreciate any hint very much!

Kind regards,

Ado
--
________________________________________________________________________

Dipl.-Ing. Heinz-Ado Arnolds

Max-Planck-Institut für Astrophysik
Karl-Schwarzschild-Strasse 1
D-85748 Garching

Postfach 1317
D-85741 Garching

Phone +49 89 30000-2217
FAX +49 89 30000-3240
email arnolds[at]MPA-Garching.MPG.DE
________________________________________________________________________
r***@open-mpi.org
2017-04-12 13:16:38 UTC
Open MPI isn’t doing anything wrong - it is doing exactly what it should, and exactly what you are asking it to do. The problem you are having is that OpenMP isn’t placing the threads exactly where you would like inside the process-level “envelope” that Open MPI has bound the entire process to.

All Open MPI does is to bind you to a range of cores. OpenMP has full control over the binding of its threads within that envelope. So I suspect the problem you are encountering is with your calls to OpenMP - you aren’t quite specifying the correct thread layout pattern there.
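For example, with an OpenMP runtime that implements the standard placement controls, something along these lines should spread the 10 threads over the 10 distinct cores inside each rank's envelope (a sketch only; the exact behavior depends on your OpenMP implementation and on how ./myid sets up its threads):

export OMP_NUM_THREADS=10
export OMP_PLACES=cores      # one place per physical core within the bound envelope
export OMP_PROC_BIND=close   # pin each thread to its own place
mpirun -np 4 --map-by ppr:2:node -x OMP_NUM_THREADS -x OMP_PLACES -x OMP_PROC_BIND \
       --mca plm_rsh_agent "qrsh" -report-bindings ./myid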

FWIW: we have a working group looking at better ways to coordinate OpenMP and MPI operations, especially these binding issues. So hopefully this will get easier over time.

HTH
Ralph
Post by Heinz-Ado Arnolds
Dear rhc,
to make it more clear what I try to achieve, I collected some examples for several combinations of command line options. Would be great if you find time to look to these below. The most promise one is example "4".
I'd like to have 4 MPI jobs starting 1 OpenMP job each with 10 threads, running on 2 nodes, each having 2 sockets, with 10 cores & 10 hwthreads. Only 10 cores (no hwthreads) should be used on each socket.
4 MPI -> 1 OpenMP with 10 thread (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-2-05...DE 20
pascal-1-03...DE 20
[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid 28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid 28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid 28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid 28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid 28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid 28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid 28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid 28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid 28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid 28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid 28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid 28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid 28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid 28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid 28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid 28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid 28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid 28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid 28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid 28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP thread #0007 runs on CPU 038,
MP thread #0002 runs on CPU 014, MP thread #0008 runs on CPU 034
according to "lscpu -a -e" CPUs 18/38 resp. 14/34 are the same physical cores
2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-05...DE 20
pascal-2-05...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-05
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid 28929), 020, Cpus_allowed_list: 20
Only two hardware threads per node are used (CPUs 0 and 20), these are the two hyperthreads of the same physical core, and each rank's 10 OpenMP threads are all confined to its single hardware thread.
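For the goal of one rank per socket with ten physical cores per rank, a combination that is often suggested is the PE modifier of --map-by; this is only a sketch and has not been verified with this Open MPI 2.1.0 / SGE setup:

  mpirun -np 4 --map-by socket:PE=10 --bind-to core \
         --mca plm_rsh_agent "qrsh" -report-bindings ./myid
  # intent: 1 rank per socket (2 per node), each rank bound to 10 physical
  # cores, so its 10 OpenMP threads can be spread over distinct cores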
3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-03...DE 20
pascal-2-02...DE 20
(Open MPI prints the same hwloc memory-binding warning as in example 2, this time for Node: pascal-1-03; it is a warning only and the job continues.)
[pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]: [../../../../../../../../../..][B./../../../../../../../../..]
[pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]: [../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19371), 021, Cpus_allowed_list: 21
All four ranks are scheduled to one machine only (pascal-1-03), and again pairs of ranks (CPUs 0/20 and 1/21) sit on the two hyperthreads of the same physical cores.
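Without a ppr mapping, mpirun here fills the first node's slots before moving to the next, which is why all four ranks end up on pascal-1-03. A sketch (untested here) of spreading the ranks round-robin over the allocated nodes while keeping the per-hwthread binding of example 3:

  mpirun -np 4 --map-by node --use-hwthread-cpus -bind-to hwthread \
         --mca plm_rsh_agent "qrsh" -report-bindings ./myid
  # --map-by node places ranks 0/2 on the first node and ranks 1/3 on the
  # second; the binding policy itself is unchanged from example 3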
4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-00...DE 20
pascal-3-00...DE 20
[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid 05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid 05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid 05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid 05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid 05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid 05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid 05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid 05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid 05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid 05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid 05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid 05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid 05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid 05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid 05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid 05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid 05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid 05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid 05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid 05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid 07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid 07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid 07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid 07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid 07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid 07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid 07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid 07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid 07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid 07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid 07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid 07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid 07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid 07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid 07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid 07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid 07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid 07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid 07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
This distribution looks very good with this combination of options "--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking at "MPI Instance 0002", you'll find that "MP thread #0001" is executed on CPU 031 and "MP thread #0005" is executed on CPU 011; 011/031 are the same physical core.
All the others are really perfect! Is this error due to my fault, or might there be a small remaining binding problem in OpenMPI?
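Since each rank's mask in example 4 contains both hyperthreads of every core of its socket, an OpenMP runtime that is only told to bind threads, but whose places default to single hardware threads taken from that mask, can put two threads on sibling hardware threads such as 011/031. A sketch of one common way to make the runtime treat whole cores as places (OMP_PLACES is OpenMP 4.0 and should be honoured by recent gcc/libgomp; not verified on this cluster):

  export OMP_NUM_THREADS=10
  export OMP_PLACES=cores     # one place per physical core (both hwthreads)
  export OMP_PROC_BIND=close  # pack the 10 threads onto 10 distinct cores
  mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus \
         --mca plm_rsh_agent "qrsh" -report-bindings ./myid

Depending on how SGE and qrsh propagate the environment, the OMP_* variables may need to be forwarded to the remote ranks explicitly, e.g. with mpirun's -x option.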
I'd appreciate any hint very much!
Kind regards,
Ado
Post by r***@open-mpi.org
I’m not entirely sure I understand your reference to “real cores”. When we bind you to a core, we bind you to all the HT’s that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you’ll always be bound to the full core if you tell us bind-to core
The default binding directive is bind-to socket when more than 2 processes are in the job, and that’s what you are showing. You can override that by adding "-bind-to core" to your cmd line if that is what you desire.
If you want to use individual HTs as independent processors, then “--use-hwthread-cpus -bind-to hwthreads” would indeed be the right combination.
--
________________________________________________________________________
Dipl.-Ing. Heinz-Ado Arnolds
Max-Planck-Institut für Astrophysik
Karl-Schwarzschild-Strasse 1
D-85748 Garching
Postfach 1317
D-85741 Garching
Phone +49 89 30000-2217
FAX +49 89 30000-3240
email arnolds[at]MPA-Garching.MPG.DE
________________________________________________________________________
Gilles Gouaillardet
2017-04-12 13:40:20 UTC
Permalink
That should be a two-step tango:
- Open MPI binds an MPI task to a socket
- the OpenMP runtime binds the OpenMP threads to cores (or hyperthreads) inside the socket assigned by Open MPI

Which compiler are you using?
Do you set some environment variables to direct OpenMP to bind threads?

Also, how do you measure the hyperthread a given OpenMP thread is on?
Is it the hyperthread used at a given time? If yes, then the thread might migrate unless it was pinned by the OpenMP runtime.

If you are not sure, please post the source of your program so we can have a look.

Last but not least, as long as OpenMP threads are pinned to distinct cores, you should not worry about them migrating between hyperthreads of the same core.

Cheers,

Gilles
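One practical detail on that second step (a sketch, not part of Gilles's message): variables that direct the OpenMP runtime only take effect if they actually reach the remote processes; when they are set only on the submit host, mpirun's -x option can forward them explicitly:

  mpirun -np 4 --map-by ppr:2:node \
         -x OMP_NUM_THREADS=10 -x OMP_PLACES=cores -x OMP_PROC_BIND=close \
         --mca plm_rsh_agent "qrsh" -report-bindings ./myid
  # -x exports the named variables (optionally with a value) to every rank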
Post by Heinz-Ado Arnolds
Dear rhc,
to make it more clear what I try to achieve, I collected some examples for
several combinations of command line options. Would be great if you find
time to look to these below. The most promise one is example "4".
I'd like to have 4 MPI jobs starting 1 OpenMP job each with 10 threads,
running on 2 nodes, each having 2 sockets, with 10 cores & 10 hwthreads.
Only 10 cores (no hwthreads) should be used on each socket.
4 MPI -> 1 OpenMP with 10 thread (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh"
-report-bindings ./myid
pascal-2-05...DE 20
pascal-1-03...DE 20
[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid
28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid
28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid
28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid
28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid
28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid
28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid
28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid
28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid
28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid
28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid
28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid
28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid
28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid
28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid
28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid
28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid
28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid
28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid
28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid
28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid
19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid
19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid
19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid
19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid
19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid
19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid
19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid
19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid
19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid
19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid
19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid
19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid
19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid
19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid
19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid
19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid
19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid
19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid
19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid
19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
I get a distribution to 4 sockets on 2 nodes as expected, but cores and
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP thread
#0007 runs on CPU 038,
MP thread #0002 runs on CPU 014, MP thread
#0008 runs on CPU 034
according to "lscpu -a -e" CPUs 18/38 resp. 14/34 are the same physical cores
2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread
--mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-05...DE 20
pascal-2-05...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-05
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[B./../../../../../../../../..][../../../../../../../../../..]
[.B/../../../../../../../../..][../../../../../../../../../..]
[B./../../../../../../../../..][../../../../../../../../../..]
[.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid
33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid
33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid
28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid
28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid
28929), 020, Cpus_allowed_list: 20
Only 2 CPUs are used and these are the same physical cores.
3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent
"qrsh" -report-bindings ./myid
pascal-1-03...DE 20
pascal-2-02...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-03
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[B./../../../../../../../../..][../../../../../../../../../..]
[../../../../../../../../../..][B./../../../../../../../../..]
[.B/../../../../../../../../..][../../../../../../../../../..]
[../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid
19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid
19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid
19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid
19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid
19371), 021, Cpus_allowed_list: 21
The jobs are scheduled to one machine only.
4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca
plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-00...DE 20
pascal-3-00...DE 20
[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid
05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid
05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid
05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid
05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid
05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid
05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid
05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid
05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid
05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid
05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid
05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid
05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid
05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid
05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid
05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid
05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid
05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid
05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid
05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid
05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid
07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid
07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid
07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid
07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid
07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid
07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid
07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid
07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid
07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid
07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid
07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid
07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid
07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid
07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid
07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid
07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid
07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid
07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid
07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid
07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
This distribution looks very well with this combination of options
"--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking at
"MPI Instance 0002", you'll find that "MP thread #0001" is executed on CPU
031, and "MP thread #0005" is executed on CPU 011. 011/031 are the same
physical core.
All others are real perfect! Is this error due to my fault or might
their be a small remaining binding problem in OpenMPI?
I'd appreciate any hint very much!
Kind regards,
Ado
I’m not entirely sure I understand your reference to “real cores”. When
we bind you to a core, we bind you to all the HT’s that comprise that core.
So, yes, with HT enabled, the binding report will list things by HT, but
you’ll always be bound to the full core if you tell us bind-to core
The default binding directive is bind-to socket when more than 2
processes are in the job, and that’s what you are showing. You can override
that by adding "-bind-to core" to your cmd line if that is what you desire.
If you want to use individual HTs as independent processors, then
“--use-hwthread-cpus -bind-to hwthreads” would indeed be the right
combination.
On Apr 10, 2017, at 3:55 AM, Heinz-Ado Arnolds <
Dear OpenMPI users & developers,
I'm trying to distribute my jobs (with SGE) to a machine with a certain
number of nodes, each node having 2 sockets, each socket having 10 cores &
10 hyperthreads. I like to use only the real cores, no hyperthreading.
lscpu -a -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3
0 0 0 0 0:0:0:0
1 1 1 1 1:1:1:1
2 0 0 2 2:2:2:0
3 1 1 3 3:3:3:1
4 0 0 4 4:4:4:0
5 1 1 5 5:5:5:1
6 0 0 6 6:6:6:0
7 1 1 7 7:7:7:1
8 0 0 8 8:8:8:0
9 1 1 9 9:9:9:1
10 0 0 10 10:10:10:0
11 1 1 11 11:11:11:1
12 0 0 12 12:12:12:0
13 1 1 13 13:13:13:1
14 0 0 14 14:14:14:0
15 1 1 15 15:15:15:1
16 0 0 16 16:16:16:0
17 1 1 17 17:17:17:1
18 0 0 18 18:18:18:0
19 1 1 19 19:19:19:1
20 0 0 0 0:0:0:0
21 1 1 1 1:1:1:1
22 0 0 2 2:2:2:0
23 1 1 3 3:3:3:1
24 0 0 4 4:4:4:0
25 1 1 5 5:5:5:1
26 0 0 6 6:6:6:0
27 1 1 7 7:7:7:1
28 0 0 8 8:8:8:0
29 1 1 9 9:9:9:1
30 0 0 10 10:10:10:0
31 1 1 11 11:11:11:1
32 0 0 12 12:12:12:0
33 1 1 13 13:13:13:1
34 0 0 14 14:14:14:0
35 1 1 15 15:15:15:1
36 0 0 16 16:16:16:0
37 1 1 17 17:17:17:1
38 0 0 18 18:18:18:0
39 1 1 19 19:19:19:1
How do I have to choose the options & parameters of mpirun to achieve
this behavior?
mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh"
-report-bindings ./myid
distributes to
[pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]],
socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt
0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core
6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket
0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/
BB][../../../../../../../../../..]
[pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]],
socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core
13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]],
socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core
18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..
][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-
Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-
Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-
Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,
22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-
Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,
23,25,27,29,31,33,35,37,39
i.e.: 2 nodes: ok, 2 sockets: ok, different set of cores: ok, but uses
all hwthreads
I have tried several combinations of --use-hwthread-cpus, --bind-to
hwthreads, but didn't find the right combination.
It would be great to get some hints!
Thanks a lot in advance,
Heinz-Ado Arnolds
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Dear Sir,
Best regards,
Ado
Kind regards,
H.-A. Arnolds
--
________________________________________________________________________
Dipl.-Ing. Heinz-Ado Arnolds
Max-Planck-Institut für Astrophysik
Karl-Schwarzschild-Strasse 1
D-85748 Garching
Postfach 1317
D-85741 Garching
Phone +49 89 30000-2217
FAX +49 89 30000-3240
email arnolds[at]MPA-Garching.MPG.DE
________________________________________________________________________
Heinz-Ado Arnolds
2017-04-12 15:23:52 UTC
Permalink
Dear Gilles,

thanks for your answer.

- compiler: gcc-6.3.0
- OpenMP environment vars: OMP_PROC_BIND=true, GOMP_CPU_AFFINITY not set
- hyperthread a given OpenMP thread is on: it's printed in the output below as a 3-digit number after the first ",", read by sched_getcpu() in the OpenMP test code
- the migration between cores/hyperthreads should be prevented by OMP_PROC_BIND=true
- I didn't find any migration, but rather two OpenMP threads sharing one physical core in example "4"/"MPI Instance 0002": 011/031 are both hyperthreads of core #11.

Are there any hints on how to cleanly transfer the OpenMPI binding to the OpenMP tasks?

Thanks and kind regards,

Ado
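As a cross-check of the 011/031 observation above, the hyperthread siblings of a core can also be read directly from the Linux kernel topology (a quick sketch assuming a standard sysfs layout; CPU 11 is used only as an example):
cat /sys/devices/system/cpu/cpu11/topology/thread_siblings_list
# on a node with the lscpu layout shown in this thread, this should print "11,31",
# i.e. logical CPUs 11 and 31 are the two hyperthreads of the same physical core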
Post by Gilles Gouaillardet
That should be a two-step tango:
- Open MPI binds an MPI task to a socket
- the OpenMP runtime binds the OpenMP threads to cores (or hyperthreads) inside the socket assigned by Open MPI
Which compiler are you using?
Do you set some environment variables to direct OpenMP to bind threads?
Also, how do you measure which hyperthread a given OpenMP thread is on?
Is it the hyperthread in use at a given time? If yes, then the thread might migrate unless it was pinned by the OpenMP runtime.
If you are not sure, please post the source of your program so we can have a look.
Last but not least, as long as OpenMP threads are pinned to distinct cores, you should not worry about them migrating between hyperthreads from the same core.
Cheers,
Gilles
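A sketch of that two-step division of labour, combining the mapping already used in this thread with the standard OpenMP 4.0 placement variables (OMP_NUM_THREADS, OMP_PROC_BIND, OMP_PLACES; gcc-6.3.0's libgomp supports them, but exact behaviour may vary per installation):
# step 1: Open MPI maps two ranks per node and, by default, binds each rank to one socket
# step 2: the OpenMP runtime pins its 10 threads to the cores inside that socket
export OMP_NUM_THREADS=10 OMP_PROC_BIND=close OMP_PLACES=cores
mpirun -np 4 --map-by ppr:2:node -x OMP_NUM_THREADS -x OMP_PROC_BIND -x OMP_PLACES --mca plm_rsh_agent "qrsh" -report-bindings ./myid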
Dear rhc,
to make it clearer what I am trying to achieve, I have collected some examples for several combinations of command-line options. It would be great if you could find time to look at these below. The most promising one is example "4".
I'd like to have 4 MPI jobs, each starting 1 OpenMP job with 10 threads, running on 2 nodes, each having 2 sockets with 10 cores & 10 hwthreads. Only the 10 cores (no hwthreads) should be used on each socket.
4 MPI -> 1 OpenMP with 10 thread (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
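For completeness: one mapping that is often suggested for exactly this hybrid layout is the "PE" (processing elements per rank) modifier of --map-by, which reserves and binds a block of cores to each rank. It is not among the variants tried below and is untested here, so treat it as an assumption about the installed mpirun:
export OMP_NUM_THREADS=10
mpirun -np 4 --map-by socket:PE=10 -x OMP_NUM_THREADS --mca plm_rsh_agent "qrsh" -report-bindings ./myid
# intended effect: one rank per socket (2 per node over 2 nodes), each rank bound to the
# 10 physical cores of its socket (i.e. to both hyperthreads of each of those cores)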
1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-2-05...DE 20
pascal-1-03...DE 20
[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid 28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid 28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid 28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid 28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid 28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid 28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid 28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid 28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid 28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid 28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid 28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid 28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid 28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid 28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid 28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid 28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid 28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid 28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid 28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid 28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP thread #0007 runs on CPU 038,
MP thread #0002 runs on CPU 014, MP thread #0008 runs on CPU 034.
According to "lscpu -a -e", CPUs 18/38 and 14/34 are pairs of hyperthreads on the same physical cores.
2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-05...DE 20
pascal-2-05...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-05
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid 28929), 020, Cpus_allowed_list: 20
Only 2 CPUs per node are used, and they are the two hyperthreads of the same physical core.
3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-03...DE 20
pascal-2-02...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-03
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]: [../../../../../../../../../..][B./../../../../../../../../..]
[pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]: [../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19371), 021, Cpus_allowed_list: 21
The jobs are scheduled to one machine only.
4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-00...DE 20
pascal-3-00...DE 20
[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid 05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid 05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid 05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid 05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid 05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid 05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid 05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid 05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid 05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid 05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid 05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid 05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid 05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid 05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid 05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid 05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid 05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid 05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid 05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid 05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid 07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid 07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid 07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid 07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid 07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid 07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid 07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid 07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid 07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid 07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid 07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid 07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid 07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid 07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid 07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid 07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid 07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid 07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid 07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
This distribution looks very good with this combination of options "--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking at "MPI Instance 0002", you'll find that "MP thread #0001" is executed on CPU 031, and "MP thread #0005" is executed on CPU 011. 011/031 are the same physical core.
All the others are perfect! Is this error a mistake on my part, or might there be a small remaining binding problem in OpenMPI?
I'd appreciate any hints very much!
Kind regards,
Ado
I'm not entirely sure I understand your reference to "real cores". When we bind you to a core, we bind you to all the HTs that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you'll always be bound to the full core if you tell us bind-to core.
The default binding directive is bind-to socket when more than 2 processes are in the job, and that's what you are showing. You can override that by adding "-bind-to core" to your cmd line if that is what you desire.
If you want to use individual HTs as independent processors, then "--use-hwthread-cpus -bind-to hwthread" would indeed be the right combination.
Post by Heinz-Ado Arnolds
Dear OpenMPI users & developers,
I'm trying to distribute my jobs (with SGE) to a machine with a certain number of nodes, each node having 2 sockets, each socket having 10 cores & 10 hyperthreads. I like to use only the real cores, no hyperthreading.
lscpu -a -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3
0 0 0 0 0:0:0:0
1 1 1 1 1:1:1:1
2 0 0 2 2:2:2:0
3 1 1 3 3:3:3:1
4 0 0 4 4:4:4:0
5 1 1 5 5:5:5:1
6 0 0 6 6:6:6:0
7 1 1 7 7:7:7:1
8 0 0 8 8:8:8:0
9 1 1 9 9:9:9:1
10 0 0 10 10:10:10:0
11 1 1 11 11:11:11:1
12 0 0 12 12:12:12:0
13 1 1 13 13:13:13:1
14 0 0 14 14:14:14:0
15 1 1 15 15:15:15:1
16 0 0 16 16:16:16:0
17 1 1 17 17:17:17:1
18 0 0 18 18:18:18:0
19 1 1 19 19:19:19:1
20 0 0 0 0:0:0:0
21 1 1 1 1:1:1:1
22 0 0 2 2:2:2:0
23 1 1 3 3:3:3:1
24 0 0 4 4:4:4:0
25 1 1 5 5:5:5:1
26 0 0 6 6:6:6:0
27 1 1 7 7:7:7:1
28 0 0 8 8:8:8:0
29 1 1 9 9:9:9:1
30 0 0 10 10:10:10:0
31 1 1 11 11:11:11:1
32 0 0 12 12:12:12:0
33 1 1 13 13:13:13:1
34 0 0 14 14:14:14:0
35 1 1 15 15:15:15:1
36 0 0 16 16:16:16:0
37 1 1 17 17:17:17:1
38 0 0 18 18:18:18:0
39 1 1 19 19:19:19:1
How do I have to choose the options & parameters of mpirun to achieve this behavior?
mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
distributes to
[pascal-1-04:35735] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-04:35735] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:00787] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:00787] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-04,pascal-1-04.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03,pascal-1-03.MPA-Garching.MPG.DE, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
i.e.: 2 nodes: ok, 2 sockets: ok, different set of cores: ok, but uses all hwthreads
I have tried several combinations of --use-hwthread-cpus, --bind-to hwthreads, but didn't find the right combination.
It would be great to get some hints!
Thanks a lot in advance,
Heinz-Ado Arnolds
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Gilles Gouaillardet
2017-04-13 06:48:59 UTC
Permalink
Heinz-Ado,


it seems the OpenMP runtime did *not* bind the OMP threads at all as requested,

and the root cause could be that the OMP_PROC_BIND environment variable was not propagated.

Can you try

mpirun -x OMP_PROC_BIND ...

and see if it helps?


Cheers,
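Applied to the command line of example "4", Gilles' suggestion would look roughly like this (a sketch, assuming OMP_PROC_BIND=true is set in the launching shell so that -x has a value to forward):
export OMP_PROC_BIND=true
mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -x OMP_PROC_BIND --mca plm_rsh_agent "qrsh" -report-bindings ./myid
# -x forwards the named environment variable from the launch environment to every rank,
# so the OpenMP runtime on the remote nodes also sees OMP_PROC_BIND=true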
Post by Heinz-Ado Arnolds
Dear Gilles,
thanks for your answer.
- compiler: gcc-6.3.0
- OpenMP environment vars: OMP_PROC_BIND=true, GOMP_CPU_AFFINITY not set
- hyperthread a given OpenMP thread is on: it's printed in the output below as a 3-digit number after the first ",", read by sched_getcpu() in the OpenMP test code
- the migration between cores/hyperthreads should be prevented by OMP_PROC_BIND=true
- I didn't find any migration, but rather two OpenMP threads sharing one physical core in example "4"/"MPI Instance 0002": 011/031 are both hyperthreads of core #11.
Are there any hints on how to cleanly transfer the OpenMPI binding to the OpenMP tasks?
Thanks and kind regards,
Ado
Post by Gilles Gouaillardet
That should be a two-step tango:
- Open MPI binds an MPI task to a socket
- the OpenMP runtime binds the OpenMP threads to cores (or hyperthreads) inside the socket assigned by Open MPI
Which compiler are you using?
Do you set some environment variables to direct OpenMP to bind threads?
Also, how do you measure which hyperthread a given OpenMP thread is on?
Is it the hyperthread in use at a given time? If yes, then the thread might migrate unless it was pinned by the OpenMP runtime.
If you are not sure, please post the source of your program so we can have a look.
Last but not least, as long as OpenMP threads are pinned to distinct cores, you should not worry about them migrating between hyperthreads from the same core.
Cheers,
Gilles
Dear rhc,
to make it clearer what I am trying to achieve, I have collected some examples for several combinations of command-line options. It would be great if you could find time to look at these below. The most promising one is example "4".
I'd like to have 4 MPI jobs, each starting 1 OpenMP job with 10 threads, running on 2 nodes, each having 2 sockets with 10 cores & 10 hwthreads. Only the 10 cores (no hwthreads) should be used on each socket.
4 MPI -> 1 OpenMP with 10 thread (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-2-05...DE 20
pascal-1-03...DE 20
[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid 28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid 28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid 28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid 28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid 28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid 28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid 28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid 28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid 28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid 28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid 28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid 28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid 28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid 28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid 28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid 28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid 28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid 28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid 28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid 28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018, MP thread #0007 runs on CPU 038,
MP thread #0002 runs on CPU 014, MP thread #0008 runs on CPU 034.
According to "lscpu -a -e", CPUs 18/38 and 14/34 are pairs of hyperthreads on the same physical cores.
2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-05...DE 20
pascal-2-05...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-05
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid 28929), 020, Cpus_allowed_list: 20
Only 2 CPUs per node are used, and they are the two hyperthreads of the same physical core.
3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-03...DE 20
pascal-2-02...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-03
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]: [../../../../../../../../../..][B./../../../../../../../../..]
[pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]: [../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19371), 021, Cpus_allowed_list: 21
The jobs are scheduled to one machine only.
4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-00...DE 20
pascal-3-00...DE 20
[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid 05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid 05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid 05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid 05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid 05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid 05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid 05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid 05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid 05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid 05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid 05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid 05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid 05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid 05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid 05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid 05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid 05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid 05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid 05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid 05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid 07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid 07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid 07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid 07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid 07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid 07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid 07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid 07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid 07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid 07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid 07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid 07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid 07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid 07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid 07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid 07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid 07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid 07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid 07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
This distribution looks very good with this combination of options "--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking at "MPI Instance 0002", you'll find that "MP thread #0001" is executed on CPU 031 and "MP thread #0005" on CPU 011, and 011/031 are the two hyperthreads of the same physical core.
All the others are perfect! Is this error due to a mistake of mine, or might there be a small remaining binding problem in OpenMPI?
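A quick way to double-check that 011 and 031 really are hyperthread siblings, besides "lscpu -a -e", is the kernel's sysfs topology files (standard Linux paths, shown here only for illustration):

  cat /sys/devices/system/cpu/cpu11/topology/thread_siblings_list   # should print "11,31" on this topology
  cat /sys/devices/system/cpu/cpu31/topology/core_id                # should match the core_id reported for cpu11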
I'd appreciate any hint very much!
Kind regards,
Ado
I’m not entirely sure I understand your reference to “real cores”. When we bind you to a core, we bind you to all the HTs that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you’ll always be bound to the full core if you tell us “bind-to core”.
The default binding directive is bind-to socket when more than 2 processes are in the job, and that’s what you are showing. You can override that by adding "-bind-to core" to your cmd line if that is what you desire.
If you want to use individual HTs as independent processors, then “--use-hwthread-cpus -bind-to hwthreads” would indeed be the right combination.
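Spelled out as command lines, the two variants described above might look roughly like this (only a sketch; option names as in the Open MPI 2.x mpirun man page, with the qrsh agent kept from the earlier examples):

  # bind each rank to one full core (both hwthreads of that core) instead of the default socket binding
  mpirun -np 4 --map-by ppr:2:node --bind-to core --mca plm_rsh_agent "qrsh" -report-bindings ./myid
  # or: treat every hwthread as an independent cpu and bind each rank to exactly one hwthread
  mpirun -np 4 --use-hwthread-cpus --bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid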
Heinz-Ado Arnolds
2017-04-13 12:42:45 UTC
Permalink
Dear Gilles,

thanks a lot for your response!

1. You're right, my stupid error, I forgot the "export" of OMP_PROC_BIND in my job script. Now this example is working nearly as expected:

[pascal-1-07:25617] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-07:25617] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-0-06:02774] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-06:02774] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-07, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0001(pid 25634), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0002(pid 25634), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0003(pid 25634), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0004(pid 25634), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0005(pid 25634), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0006(pid 25634), cpu# 010, 0x00000400, Cpus_allowed_list: 10
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0007(pid 25634), cpu# 012, 0x00001000, Cpus_allowed_list: 12
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0008(pid 25634), cpu# 014, 0x00004000, Cpus_allowed_list: 14
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0009(pid 25634), cpu# 016, 0x00010000, Cpus_allowed_list: 16
MPI Instance 0001 of 0004 is on pascal-1-07: MP thread #0010(pid 25634), cpu# 018, 0x00040000, Cpus_allowed_list: 18
MPI Instance 0002 of 0004 is on pascal-1-07, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0001(pid 25633), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0002(pid 25633), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0003(pid 25633), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0004(pid 25633), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0005(pid 25633), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0006(pid 25633), cpu# 011, 0x00000800, Cpus_allowed_list: 11
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0007(pid 25633), cpu# 013, 0x00002000, Cpus_allowed_list: 13
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0008(pid 25633), cpu# 015, 0x00008000, Cpus_allowed_list: 15
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0009(pid 25633), cpu# 017, 0x00020000, Cpus_allowed_list: 17
MPI Instance 0002 of 0004 is on pascal-1-07: MP thread #0010(pid 25633), cpu# 019, 0x00080000, Cpus_allowed_list: 19
MPI Instance 0003 of 0004 is on pascal-0-06, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0001(pid 02787), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0002(pid 02787), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0003(pid 02787), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0004(pid 02787), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0005(pid 02787), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0006(pid 02787), cpu# 010, 0x00000400, Cpus_allowed_list: 10
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0007(pid 02787), cpu# 012, 0x00001000, Cpus_allowed_list: 12
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0008(pid 02787), cpu# 014, 0x00004000, Cpus_allowed_list: 14
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0009(pid 02787), cpu# 016, 0x00010000, Cpus_allowed_list: 16
MPI Instance 0003 of 0004 is on pascal-0-06: MP thread #0010(pid 02787), cpu# 018, 0x00040000, Cpus_allowed_list: 18
MPI Instance 0004 of 0004 is on pascal-0-06, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0001(pid 02786), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0002(pid 02786), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0003(pid 02786), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0004(pid 02786), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0005(pid 02786), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0006(pid 02786), cpu# 011, 0x00000800, Cpus_allowed_list: 11
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0007(pid 02786), cpu# 013, 0x00002000, Cpus_allowed_list: 13
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0008(pid 02786), cpu# 015, 0x00008000, Cpus_allowed_list: 15
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0009(pid 02786), cpu# 017, 0x00020000, Cpus_allowed_list: 17
MPI Instance 0004 of 0004 is on pascal-0-06: MP thread #0010(pid 02786), cpu# 019, 0x00080000, Cpus_allowed_list: 19
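For reference, the fix from point 1 can be expressed either way (a sketch, assuming a bash job script and the same options as above):

  export OMP_PROC_BIND=true   # in the job script before calling mpirun, so every rank inherits it
  # or let mpirun forward the variable itself, as Gilles suggested:
  mpirun -np 4 --map-by ppr:2:node -x OMP_PROC_BIND=true --mca plm_rsh_agent "qrsh" -report-bindings ./myid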

Only remaining question: why does the "Cpus_allowed_list" of the OpenMPI process still list the full range of cores/hwthreads of its socket, while the OpenMP threads only use numbers 0-19 (as expected)?
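Linux keeps the affinity mask per thread, so the rank's main thread and its OpenMP worker threads can legitimately report different values. One way to inspect both (a sketch only; <pid> and <tid> are placeholders for a rank's process id and one of its OpenMP thread ids):

  grep Cpus_allowed_list /proc/<pid>/status              # mask of the rank's main thread, as left by Open MPI
  grep Cpus_allowed_list /proc/<pid>/task/<tid>/status   # mask of an individual OpenMP thread, as narrowed by the OpenMP runtime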

2. I have a different scenario which still doesn't work as expected:

Now I'd like to have 8 OpenMPI jobs on 2 nodes -> 4 OpenMPI jobs per node -> 2 per socket, each executing one OpenMP job with 5 threads

mpirun -np 8 --map-by ppr:2:socket --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid

I'd like to have a binding like this (core IDs of the two ranks on each socket):
node 0, socket 0: 0+2+4+6+8 and 10+12+14+16+18
node 0, socket 1: 1+3+5+7+9 and 11+13+15+17+19
node 1, socket 0: 0+2+4+6+8 and 10+12+14+16+18
node 1, socket 1: 1+3+5+7+9 and 11+13+15+17+19

but, as you can see below, each job is again bound to all cores of its socket, which leads to a situation like this (core IDs of the two ranks on each socket):
node 0, socket 0: 0+2+4+6+8 and 0+2+4+6+8
node 0, socket 1: 1+3+5+7+9 and 1+3+5+7+9
node 1, socket 0: 0+2+4+6+8 and 0+2+4+6+8
node 1, socket 1: 1+3+5+7+9 and 1+3+5+7+9

Could you give me a hint again on how I could improve that?
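For what it's worth, the mpirun man page also describes a "PE=n" modifier for --map-by that assigns n processing elements to every rank; if the installed Open MPI 2.1.0 accepts it for this setup, something along these lines might come closer to the intended 2-ranks-per-socket / 5-cores-per-rank layout (untested, only a sketch):

  # each rank gets 5 cores; ranks are mapped socket by socket, so 2 ranks end up on each socket
  mpirun -np 8 --map-by socket:PE=5 --bind-to core --mca plm_rsh_agent "qrsh" -report-bindings ./myid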

[pascal-0-01:01972] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-01:01972] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-0-01:01972] MCW rank 2 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-0-01:01972] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-2-01:18506] MCW rank 4 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-01:18506] MCW rank 5 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-01:18506] MCW rank 6 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-2-01:18506] MCW rank 7 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0008 is on pascal-0-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0001(pid 01999), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0002(pid 01999), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0003(pid 01999), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0004(pid 01999), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0001 of 0008 is on pascal-0-01: MP thread #0005(pid 01999), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0002 of 0008 is on pascal-0-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0001(pid 01996), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0002(pid 01996), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0003(pid 01996), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0004(pid 01996), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0002 of 0008 is on pascal-0-01: MP thread #0005(pid 01996), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0003 of 0008 is on pascal-0-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0001(pid 01998), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0002(pid 01998), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0003(pid 01998), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0004(pid 01998), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0003 of 0008 is on pascal-0-01: MP thread #0005(pid 01998), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0004 of 0008 is on pascal-0-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0001(pid 01997), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0002(pid 01997), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0003(pid 01997), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0004(pid 01997), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0004 of 0008 is on pascal-0-01: MP thread #0005(pid 01997), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0005 of 0008 is on pascal-2-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0001(pid 18531), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0002(pid 18531), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0003(pid 18531), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0004(pid 18531), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0005 of 0008 is on pascal-2-01: MP thread #0005(pid 18531), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0006 of 0008 is on pascal-2-01, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0001(pid 18530), cpu# 000, 0x00000001, Cpus_allowed_list: 0
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0002(pid 18530), cpu# 002, 0x00000004, Cpus_allowed_list: 2
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0003(pid 18530), cpu# 004, 0x00000010, Cpus_allowed_list: 4
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0004(pid 18530), cpu# 006, 0x00000040, Cpus_allowed_list: 6
MPI Instance 0006 of 0008 is on pascal-2-01: MP thread #0005(pid 18530), cpu# 008, 0x00000100, Cpus_allowed_list: 8
MPI Instance 0007 of 0008 is on pascal-2-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0001(pid 18528), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0002(pid 18528), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0003(pid 18528), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0004(pid 18528), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0007 of 0008 is on pascal-2-01: MP thread #0005(pid 18528), cpu# 009, 0x00000200, Cpus_allowed_list: 9
MPI Instance 0008 of 0008 is on pascal-2-01, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0001(pid 18527), cpu# 001, 0x00000002, Cpus_allowed_list: 1
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0002(pid 18527), cpu# 003, 0x00000008, Cpus_allowed_list: 3
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0003(pid 18527), cpu# 005, 0x00000020, Cpus_allowed_list: 5
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0004(pid 18527), cpu# 007, 0x00000080, Cpus_allowed_list: 7
MPI Instance 0008 of 0008 is on pascal-2-01: MP thread #0005(pid 18527), cpu# 009, 0x00000200, Cpus_allowed_list: 9

Thanks a lot in advance for your advice, and have a nice Easter!

Ado
Post by Gilles Gouaillardet
Heinz-Ado,
it seems the OpenMP runtime did *not* bind the OMP threads at all as requested,
and the root cause could be that the OMP_PROC_BIND environment variable was not propagated
can you try
mpirun -x OMP_PROC_BIND ...
and see if it helps ?
Cheers,
Post by Heinz-Ado Arnolds
Dear Gilles,
thanks for your answer.
- compiler: gcc-6.3.0
- OpenMP environment vars: OMP_PROC_BIND=true, GOMP_CPU_AFFINITY not set
- hyperthread a given OpenMP thread is on: it's printed in the output below as a 3-digit number after the first ",", read by sched_getcpu() in the OpenMP test code
- the migration between cores/hyperthreads should be prevented by OMP_PROC_BIND=true
- I didn't find a migration, but rather two OpenMP threads sharing one physical core in example "4"/"MPI Instance 0002": 011/031 are both hyperthreads of core #11.
Are there any hints on how to cleanly transfer the OpenMPI binding to the OpenMP threads?
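One knob on the OpenMP side that might help here (assuming the gcc-6.3.0 libgomp honours it, as an OpenMP 4.0 runtime should) is OMP_PLACES: with one place per physical core, two threads of the same rank can no longer end up on sibling hyperthreads of one core:

  export OMP_PROC_BIND=close   # or "true"/"spread"; keeps each thread on its place
  export OMP_PLACES=cores      # one place per physical core instead of per hwthread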
Thanks and kind regards,
Ado
Post by Gilles Gouaillardet
That should be a two-step tango:
- Open MPI binds an MPI task to a socket
- the OpenMP runtime binds OpenMP threads to cores (or hyperthreads) inside the socket assigned by Open MPI
Which compiler are you using?
Do you set some environment variables to direct OpenMP to bind threads?
Also, how do you measure the hyperthread a given OpenMP thread is on?
Is it the hyperthread used at a given time? If yes, then the thread might migrate unless it was pinned by the OpenMP runtime.
If you are not sure, please post the source of your program so we can have a look
Last but not least, as long as OpenMP threads are pinned to distinct cores, you should not worry about them migrating between hyperthreads from the same core.
Cheers,
Gilles
Dear rhc,
to make it clearer what I am trying to achieve, I have collected some examples for several combinations of command-line options. It would be great if you could find the time to look at them below. The most promising one is example "4".
I'd like to have 4 MPI jobs, each starting 1 OpenMP job with 10 threads, running on 2 nodes, each node having 2 sockets with 10 cores & 10 hwthreads. Only the 10 real cores (no hwthreads) should be used on each socket.
4 MPI -> 1 OpenMP with 10 threads each (i.e. 4x10 threads)
2 nodes, 2 sockets each, 10 cores & 10 hwthreads each
1. mpirun -np 4 --map-by ppr:2:node --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-2-05...DE 20
pascal-1-03...DE 20
[pascal-2-05:28817] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-2-05:28817] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-1-03:19256] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-03:19256] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-2-05, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0001(pid 28833), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0002(pid 28833), 014, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0003(pid 28833), 028, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0004(pid 28833), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0005(pid 28833), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0006(pid 28833), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0007(pid 28833), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0008(pid 28833), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0009(pid 28833), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-2-05: MP thread #0010(pid 28833), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-2-05, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0001(pid 28834), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0002(pid 28834), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0003(pid 28834), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0004(pid 28834), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0005(pid 28834), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0006(pid 28834), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0007(pid 28834), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0008(pid 28834), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0009(pid 28834), 019, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-2-05: MP thread #0010(pid 28834), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19269), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19269), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19269), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19269), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19269), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19269), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19269), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19269), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19269), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19269), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19268), 005, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19268), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19268), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19268), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19268), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19268), 013, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19268), 037, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19268), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19268), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19268), 023, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0001 of 0004: MP thread #0001 runs on CPU 018 and MP thread #0007 on CPU 038; MP thread #0002 runs on CPU 014 and MP thread #0008 on CPU 034.
According to "lscpu -a -e", CPUs 18/38 and 14/34 are in each case the same physical core.
2. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-05...DE 20
pascal-2-05...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-05
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-05:33175] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-05:33175] MCW rank 1 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 2 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-2-05:28916] MCW rank 3 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-05, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0001(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0002(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0003(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0004(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0005(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0006(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0007(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0008(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0009(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-05: MP thread #0010(pid 33193), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-05, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0001(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0002(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0003(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0004(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0005(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0006(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0007(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0008(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0009(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0002 of 0004 is on pascal-1-05: MP thread #0010(pid 33192), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-2-05, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0001(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0002(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0003(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0004(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0005(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0006(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0007(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0008(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0009(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0003 of 0004 is on pascal-2-05: MP thread #0010(pid 28930), 000, Cpus_allowed_list: 0
MPI Instance 0004 of 0004 is on pascal-2-05, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0001(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0002(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0003(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0004(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0005(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0006(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0007(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0008(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0009(pid 28929), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-2-05: MP thread #0010(pid 28929), 020, Cpus_allowed_list: 20
Only 2 logical CPUs are used per node, and on each node they are the two hyperthreads of one and the same physical core.
3. mpirun -np 4 --use-hwthread-cpus -bind-to hwthread --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-03...DE 20
pascal-2-02...DE 20
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.
Node: pascal-1-03
Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.
On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).
On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.
If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.
This is a warning only; your job will continue, though performance may
be degraded.
[pascal-1-03:19345] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B./../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 1 bound to socket 1[core 10[hwt 0]]: [../../../../../../../../../..][B./../../../../../../../../..]
[pascal-1-03:19345] MCW rank 2 bound to socket 0[core 0[hwt 1]]: [.B/../../../../../../../../..][../../../../../../../../../..]
[pascal-1-03:19345] MCW rank 3 bound to socket 1[core 10[hwt 1]]: [../../../../../../../../../..][.B/../../../../../../../../..]
MPI Instance 0001 of 0004 is on pascal-1-03, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0001(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0002(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0003(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0004(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0005(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0006(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0007(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0008(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0009(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0001 of 0004 is on pascal-1-03: MP thread #0010(pid 19373), 000, Cpus_allowed_list: 0
MPI Instance 0002 of 0004 is on pascal-1-03, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0001(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0002(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0003(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0004(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0005(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0006(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0007(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0008(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0009(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0002 of 0004 is on pascal-1-03: MP thread #0010(pid 19372), 001, Cpus_allowed_list: 1
MPI Instance 0003 of 0004 is on pascal-1-03, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0001(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0002(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0003(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0004(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0005(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0006(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0007(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0008(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0009(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0003 of 0004 is on pascal-1-03: MP thread #0010(pid 19370), 020, Cpus_allowed_list: 20
MPI Instance 0004 of 0004 is on pascal-1-03, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0001(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0002(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0003(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0004(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0005(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0006(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0007(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0008(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0009(pid 19371), 021, Cpus_allowed_list: 21
MPI Instance 0004 of 0004 is on pascal-1-03: MP thread #0010(pid 19371), 021, Cpus_allowed_list: 21
The jobs are scheduled to one machine only.
4. mpirun -np 4 --map-by ppr:2:node --use-hwthread-cpus --mca plm_rsh_agent "qrsh" -report-bindings ./myid
pascal-1-00...DE 20
pascal-3-00...DE 20
[pascal-1-00:05867] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-1-00:05867] MCW rank 1 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
[pascal-3-00:07501] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 0[core 6[hwt 0-1]], socket 0[core 7[hwt 0-1]], socket 0[core 8[hwt 0-1]], socket 0[core 9[hwt 0-1]]: [BB/BB/BB/BB/BB/BB/BB/BB/BB/BB][../../../../../../../../../..]
[pascal-3-00:07501] MCW rank 3 bound to socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]], socket 1[core 12[hwt 0-1]], socket 1[core 13[hwt 0-1]], socket 1[core 14[hwt 0-1]], socket 1[core 15[hwt 0-1]], socket 1[core 16[hwt 0-1]], socket 1[core 17[hwt 0-1]], socket 1[core 18[hwt 0-1]], socket 1[core 19[hwt 0-1]]: [../../../../../../../../../..][BB/BB/BB/BB/BB/BB/BB/BB/BB/BB]
MPI Instance 0001 of 0004 is on pascal-1-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0001(pid 05884), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0002(pid 05884), 038, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0003(pid 05884), 002, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0004(pid 05884), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0005(pid 05884), 036, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0006(pid 05884), 000, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0007(pid 05884), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0008(pid 05884), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0009(pid 05884), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0001 of 0004 is on pascal-1-00: MP thread #0010(pid 05884), 032, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0002 of 0004 is on pascal-1-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0001(pid 05883), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0002(pid 05883), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0003(pid 05883), 027, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0004(pid 05883), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0005(pid 05883), 011, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0006(pid 05883), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0007(pid 05883), 015, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0008(pid 05883), 021, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0009(pid 05883), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0002 of 0004 is on pascal-1-00: MP thread #0010(pid 05883), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0003 of 0004 is on pascal-3-00, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0001(pid 07513), 016, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0002(pid 07513), 020, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0003(pid 07513), 022, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0004(pid 07513), 018, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0005(pid 07513), 012, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0006(pid 07513), 004, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0007(pid 07513), 008, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0008(pid 07513), 006, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0009(pid 07513), 030, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0003 of 0004 is on pascal-3-00: MP thread #0010(pid 07513), 034, Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
MPI Instance 0004 of 0004 is on pascal-3-00, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0001(pid 07514), 017, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0002(pid 07514), 025, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0003(pid 07514), 029, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0004(pid 07514), 003, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0005(pid 07514), 033, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0006(pid 07514), 001, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0007(pid 07514), 007, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0008(pid 07514), 039, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0009(pid 07514), 035, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
MPI Instance 0004 of 0004 is on pascal-3-00: MP thread #0010(pid 07514), 031, Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
This distribution looks very good with this combination of options "--map-by ppr:2:node --use-hwthread-cpus", with one exception: looking at "MPI Instance 0002", you'll find that "MP thread #0001" is executed on CPU 031 and "MP thread #0005" on CPU 011; 011 and 031 are the two hyperthreads of the same physical core.
All the others are perfect! Is this error due to a mistake on my side, or might there be a small remaining binding problem in OpenMPI?
I'd appreciate any hint very much!
Kind regards,
Ado
I’m not entirely sure I understand your reference to “real cores”. When we bind you to a core, we bind you to all the HTs that comprise that core. So, yes, with HT enabled, the binding report will list things by HT, but you’ll always be bound to the full core if you tell us bind-to core.
The default binding directive is bind-to socket when more than 2 processes are in the job, and that’s what you are showing. You can override that by adding "-bind-to core" to your cmd line if that is what you desire.
If you want to use individual HTs as independent processors, then “--use-hwthread-cpus -bind-to hwthreads” would indeed be the right combination.
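For example, a minimal sketch based on the command from the first mail, with only the explicit binding directive added (everything else unchanged):

mpirun -np 4 --map-by ppr:2:node --bind-to core -report-bindings --mca plm_rsh_agent "qrsh" ./myid

With this, each of the 4 ranks should end up bound to a single core (i.e. to both hardware threads of that core) rather than to a whole socket.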
Heinz-Ado Arnolds
2017-04-13 13:31:39 UTC
Permalink
On 13.04.2017 15:20, ***@rist.or.jp wrote:
...
in your second case, there are 2 things:
- MPI binds to the socket, which is why two MPI tasks are assigned the same hyperthreads
- the GNU OpenMP runtime seems unable to figure out that 2 processes use the same cores, and hence ends up binding the OpenMP threads to the same cores.
my best bet is you should bind an MPI task to 5 cores instead of one socket.
i do not know the syntax off hand, and i am sure Ralph will help you with that
Thanks, it would be great if someone has that syntax.

Cheers,

Ado
r***@open-mpi.org
2017-04-13 13:49:17 UTC
Permalink
You can always specify a particular number of cpus to use for each process by adding it to the map-by directive:

mpirun -np 8 --map-by ppr:2:socket:pe=5 --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid

would map 2 processes to each socket, binding each process to 5 HTs on that socket (since you told us to treat HTs as independent cpus). If you want us to bind you to 5 cores, then you need to remove the --use-hwthread-cpus directive.
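Spelled out, a sketch of that core-binding variant (the same command with --use-hwthread-cpus simply dropped):

mpirun -np 8 --map-by ppr:2:socket:pe=5 -report-bindings --mca plm_rsh_agent "qrsh" ./myid

which should bind each rank to 5 full cores (10 hardware threads) of its socket.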

As I said earlier in this thread, we are actively working with the OpenMP folks on a mechanism by which the two sides can coordinate these actions so it will be easier to get the desired behavior. For now, though, hopefully this will suffice.
Heinz-Ado Arnolds
2017-04-13 15:26:42 UTC
Permalink
Dear Ralph,

thanks a lot for this valuable advice. Binding now works like expected!

Since adding the ":pe=" option I'm getting warnings

WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

Node: pascal-1-05
...

even if I choose the parameters so that the binding is exactly the same as before without ":pe=". I don't have libnuma installed on the cluster. Might that really be the cause of the warning?

Thanks a lot, and have a nice Easter

Ado
r***@open-mpi.org
2017-04-22 14:45:17 UTC
Permalink
Sorry for delayed response. I’m glad that option solved the problem. We’ll have to look at that configure option - shouldn’t be too hard.

As for the mapping you requested - no problem! Here’s the cmd line:

mpirun --map-by ppr:1:core --bind-to hwthread

Ralph
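Applied to this cluster, that could look like the following sketch; the -np value and the extra options are carried over from the earlier commands in this thread and are assumptions, not part of the answer above (40 assumes 2 nodes with 20 cores each, one rank per core):

mpirun -np 40 --map-by ppr:1:core --bind-to hwthread -report-bindings --mca plm_rsh_agent "qrsh" ./myid

Each rank should then be mapped to its own core and bound to a single hardware thread of that core.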
Dear Ralph, dear Gilles,
thanks a lot for your help! The hints to use ":pe=<n>" and to install libnuma have been the keys to solving my problems.
Perhaps it would not be a bad idea to include --enable-libnuma in the configure help, and make it a default, so that one has to specify --disable-libnuma if one really wants to work without numactl. The option is already checked in configure (framework in opal/mca/hwloc/hwloc1112/hwloc/config/hwloc.m4).
One question remains: I now get a binding like
[pascal-3-06:03036] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]]: [BB/BB/BB/BB/BB/../../../../..][../../../../../../../../../..]
and OpenMP uses just "hwt 0" of each core, which is very welcome. But is there a way to get a binding like
[pascal-3-06:03036] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]]: [B./B./B./B./B./../../../../..][../../../../../../../../../..]
from OpenMPI directly?
Cheers and thanks again,
Ado
Yeah, we need libnuma to set the memory binding. There is a param to turn off the warning if installing libnuma is problematic, but it helps your performance if the memory is kept local to the proc
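The parameter is not named here; assuming it lives in the hwloc base framework, one way to look it up on a given installation is something like:

ompi_info --param all all --level 9 | grep -i "hwloc_base.*bind"

and then to set the chosen parameter either with --mca <name> <value> on the mpirun line or in $HOME/.openmpi/mca-params.conf.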
Heinz-Ado Arnolds
2017-04-24 10:13:41 UTC
Permalink
Dear Ralph,

thanks for this new hint. Unfortunately I don't see how that would fulfill all my requirements:

I would like to have 8 OpenMPI jobs on 2 nodes -> 4 OpenMPI jobs per node -> 2 per socket, each executing one OpenMP job with 5 threads

mpirun -np 8 --map-by ppr:4:node:pe=5 ...

How can I combine this with the constraint of 1 thread per core:

[pascal-3-06:14965] ... [B./B./B./B./B./../../../../..][../../../../../../../../../..]
[pascal-3-06:14965] ... [../../../../../B./B./B./B./B.][../../../../../../../../../..]
[pascal-3-06:14965] ... [../../../../../../../../../..][B./B./B./B./B./../../../../..]
[pascal-3-06:14965] ... [../../../../../../../../../..][../../../../../B./B./B./B./B.]
[pascal-3-07:21027] ... [B./B./B./B./B./../../../../..][../../../../../../../../../..]
[pascal-3-07:21027] ... [../../../../../B./B./B./B./B.][../../../../../../../../../..]
[pascal-3-07:21027] ... [../../../../../../../../../..][B./B./B./B./B./../../../../..]
[pascal-3-07:21027] ... [../../../../../../../../../..][../../../../../B./B./B./B./B.]

Cheers,

Ado
r***@open-mpi.org
2017-04-24 13:55:28 UTC
Permalink
I’m afraid none of the current options is going to do that right now. I’ll put a note on my to-do list to look at this, but I can’t promise when I’ll get to it.
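One possible partial workaround, not discussed in this thread and only a sketch: keep the Open MPI binding to 5 full cores per rank (pe=5) and ask the OpenMP runtime to place one thread per core inside that binding, so that the second hardware thread of every core stays idle. This assumes the GNU OpenMP runtime (libgomp from GCC >= 4.9), which honors the OpenMP 4.0 variables OMP_PLACES and OMP_PROC_BIND:

export OMP_NUM_THREADS=5
export OMP_PLACES=cores        # one place per physical core inside the rank's binding
export OMP_PROC_BIND=close     # pin one thread onto each place
mpirun -np 8 --map-by ppr:4:node:pe=5 -report-bindings \
       -x OMP_NUM_THREADS -x OMP_PLACES -x OMP_PROC_BIND \
       --mca plm_rsh_agent "qrsh" ./myid

The -report-bindings output would still show the full-core [BB/...] masks, since only the OpenMP threads, not the MPI binding, are restricted to one hardware thread per core.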