Noam Bernstein
2018-06-22 16:33:29 UTC
Hi - for the last couple of weeks, more or less since we did some kernel updates, certain compute-intensive MPI jobs have been behaving oddly speed-wise: parts that should be quite fast sometimes (but not consistently) take a long time, and re-running sometimes fixes the issue, sometimes not. I'm starting to suspect core-binding problems, which I worry will be difficult to debug, so I hoped to get some feedback on whether my observations do in fact suggest that there's something wrong with the core binding.
I'm running with the latest CentOS 6 kernel (2.6.32-696.30.1.el6.x86_64) and OpenMPI 3.1.0, on a dual-socket Intel Xeon node with 8 cores per socket plus HT (32 hardware threads total). The code is compiled with ifort using "-mkl=sequential", and, just to be certain, OMP_NUM_THREADS=1, so there should be no OpenMP parallelism.
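For concreteness, the launch is essentially the following (modulo hostfile/scheduler details, which I've omitted):

  mpirun -np 16 --bind-to core --report-bindings ./vasp.para.intel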
The main question is: if I'm running 16 MPI tasks per node and look at the PSR field from ps, should I get some simple sequence of numbers?
Here's the beginning of mpirun's report on the per-core binding I requested (--bind-to core):
[compute-7-2:31036] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[compute-7-2:31036] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[compute-7-2:31036] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[compute-7-2:31036] MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[compute-7-2:31036] MCW rank 4 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[compute-7-2:31036] MCW rank 5 bound to socket 1[core 10[hwt 0-1]]: [../../../../../../../..][../../BB/../../../../..]
[compute-7-2:31036] MCW rank 6 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
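(For reference, my assumption about the Linux CPU numbering, which is what I'd check PSR against, comes from /proc/cpuinfo, e.g.:

  egrep 'processor|physical id|core id' /proc/cpuinfo

On a box like this I'd expect logical CPUs 0-15 to be the first hardware thread of each physical core and 16-31 to be their HT siblings, but I haven't verified that this node is enumerated that way.)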
This is the PSR info from ps:
PID PSR TTY TIME CMD
31043 1 ? 00:00:34 vasp.para.intel
31045 2 ? 00:00:34 vasp.para.intel
31047 3 ? 00:00:34 vasp.para.intel
31049 4 ? 00:00:34 vasp.para.intel
31051 5 ? 00:00:34 vasp.para.intel
31055 7 ? 00:00:34 vasp.para.intel
31042 8 ? 00:00:34 vasp.para.intel
31046 10 ? 00:00:34 vasp.para.intel
31048 11 ? 00:00:34 vasp.para.intel
31052 13 ? 00:00:34 vasp.para.intel
31054 14 ? 00:00:34 vasp.para.intel
31053 22 ? 00:00:34 vasp.para.intel
31044 25 ? 00:00:34 vasp.para.intel
31050 28 ? 00:00:34 vasp.para.intel
31056 31 ? 00:00:34 vasp.para.intel
Does this output look reasonable? Under any sensible enumeration of the 32 virtual cores that I can think of, those numbers don't seem to correspond to one MPI task per core. If this isn't supposed to give meaningful output, given how Open MPI does its binding, is there another tool that can tell me what cores a running job is actually running on / bound to?
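(The kind of check I have in mind, assuming the usual util-linux and hwloc utilities are the right ones for this, would be something like:

  taskset -cp <PID>    # affinity mask the kernel holds for the process
  hwloc-ps -t          # hwloc's view of running processes' bindings, per thread

but I don't know whether either of these reports the binding Open MPI actually applied.)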
An additional bit of confusion is that "ps -mo pid,tid,fname,user,psr -p PID" on one of those processes (which is supposed to be running without threaded parallelism) reports 3 separate TIDs (which I think correspond to threads) with 3 different PSR values. These values seem stable during the run, but have no obvious relation to one another (not P and P+1, or P and P+8, or P and P+16).
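(To separate "where a thread last ran", which is what PSR shows, from "where it is allowed to run", I assume one can loop over the thread IDs under /proc; for the first PID in the listing above, something like:

  for t in /proc/31043/task/*; do taskset -cp $(basename $t); done

If every mask comes back as just the bound core's two hardware threads, the extra TIDs would presumably be helper threads that simply inherited the binding.)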
thanks,
Noam