Thanks, Tom. I did try using the mpirun --bind-to-core option and confirmed that individual MPI processes were placed on unique cores (also without other interfering MPI runs); however, it did not make a significant difference. That said, I do agree that turning off hyper-threading is an important test to rule out any fundamental differences that may be at play. I'll turn off hyper-threading and let you know what I find.
Best regards,
Andy
On Feb 6, 2017, at 10:44 AM, Elken, Tom <***@intel.com> wrote:
"c.) the workstation is hyper threaded and cluster is not"
You might turn off hyper-threading (HT) on the workstation and re-run.
I've seen some OSes on some systems get confused and assign multiple OS "cpus" to the same HW core/thread.
In any case, if you turn HT off and top shows that tasks are running on different "cpus", you can be sure they are running on different cores and less likely to interfere with each other.
-Tom
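Tom's check can be scripted before touching any BIOS settings; a minimal Linux sketch (assuming lscpu from util-linux is installed):

```shell
#!/bin/sh
# Compare logical CPUs with physical cores: if the counts differ,
# hyper-threading (SMT) is enabled.
logical=$(grep -c '^processor' /proc/cpuinfo)
physical=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l)
echo "logical CPUs:   $logical"
echo "physical cores: $physical"
# Two OS "cpus" that share a hardware core list each other here:
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list 2>/dev/null
```

If the siblings list shows two entries (e.g. "0,36"), cpu0 shares its hardware core with another OS "cpu".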
From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of Andy Witzig
Sent: Monday, February 06, 2017 8:25 AM
To: Open MPI Users <***@lists.open-mpi.org>
Subject: Re: [OMPI users] Performance Issues on SMP Workstation
Hi all,
My apologies for not replying sooner on this issue - I've been swamped with other tasks. Here's my latest:
1.) I have looked deep into bindings on both systems (using the --report-bindings option) and nothing came to light. I've tried multiple variations of the binding settings, and only minor improvements were made on the workstation.
2.) I used the mpirun --tag-output grep Cpus_allowed_list /proc/self/status command and everything was in order on both systems.
3.) I used ompi_info -c (per the recommendation of Penguin Computing support staff) and looked at the differences in configuration. I'm pasting the output below for reference. The only settings in the cluster configuration that were not present in the workstation configuration were --enable-__cxa_atexit, --disable-libunwind-exceptions, and --disable-dssi. There were several settings present in the workstation configuration that were not set in the cluster configuration. Any reason why the same version of OpenMPI would have such different settings?
4.) I used hwloc and lstopo to compare system hardware and confirmed that the workstation has either equivalent or superior specs to the cluster node setup.
5.) The primary differences I can see right now are:
a.) OpenMPI 1.6.4 was compiled using gcc 4.4.7 on the cluster, while I am compiling with gcc 5.4.0 on the workstation, and the OpenMPI compile configurations are different;
b.) the cluster uses Torque/PBS to submit the jobs;
c.) the workstation is hyper threaded and cluster is not
d.) Workstation runs on Ubuntu while cluster runs on CentOS
My next steps will be to compile/install gcc 4.4.7 on the workstation and recompile OpenMPI 1.6.4 to ensure the software configuration is equivalent, and to replicate the cluster configuration settings as closely as I can. I will also look into the profiling tools that Christoph mentioned and see if any details come to light.
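That rebuild could look something like the following (the install prefix and job width are hypothetical; adjust to your own layout):

```shell
# Put the matching gcc 4.4.7 first on PATH, then configure Open MPI
# 1.6.4 against it so both systems use the same toolchain.
export PATH=/opt/gcc-4.4.7/bin:$PATH        # hypothetical gcc install prefix
cd openmpi-1.6.4
./configure --prefix=$HOME/ompi-1.6.4-gcc447 \
            CC=gcc CXX=g++ F77=gfortran FC=gfortran
make -j4 && make install
# Confirm which compiler the build recorded:
$HOME/ompi-1.6.4-gcc447/bin/ompi_info | grep -i 'compiler'
```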
Thanks much,
Andy
---------------------------WORKSTATION OMPI_INFO -C OUTPUT---------------------------
Using built-in specs.
COLLECT_GCC=/usr/bin/gfortran
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v
--with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.4'
--with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++
--prefix=/usr
--program-suffix=-5
--enable-shared
--enable-linker-build-id
--libexecdir=/usr/lib
--without-included-gettext
--enable-threads=posix
--libdir=/usr/lib
--enable-nls
--with-sysroot=/
--enable-clocale=gnu
--enable-libstdcxx-debug
--enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new
--enable-gnu-unique-object
--disable-vtable-verify
--enable-libmpx
--enable-plugin
--with-system-zlib
--disable-browser-plugin
--enable-java-awt=gtk
--enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre
--enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc
--enable-multiarch
--disable-werror
--with-arch-32=i686
--with-abi=m64
--with-multilib-list=m32,m64,mx32
--enable-multilib
--with-tune=generic
--enable-checking=release
--build=x86_64-linux-gnu
--host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
---------------------------CLUSTER OMPI_INFO -C OUTPUT---------------------------
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ./configure
--prefix=/public/apps/gcc/4.4.7
--enable-shared
--enable-threads=posix
--enable-checking=release
--with-system-zlib
--enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-gnu-unique-object
--disable-dssi
--with-arch_32=i686
--build=x86_64-redhat-linux build_alias=x86_64-redhat-linux
--enable-languages=c,c++,fortran,objc,obj-c++
Thread model: posix
gcc version 4.4.7 (GCC)
On Feb 2, 2017, at 5:28 AM, Gilles Gouaillardet <***@gmail.com> wrote:
I cannot remember what the default binding (if any) is on Open MPI 1.6, nor whether the default is the same with or without PBS.
You can simply run
mpirun --tag-output grep Cpus_allowed_list /proc/self/status
and see if you notice any discrepancy between your systems.
You might also consider upgrading to the latest Open MPI 2.0.2 and see how things go.
Cheers,
Gilles
On Thursday, February 2, 2017, <***@hlrs.de> wrote:
Hello Andy,
You can also use the --report-bindings option of mpirun to check which cores
your program will use and to which cores the processes are bound.
Are you using the same backend compiler on both systems?
Do you have performance tools available on the systems where you can see in
which part of the program the time is lost? Common tools would be Score-P/
Vampir/CUBE, TAU, and Extrae/Paraver.
Best
Christoph
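As a sketch of how such a profiling run might look with Score-P (the source file, binary name, and installed CUBE viewer are assumptions about the local setup):

```shell
# Recompile with the Score-P compiler wrapper so the run emits a profile:
scorep mpicc -O2 -o app_scorep app.c
# Run as usual; a scorep-* experiment directory with profile.cubex appears:
mpirun -np 20 ./app_scorep input.dat
# Open the profile to see where the time is spent, per rank and per routine:
cube scorep-*/profile.cubex
```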
Thank you, Bennet. From my testing, I've seen that the application usually
performs better at much smaller rank counts on the workstation. I've tested on
the cluster and do not see the same response (i.e., it sees better performance
at ranks of -np 15 or 20). The workstation is not shared and is not
doing any other work. I ran the application on the workstation with top
and confirmed that 20 procs were fully loaded.
I'll look into the diagnostics you mentioned and get back to you.
Best regards,
Andy
How do they compare if you run a much smaller number of ranks, say -np 2 or 4?
Is the workstation shared and doing any other work?
You could insert some diagnostics into your script, for example,
uptime and free, both before and after running your MPI program and
compare.
You could also run top in batch mode in the background for your own
username, then run your MPI program, and compare the results from top.
We've seen instances where the MPI ranks only get distributed to a
small number of processors, which you see if they all have small
percentages of CPU.
Just flailing in the dark...
-- bennet
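Bennet's top suggestion can be scripted; a minimal sketch (the file names are arbitrary):

```shell
#!/bin/sh
# Snapshot per-process CPU usage for your own user before and during the run.
top -b -n 1 -u "$(id -un)" > top_before.txt
# ... start the MPI job here, e.g.:  mpirun -np 20 $EXECUTABLE $INPUT_FILE &
top -b -n 1 -u "$(id -un)" > top_during.txt
# If the ranks all show small %CPU in the second snapshot, they are likely
# packed onto a few cores rather than spread across the machine.
```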
Thanks for the idea. I did the test and only got a single host.
Thanks,
Andy
Simple test: replace your executable with "hostname". If you see multiple
hosts come out on your cluster, then you know why the performance is
different.
Honestly, I'm not exactly sure what scheme is being used. I am using the following PBS script:
#PBS -S /bin/bash
#PBS -q T30
#PBS -l walltime=24:00:00,nodes=1:ppn=20
#PBS -j oe
#PBS -N test
#PBS -r n
mpirun $EXECUTABLE $INPUT_FILE
I'm not configuring OpenMPI anywhere else. It is possible the Penguin
Computing folks have pre-configured my MPI environment. I'll see what I
can find.
Best regards,
Andy
Andy,
What allocation scheme are you using on the cluster? For some codes we see
noticeable differences using fill-up vs. round-robin, though not 4x. Fill-up
relies more on shared memory, while round-robin uses more InfiniBand.
Doug
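With Open MPI 1.6 the two policies Doug mentions correspond to mpirun mapping options (a sketch; $EXECUTABLE and $INPUT_FILE as in the PBS script above):

```shell
# Fill-up: pack ranks onto each node before moving to the next one.
mpirun -np 20 --byslot $EXECUTABLE $INPUT_FILE
# Round-robin: place one rank per node, cycling through the nodes.
mpirun -np 20 --bynode $EXECUTABLE $INPUT_FILE
```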
Hi Tom,
The cluster uses an InfiniBand interconnect. On the cluster I'm
requesting: #PBS -l walltime=24:00:00,nodes=1:ppn=20. So technically,
the run on the cluster should be SMP on the node, since there are 20
cores/node. On the workstation I'm just using the command: mpirun -np 20
... . I haven't finished setting up Torque/PBS yet.
Best regards,
Andy
For this case: " a cluster system with 2.6GHz Intel Haswell with 20 cores
/ node and 128GB RAM/node. "
are you running 5 ranks per node on 4 nodes?
What interconnect are you using for the cluster?
-Tom
-----Original Message-----
Witzig
Sent: Wednesday, February 01, 2017 1:37 PM
To: Open MPI Users
Subject: Re: [OMPI users] Performance Issues on SMP Workstation
By the way, the workstation has a total of 36 cores / 72 threads, so using mpirun
-np 20 is possible (and should be equivalent) on both platforms.
Thanks,
cap79
Hi all,
I'm testing my application on an SMP workstation (dual Intel Xeon E5-2697 v4
2.3 GHz Intel Broadwell (boost 2.8-3.1 GHz) processors, 128GB RAM) and am
seeing a 4x performance drop compared to a cluster system with 2.6GHz Intel
Haswell, 20 cores/node, and 128GB RAM/node. I have tried
mpirun -np 20 $EXECUTABLE $INPUT_FILE
mpirun -np 20 --mca btl self,sm $EXECUTABLE $INPUT_FILE
and others, but cannot achieve the same performance on the workstation as is
seen on the cluster. The workstation outperforms the cluster on other non-MPI but multi-
threaded applications, so I don't think it's a hardware issue.
Any help you can provide would be appreciated.
Thanks,
cap79
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users