[OMPI users] Abort/ Deadlock issue in allreduce (Gilles Gouaillardet)
Christof Koehler
2016-12-12 19:25:00 UTC
Hello,

yes, I already tried the 2.0.x git branch with the original problem. It
now dies quite noisily:

forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp-mpi-sca 00000000040DD64D Unknown Unknown Unknown
...
...
...
mpirun has exited due to process rank 0 with PID 0 on
node node109 exiting improperly. There are three reasons this could
occur:
...
...

but apparently does not hang any more.

Thanks to everyone involved for fixing this!

Best Regards

Christof
1. Re: Abort/ Deadlock issue in allreduce (Gilles Gouaillardet)
2. Re: How to yield CPU more when not computing (was curious
behavior during wait for broadcast: 100% cpu) (Dave Love)
----------------------------------------------------------------------
Message: 1
Date: Mon, 12 Dec 2016 09:32:25 +0900
Subject: Re: [OMPI users] Abort/ Deadlock issue in allreduce
Christof,
Ralph fixed the issue;
meanwhile, the patch can be downloaded manually at
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/2552.patch
Cheers,
Gilles
Hello,
our case is different: libwannier.a is a "third party"
library which is built separately and then just linked in, so the vasp
preprocessor never touches it. As far as I can see, no preprocessing of
the f90 source is involved in the libwannier build process.
I finally managed to set a breakpoint at the program exit of the root process:
(gdb) bt
#0 0x00002b7ccd2e4220 in _exit () from /lib64/libc.so.6
#1 0x00002b7ccd25ee2b in __run_exit_handlers () from /lib64/libc.so.6
#2 0x00002b7ccd25eeb5 in exit () from /lib64/libc.so.6
#3 0x000000000407298d in for_stop_core ()
#4 0x00000000012fad41 in w90_io_mp_io_error_ ()
#5 0x0000000001302147 in w90_parameters_mp_param_read_ ()
#6 0x00000000012f49c6 in wannier_setup_ ()
#7 0x0000000000e166a8 in mlwf_mp_mlwf_wannier90_ ()
#8 0x00000000004319ff in vamp () at main.F:2640
#9 0x000000000040d21e in main ()
#10 0x00002b7ccd247b15 in __libc_start_main () from /lib64/libc.so.6
#11 0x000000000040d129 in _start ()
So for_stop_core is apparently called? Of course it is below the main()
process of vasp, so additional things might happen which are not
visible. Is SIGCHLD (as observed when catching signals in mpirun) the
signal expected after a for_stop_core?
Thank you very much for investigating this!
Cheers
Christof
Christof,
There is something really odd with this stack trace:
count is zero, and some pointers do not point to valid addresses (!)
In Open MPI, MPI_Allreduce(..., count=0, ...) is a no-op, so that suggests that
the stack has been corrupted inside MPI_Allreduce(), or that you are not using the library you think you use.
pmap <pid> will show you which lib is used.
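For example, a quick sketch (the pid here is this shell itself as a stand-in; on a real run you would use the pid of one of the hung MPI ranks and grep for libmpi):

```shell
# Inspect which shared libraries a running process has actually mapped.
# PID is a stand-in (this shell); for a hung VASP rank, use that rank's
# pid and grep for 'libmpi' to see which MPI library is really loaded.
PID=$$
pmap "$PID" | grep -i 'lib'
```

If the paths shown do not match the Open MPI installation you built against, the wrong library is being picked up at run time.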
btw, this was not started with
mpirun --mca coll ^tuned ...
right ?
just to make it clear ...
a task from your program bluntly issues a fortran STOP, and this is kind of a feature.
The *only* issue is that mpirun does not kill the other MPI tasks and mpirun never completes.
Did I get it right?
I just ran across very similar behavior in VASP (which we just switched over to Open MPI 2.0.1), also in an allreduce + STOP combination (some nodes call one, others call the other), and I discovered several interesting things.

The most important is that when MPI is active, the preprocessor converts (via a #define in symbol.inc) fortran STOP into calls to m_exit() (defined in mpi.F), which is a wrapper around mpi_finalize. So in my case some processes in the communicator call mpi_finalize while others call mpi_allreduce. I'm not really surprised this hangs, because I think the correct thing to replace STOP with is mpi_abort, not mpi_finalize.

If you know where the STOP is called, you can check the preprocessed equivalent file (.f90 instead of .F) and see whether it has actually been replaced with a call to m_exit. I'm planning to test whether replacing m_exit with m_stop in symbol.inc gives more sensible behavior, i.e. program termination when the original source file executes a STOP.
(gdb) where
#0 0x00002b8d5a095ec6 in opal_progress () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libopen-pal.so.20
#1 0x00002b8d59b3a36d in ompi_request_default_wait_all () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
#2 0x00002b8d59b8107c in ompi_coll_base_allreduce_intra_recursivedoubling () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
#3 0x00002b8d59b495ac in PMPI_Allreduce () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
#4 0x00002b8d598e4027 in pmpi_allreduce__ () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi_mpifh.so.20
#5 0x0000000000414077 in m_sum_i (comm=..., ivec=warning: Range for type (null) has invalid bounds 1..-12884901892
warning: Range for type (null) has invalid bounds 1..-12884901892
warning: Range for type (null) has invalid bounds 1..-12884901892
warning: Range for type (null) has invalid bounds 1..-12884901892
warning: Range for type (null) has invalid bounds 1..-12884901892
warning: Range for type (null) has invalid bounds 1..-12884901892
warning: Range for type (null) has invalid bounds 1..-12884901892
..., n=2) at mpi.F:989
#6 0x0000000000daac54 in full_kpoints::set_indpw_full (grid=..., wdes=..., kpoints_f=...) at mkpoints_full.F:1099
#7 0x0000000001441654 in set_indpw_fock (t_info=..., p=warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
..., wdes=..., grid=..., latt_cur=..., lmdim=Cannot access memory at address 0x1
) at fock.F:1669
#8 fock::setup_fock (t_info=..., p=warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
warning: Range for type (null) has invalid bounds 1..-1
..., wdes=..., grid=..., latt_cur=..., lmdim=Cannot access memory at address 0x1
) at fock.F:1413
#9 0x0000000002976478 in vamp () at main.F:2093
#10 0x0000000000412f9e in main ()
#11 0x000000383a41ed1d in __libc_start_main () from /lib64/libc.so.6
#12 0x0000000000412ea9 in _start ()
#0 0x000000383a4acbdd in nanosleep () from /lib64/libc.so.6
#1 0x000000383a4e1d94 in usleep () from /lib64/libc.so.6
#2 0x00002b11db1e0ae7 in ompi_mpi_finalize () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
#3 0x00002b11daf8b399 in pmpi_finalize__ () from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi_mpifh.so.20
#4 0x00000000004199c5 in m_exit () at mpi.F:375
#5 0x0000000000dab17f in full_kpoints::set_indpw_full (grid=..., wdes=Cannot resolve DW_OP_push_object_address for a missing object
) at mkpoints_full.F:1065
#6 0x0000000001441654 in set_indpw_fock (t_info=..., p=Cannot resolve DW_OP_push_object_address for a missing object
) at fock.F:1669
#7 fock::setup_fock (t_info=..., p=Cannot resolve DW_OP_push_object_address for a missing object
) at fock.F:1413
#8 0x0000000002976478 in vamp () at main.F:2093
#9 0x0000000000412f9e in main ()
#10 0x000000383a41ed1d in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000412ea9 in _start ()
Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628 F +1 202 404 7546
https://www.nrl.navy.mil
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
------------------------------
Message: 2
Date: Mon, 12 Dec 2016 14:24:16 +0000
Subject: Re: [OMPI users] How to yield CPU more when not computing
(was curious behavior during wait for broadcast: 100% cpu)
Yes, as root, and there are N different systems to at least provide
unprivileged read access on HPC systems, but that's a bit different, I
think.
LIKWID[1] uses a daemon to provide limited RW access to MSRs for
applications. I wouldn't be surprised if support for this was added to
LIKWID by RRZE.
Yes, that's one of the N I had in mind; others provide Linux modules.
From a system manager's point of view it's not clear what are the
implications of the unprivileged access, or even how much it really
helps. I've seen enough setups suggested for HPC systems in areas I
understand (and used by vendors) which allow privilege escalation more
or less trivially, maybe without any real operational advantage. If
it's clearly safe and helpful then great, but I couldn't assess that.
------------------------------
------------------------------
End of users Digest, Vol 3674, Issue 1
**************************************
--
Dr. rer. nat. Christof Köhler email: ***@bccms.uni-bremen.de
Universitaet Bremen/ BCCMS phone: +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.12 fax: +49-(0)421-218-62770
28359 Bremen

PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/