Discussion:
[OMPI users] OMPI 3.0.0 crashing at mpi_init on OS X using Fortran
Ricardo Parreira de Azambuja Fonseca
2017-12-11 20:43:33 UTC
Hi guys

I’m having problems with a Fortran-based code that I develop with
Open MPI 3.0.0 on Mac OS X. The problem shows up with both the gfortran
and Intel ifort compilers; the same code runs perfectly with version
2.1.2 (and earlier versions).

Launching the code, even without using mpiexec, causes a segfault when
my code calls mpi_init():

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x1107a41fc
(…)
#10 0x10f86eff1
Segmentation fault: 11

Recompiling Open MPI with --enable-debug and launching the code through
lldb gives:

(lldb) run
Process 65169 launched: '../source/build/osiris.e' (x86_64)
Process 65169 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x48)
frame #0: 0x0000000100fbe79a libmpi.40.dylib`ompi_hook_base_mpi_init_top_post_opal(argc=0, argv=0x0000000000000000, requested=0, provided=0x00007ffeefbfe290) at hook_base.c:278
275
276 void ompi_hook_base_mpi_init_top_post_opal(int argc, char **argv, int requested, int *provided)
277 {
-> 278 HOOK_CALL_COMMON( mpi_init_top_post_opal, argc, argv, requested, provided);
279 }
280
281 void ompi_hook_base_mpi_init_bottom(int argc, char **argv, int requested, int *provided)
Target 0: (osiris.e) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x48)
* frame #0: 0x0000000100fbe79a libmpi.40.dylib`ompi_hook_base_mpi_init_top_post_opal(argc=0, argv=0x0000000000000000, requested=0, provided=0x00007ffeefbfe290) at hook_base.c:278
frame #1: 0x0000000100dce0ff libmpi.40.dylib`ompi_mpi_init(argc=0, argv=0x0000000000000000, requested=0, provided=0x00007ffeefbfe290) at ompi_mpi_init.c:486
frame #2: 0x0000000100eb3f38 libmpi.40.dylib`PMPI_Init(argc=0x00007ffeefbfe2d0, argv=0x00007ffeefbfe2c8) at pinit.c:66
frame #3: 0x0000000100cceb0b libmpi_mpifh.40.dylib`ompi_init_f(ierr=0x00007ffeefbfe9f8) at init_f.c:84
frame #4: 0x0000000100ccead5 libmpi_mpifh.40.dylib`mpi_init_(ierr=0x00007ffeefbfe9f8) at init_f.c:65
frame #5: 0x0000000100004e5a osiris.e`__m_system_MOD_system_init at os-sys-multi.f03:323
frame #6: 0x000000010036edb5 osiris.e`MAIN__ at os-main.f03:36
frame #7: 0x000000010039eff2 osiris.e`main at memory.h:19
frame #8: 0x00007fff6ee7d115 libdyld.dylib`start + 1

Any thoughts?

Thanks in advance,
Ricardo


Ricardo Fonseca

Full Professor | Professor Catedrático
GoLP - Grupo de Lasers e Plasmas
Instituto de Plasmas e Fusão Nuclear
Instituto Superior Técnico
Av. Rovisco Pais
1049-001 Lisboa
Portugal

tel: +351 21 8419202
web: http://epp.tecnico.ulisboa.pt/
r***@open-mpi.org
2017-12-12 04:21:14 UTC
FWIW: I just cloned the v3.0.x branch to get the latest 3.0.1 release candidate, then built and ran it on macOS High Sierra. Everything built and ran fine for both C and Fortran codes.

You might want to test the same; it could be that this was already fixed.
Jeff Squyres (jsquyres)
2017-12-12 20:00:25 UTC
I am unable to reproduce your error with Open MPI v3.0.0 on the latest stable macOS High Sierra.

Given that you're failing in MPI_INIT, it feels like the application shouldn't matter. But regardless, can you test with the trivial Fortran test programs in the examples/ directory in the Open MPI tarball?
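
For reference, a check of this kind could look roughly like the sketch below. It is not one of the programs shipped in examples/; the program name mpi_init_check and the printed message are made up for illustration. It only initializes MPI, queries rank and size, and finalizes, so a crash here would point at the MPI installation rather than at the application.

```fortran
! Minimal sketch (hypothetical file name: mpi_init_check.f90).
! Assumption: the Open MPI "mpi" Fortran module was built and is usable
! with the compiler in question.
program mpi_init_check
  use mpi
  implicit none
  integer :: ierr, rank, nprocs

  ! The reported crash happens inside this call.
  call MPI_Init(ierr)

  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  print '(a,i0,a,i0)', 'MPI_Init succeeded on rank ', rank, ' of ', nprocs

  call MPI_Finalize(ierr)
end program mpi_init_check
```

Assuming the Open MPI wrapper compiler is on the PATH, this could be built with `mpifort mpi_init_check.f90 -o mpi_init_check` and run either directly, without mpiexec as in the original report, or with `mpiexec -n 2 ./mpi_init_check`.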
--
Jeff Squyres
***@cisco.com
