Discussion:
[OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8
Hammond, Simon David (-EXP)
2017-02-22 03:39:16 UTC
Hi OpenMPI Users,

Has anyone successfully tested OpenMPI 1.10.6 with PGI 17.1.0 on POWER8 with the LSF scheduler (--with-lsf=..)?

I am getting this error when the code hits MPI_Finalize. It causes the job to abort (i.e. exit the LSF session) when I am running interactively.

Are there any materials we can supply to aid debugging/problem isolation?

[white23:58788] *** Process received signal ***
[white23:58788] Signal: Segmentation fault (11)
[white23:58788] Signal code: Invalid permissions (2)
[white23:58788] Failing at address: 0x1000008e0810
[white23:58788] [ 0] [0x100000050478]
[white23:58788] [ 1] [0x0]
[white23:58788] [ 2] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libopen-rte.so.12(+0x1b6b0)[0x10000071b6b0]
[white23:58788] [ 3] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libopen-rte.so.12(orte_finalize+0x70)[0x10000071b5b8]
[white23:58788] [ 4] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libmpi.so.12(ompi_mpi_finalize+0x760)[0x100000121dc8]
[white23:58788] [ 5] /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libmpi.so.12(PMPI_Finalize+0x6c)[0x100000153154]
[white23:58788] [ 6] ./IMB-MPI1[0x100028dc]
[white23:58788] [ 7] /lib64/libc.so.6(+0x24700)[0x1000004b4700]
[white23:58788] [ 8] /lib64/libc.so.6(__libc_start_main+0xc4)[0x1000004b48f4]
[white23:58788] *** End of error message ***
[white22:73620] *** Process received signal ***
[white22:73620] Signal: Segmentation fault (11)
[white22:73620] Signal code: Invalid permissions (2)
[white22:73620] Failing at address: 0x1000008e0810


Thanks,

S.



Si Hammond
Scalable Computer Architectures
Sandia National Laboratories, NM, USA

[Sent from Remote Connection, Please excuse typos]
r***@open-mpi.org
2017-02-22 05:17:54 UTC
Can you provide a backtrace with line numbers from a debug build? We don’t get much testing with LSF, so it is quite possible there is a bug in there.
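
As a rough aid for getting line numbers, the frames in the posted trace that carry a module-relative offset, such as libopen-rte.so.12(+0x1b6b0), can usually be fed to addr2line once the library has been rebuilt with debug symbols. The sketch below is only an illustration and not part of Open MPI: the helper names parse_frames and addr2line_commands are hypothetical, and it assumes the offsets in the trace are relative to the start of the shared library, which should be checked against the actual debug build. Frames that name a symbol (for example orte_finalize+0x70) are parsed but left for a debugger to resolve.

```python
import re

# Matches frames of the form printed in the trace, for example:
#   [host:pid] [ 2] /path/libopen-rte.so.12(+0x1b6b0)[0x10000071b6b0]
#   [host:pid] [ 3] /path/libopen-rte.so.12(orte_finalize+0x70)[0x10000071b5b8]
FRAME_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+"                             # frame number, e.g. "[ 2]"
    r"(?P<module>[^\s(\[]+)"                                # binary or shared library path
    r"\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"   # "(symbol+offset)" or "(+offset)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]"                      # absolute address
)


def parse_frames(trace_text):
    """Return one dict per frame that carries a module and an offset.

    Frames without that information (e.g. "[ 0] [0x100000050478]") are skipped.
    """
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "index": int(match.group("index")),
                "module": match.group("module"),
                "symbol": match.group("symbol") or None,  # None for "(+0x...)" frames
                "offset": match.group("offset"),
            })
    return frames


def addr2line_commands(frames):
    """Build addr2line command lines for frames with a bare module offset.

    Assumption: the offset is relative to the module's load base, so it can
    be passed directly to addr2line together with the module path.
    """
    return [
        f"addr2line -f -C -e {frame['module']} {frame['offset']}"
        for frame in frames
        if frame["symbol"] is None
    ]


if __name__ == "__main__":
    lib = "/home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib"
    sample = "\n".join([
        "[white23:58788] [ 0] [0x100000050478]",
        f"[white23:58788] [ 2] {lib}/libopen-rte.so.12(+0x1b6b0)[0x10000071b6b0]",
        f"[white23:58788] [ 3] {lib}/libopen-rte.so.12(orte_finalize+0x70)[0x10000071b5b8]",
    ])
    for command in addr2line_commands(parse_frames(sample)):
        print(command)
```

The printed commands only give useful file and line information when run against libraries built with debug symbols; against the optimized build from the original post they would most likely report unknown locations.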