Kaiming Ouyang
2018-03-20 01:29:18 UTC
Hi everyone,
Recently I needed to compile the High-Performance Linpack (HPL) code with Open MPI 1.2.9 (a fairly old version). The compilation finishes, but when I try to run the resulting xhpl I get the following errors:
[test:32058] *** Process received signal ***
[test:32058] Signal: Segmentation fault (11)
[test:32058] Signal code: Address not mapped (1)
[test:32058] Failing at address: 0x14a2b84b6304
[test:32058] [ 0] /lib64/libpthread.so.0(+0xf5e0) [0x14eb116295e0]
[test:32058] [ 1] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x28a) [0x14eaa81258aa]
[test:32058] [ 2] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x2b) [0x14eaa853219b]
[test:32058] [ 3] /root/research/lib/openmpi-1.2.9/lib/libopen-pal.so.0(opal_progress+0x4a) [0x14eb128dbaaa]
[test:32058] [ 4] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_msg_wait+0x1d) [0x14eaf41e6b4d]
[test:32058] [ 5] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_recv+0x3a5) [0x14eaf41eac45]
[test:32058] [ 6] /root/research/lib/openmpi-1.2.9/lib/libopen-rte.so.0(mca_oob_recv_packed+0x33) [0x14eb12b62223]
[test:32058] [ 7] /root/research/lib/openmpi-1.2.9/lib/openmpi/mca_gpr_proxy.so(orte_gpr_proxy_put+0x1f9) [0x14eaf3dd7db9]
[test:32058] [ 8] /root/research/lib/openmpi-1.2.9/lib/libopen-rte.so.0(orte_smr_base_set_proc_state+0x31d) [0x14eb12b7893d]
[test:32058] [ 9] /root/research/lib/openmpi-1.2.9/lib/libmpi.so.0(ompi_mpi_init+0x8d6) [0x14eb13202136]
[test:32058] [10] /root/research/lib/openmpi-1.2.9/lib/libmpi.so.0(MPI_Init+0x6a) [0x14eb1322461a]
[test:32058] [11] ./xhpl(main+0x5d) [0x404e7d]
[test:32058] [12] /lib64/libc.so.6(__libc_start_main+0xf5) [0x14eb11278c05]
[test:32058] [13] ./xhpl() [0x4056cb]
[test:32058] *** End of error message ***
mpirun noticed that job rank 0 with PID 31481 on node test.novalocal exited
on signal 15 (Terminated).
23 additional processes aborted (not shown)
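By the way, the backtrace shows the crash happening inside MPI_Init itself (ompi_mpi_init), before xhpl does any actual work, so I plan to also test with a minimal MPI program compiled with the mpicc from the same openmpi-1.2.9 install, just as a sketch to check whether the failure is HPL-specific or in my MPI installation:

/* minimal_mpi_test.c -- hypothetical test program, not part of HPL;
 * build with the mpicc from the same openmpi-1.2.9 install used for xhpl */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = -1, size = 0;

    /* the backtrace above dies inside this call */
    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d initialized OK\n", rank, size);

    MPI_Finalize();
    return 0;
}

If this also segfaults in MPI_Init when launched with mpirun, the problem would seem to be in the Open MPI setup rather than in HPL itself.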
The machine has InfiniBand, so I wonder whether Open MPI 1.2 simply does not support InfiniBand by default. I also tried running without InfiniBand, but then the program only works for small input sizes; when I increase the input size and the process grid size, it just gets stuck. The program I am running is a standard benchmark, so I don't think the problem is in the code itself. Any ideas? Thanks.
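One more thing I intend to try (assuming I have the --mca syntax right for this version) is restricting the transports explicitly at launch time, so the shared-memory and InfiniBand components are taken out of the picture, e.g.

  mpirun --mca btl tcp,self -np 24 ./xhpl

and comparing that against the default run. The "tcp,self" selection here is only an example; the point is to see whether choosing the transport by hand changes the crash or the hang.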