Discussion:
[OMPI users] v1.2 Bus Error (/tmp usage)
Hugh Merz
2007-03-20 20:32:26 UTC
Good Day,

I'm using Open MPI on a diskless cluster (/tmp is part of a 1m ramdisk), and I found that after upgrading from v1.1.4 to v1.2, jobs using np > 4 would fail to start during MPI_Init, due to what appears to be a lack of space in /tmp. The error output is below; a minimal reproducer sketch follows it:

-----

[tpb200:32193] *** Process received signal ***
[tpb200:32193] Signal: Bus error (7)
[tpb200:32193] Signal code: (2)
[tpb200:32193] Failing at address: 0x2a998f4120
[tpb200:32193] [ 0] /lib64/tls/libpthread.so.0 [0x2a95f6e430]
[tpb200:32193] [ 1] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_free_list_grow+0x138) [0x2a9568abc8]
[tpb200:32193] [ 2] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_free_list_resize+0x2d) [0x2a9568b0dd]
[tpb200:32193] [ 3] /opt/openmpi/1.2.gcc3/lib/openmpi/mca_btl_sm.so(mca_btl_sm_add_procs_same_base_addr+0x6bf) [0x2a98ba419f]
[tpb200:32193] [ 4] /opt/openmpi/1.2.gcc3/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x28a) [0x2a9899a4fa]
[tpb200:32193] [ 5] /opt/openmpi/1.2.gcc3/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xe8) [0x2a98889308]
[tpb200:32193] [ 6] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(ompi_mpi_init+0x45d) [0x2a956a32ed]
[tpb200:32193] [ 7] /opt/openmpi/1.2.gcc3/lib/libmpi.so.0(MPI_Init+0x93) [0x2a956c5c93]
[tpb200:32193] [ 8] a.out(main+0x1c) [0x400a44]
[tpb200:32193] [ 9] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a960933fb]
[tpb200:32193] [10] a.out [0x40099a]
[tpb200:32193] *** End of error message ***

... lots of the above for each process ...

mpirun noticed that job rank 0 with PID 32040 on node tpb200 exited on signal 7 (Bus error).

-----
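
For reference, here is a minimal test program consistent with the backtrace above, where main calls MPI_Init directly. This is an assumed reconstruction for illustration, not the actual a.out source:

#include <stdio.h>
#include <mpi.h>

/* Minimal init/finalize test; an assumed reconstruction, not the
   actual a.out source. On v1.2 the crash occurs inside MPI_Init,
   while the shared-memory BTL is setting up its backing file. */
int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d initialized\n", rank);
    MPI_Finalize();
    return 0;
}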

If I increase the size of my ramdisk or point $TMP to a network filesystem, then jobs start and complete fine, so it's not a showstopper, but with v1.1.4 (or LAM v7.1.2) I didn't encounter this issue with my default 1m ramdisk (even with np > 100). Is there a way to limit /tmp usage in Open MPI v1.2?
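
For concreteness, the $TMP workaround looks like this (assuming a writable network scratch area at /scratch/$USER, which is a hypothetical site-specific path):

export TMP=/scratch/$USER/ompi-tmp   # hypothetical path; any network filesystem works
mkdir -p $TMP
mpirun -np 8 ./a.out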

Hugh
Ralph Castain
2007-03-20 20:37:31 UTC
One option would be to add -mca btl ^sm to your mpirun command line. This
turns off the shared-memory transport, so you'll see some performance loss
in your collectives; however, it will reduce your /tmp usage to almost
nothing.
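
For example, with a hypothetical 8-process job:

mpirun -np 8 -mca btl ^sm ./a.out

The ^ prefix tells the MCA framework to exclude the listed component, so every BTL other than sm remains available.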

Others may suggest alternative solutions.
Ralph