Discussion:
[OMPI users] openmpi crashes for more than 1 MPI process
abhisek Mondal
2018-03-13 06:58:15 UTC
Hi,

I'm having a strange issue with Open MPI 1.4.
Whenever I try to run a program with more than one MPI process, it crashes.

For instance, running this command:
mpirun -np 2 -bynode `which relion_refine_mpi` \
  --gpu --tau2_fudge 2 --scale --dont_combine_weights_via_disc --iter 25 --norm \
  --psi_step 10.0 --ctf --offset_range 5.0 --oversampling 1 --pool 3 \
  --o Runs/000757_ProtRelionClassify2D/extra/relion \
  --i Runs/000757_ProtRelionClassify2D/input_particles.star \
  --particle_diameter 282 --K 50 --preread_images --flatten_solvent --zero_mask \
  --offset_step 2.0 --angpix 1.89 --j 5

gives me:
--------------------------------------------------------------------------
00110: It looks like opal_init failed for some reason; your parallel process is
00111: likely to abort. There are many reasons that a parallel process can
00112: fail during opal_init; some of which are due to configuration or
00113: environment problems. This failure appears to be an internal failure;
00114: here's some additional information (which may only be relevant to an
00115: Open MPI developer):
00116:
00117:   opal_shmem_base_select failed
00118:   --> Returned value -1 instead of OPAL_SUCCESS
00119:
--------------------------------------------------------------------------
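
For what it's worth, opal_shmem_base_select is the step where Open MPI picks a
shared-memory component during initialization, so the run dies before RELION does
any real work. A minimal sketch of how one might dig further, assuming this build
exposes the usual ompi_info output and MCA verbosity parameters (shmem_base_verbose
is an assumption about this particular installation):

# List the shared-memory (shmem) components this Open MPI build was compiled with
ompi_info | grep -i shmem

# Re-run with verbose shmem selection output to see why every component is rejected
mpirun -np 2 -bynode --mca shmem_base_verbose 100 `which relion_refine_mpi` --gpu ...

# A full or unwritable /tmp is one possible reason the shared-memory backing
# files cannot be created; these are quick checks
df -h /tmp
ls -ld /tmp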


But if I use the following command, it runs fine:
relion_refine \
  --gpu --tau2_fudge 2 --scale --dont_combine_weights_via_disc --iter 25 --norm \
  --psi_step 10.0 --ctf --offset_range 5.0 --oversampling 1 --pool 3 \
  --o Runs/000757_ProtRelionClassify2D/extra/relion \
  --i Runs/000757_ProtRelionClassify2D/input_particles.star \
  --particle_diameter 282 --K 50 --preread_images --flatten_solvent --zero_mask \
  --offset_step 2.0 --angpix 1.89 --j 5


I have used Open MPI with this same program without any problem before. I can't
figure out what is wrong this time.
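
As a sanity check, launching a trivial non-MPI command under the same mpirun should
show whether the problem is in the Open MPI installation or environment rather than
in relion_refine_mpi; a sketch, assuming the mpirun used above is the one on my PATH:

# hostname is not an MPI program; if even this fails with the same opal_init
# error, the problem is in the Open MPI installation or environment rather
# than in relion_refine_mpi
mpirun -np 2 -bynode hostname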

Please help me out.

Thank you.
--
Abhisek Mondal

Senior Research Fellow
Structural Biology and Bioinformatics Division
CSIR-Indian Institute of Chemical Biology
Kolkata 700032
INDIA
Gilles Gouaillardet
2018-03-13 07:27:52 UTC
Hi,


I think it is really time to upgrade Open MPI.

Supported versions are 2.1.2 and 3.0.0.


Open MPI 1.4 is really old now and I doubt you will ever get any support
on that version.
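
It is also worth checking that mpirun and relion_refine_mpi really come from the
same Open MPI installation; mixing libraries from different installs is one common
way to end up with opal_init failures like this. A quick sketch, assuming a Linux
box with ldd available (the exact library names and paths will differ on your system):

# Which mpirun is first on PATH, and what version is it?
which mpirun
mpirun --version

# See which MPI libraries the RELION binary is actually linked against,
# and compare the paths with the mpirun found above
ldd `which relion_refine_mpi` | grep -i -e mpi -e open-pal -e open-rte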


Cheers,


Gilles
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users