abhisek Mondal
2018-03-13 06:58:15 UTC
Hi,
I'm having a strange issue with Openmpi-1.4.
Whenever I try to run a program with more than one MPI process, it crashes.
For instance, the following command:
mpirun -np 2 -bynode `which relion_refine_mpi` --gpu --tau2_fudge 2 \
  --scale --dont_combine_weights_via_disc --iter 25 --norm --psi_step 10.0 \
  --ctf --offset_range 5.0 --oversampling 1 --pool 3 \
  --o Runs/000757_ProtRelionClassify2D/extra/relion \
  --i Runs/000757_ProtRelionClassify2D/input_particles.star \
  --particle_diameter 282 --K 50 --preread_images --flatten_solvent \
  --zero_mask --offset_step 2.0 --angpix 1.89 --j 5
gives me this error:
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS

--------------------------------------------------------------------------
But if I use the following command, without mpirun, it runs fine:
relion_refine --gpu --tau2_fudge 2 --scale \
  --dont_combine_weights_via_disc --iter 25 --norm --psi_step 10.0 --ctf \
  --offset_range 5.0 --oversampling 1 --pool 3 \
  --o Runs/000757_ProtRelionClassify2D/extra/relion \
  --i Runs/000757_ProtRelionClassify2D/input_particles.star \
  --particle_diameter 282 --K 50 --preread_images --flatten_solvent \
  --zero_mask --offset_step 2.0 --angpix 1.89 --j 5
I have used Openmpi with this same program before without any problems, and
I can't figure out what is wrong this time.
Please help me out.
Thank you.
--
Abhisek Mondal
*Senior Research Fellow*
*Structural Biology and Bioinformatics Division*
*CSIR-Indian Institute of Chemical Biology*
*Kolkata 700032*
*INDIA*