Discussion:
[OMPI users] error in mpi processes: openmpi-3.0
abhisek Mondal
2018-04-04 04:20:29 UTC
Hi,

I recently upgraded my system to openmpi-3.0. Despite proper installation
and GPU integration, I keep receiving the error below, which I was also
receiving with openmpi-1.4:

$ mpirun -np 10 -bynode `which relion_preprocess_mpi` --i input_micrographs.star --coord_dir "." --coord_suffix .coords.star --part_star extra/output_particles.star --part_dir "." --extract --extract_size 140 --bg_radius 52 --invert_contrast --norm

00021: --------------------------------------------------------------------------
00022: The following command line options and corresponding MCA parameter have
00023: been deprecated and replaced as follows:
00024:
00025: Command line options:
00026: Deprecated: --bynode, -bynode
00027: Replacement: --map-by node
00028:
00029: Equivalent MCA parameter:
00030: Deprecated: rmaps_base_bynode
00031: Replacement: rmaps_base_mapping_policy=node
00032:
00033: The deprecated forms *will* disappear in a future version of Open MPI.
00034: Please update to the new syntax.
00035: --------------------------------------------------------------------------
00036: [localhost.localdomain:29946] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00037: [localhost.localdomain:29951] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00038: [localhost.localdomain:29948] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00039: [localhost.localdomain:29952] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00040: [localhost.localdomain:29944] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00041: [localhost.localdomain:29950] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00042: [localhost.localdomain:29949] PMIX ERROR: BAD-PARAM in file ../../../../../../../opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1005
00043: *** An error occurred in MPI_Init
00044: *** on a NULL communicator
00045: *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
00046: *** and potentially your MPI job)
00047: *** An error occurred in MPI_Init
00048: *** on a NULL communicator
00049: *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
00050: *** and potentially your MPI job)
00051: *** An error occurred in MPI_Init
00052: *** on a NULL communicator
00053: *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
00054: *** and potentially your MPI job)
00055: [localhost.localdomain:29952] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

I'm not sure what is causing this crash.

Please help me out.

Thank you

--
Abhisek Mondal
Senior Research Fellow
Structural Biology and Bioinformatics Division
CSIR-Indian Institute of Chemical Biology
Kolkata 700032
INDIA
Jeff Squyres (jsquyres)
2018-04-07 20:11:16 UTC
Are you 100% sure that you are not accidentally mixing and matching multiple versions of Open MPI in the same job? This type of error (PMIX bad param) is typical when you accidentally use Open MPI vXYZ on one node and Open MPI vABC on a different node.
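One way to look for the kind of version mix-up described above is to check which launcher and which MPI library the shell actually resolves. What follows is only a minimal sketch, assuming a Linux system where `ldd` is available and `relion_preprocess_mpi` is dynamically linked; the helper name `report_tool` is hypothetical and not part of Open MPI. On a multi-node job the same check would be repeated on every node.

```shell
#!/bin/sh
# report_tool is a hypothetical helper: print where a command resolves from,
# or note that it is not on PATH.
report_tool() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        echo "$1: $resolved"
    else
        echo "$1: not found in PATH"
    fi
}

# Launcher, info tool, and the application named in the post.
for tool in mpirun ompi_info relion_preprocess_mpi; do
    report_tool "$tool"
done

# Version reported by the launcher that the shell picks up.
if command -v mpirun >/dev/null 2>&1; then
    mpirun --version
fi

# MPI library the application binary is linked against
# (assumes Linux and dynamic linking).
if app=$(command -v relion_preprocess_mpi 2>/dev/null); then
    ldd "$app" | grep -i mpi || echo "no MPI library found in ldd output"
fi
```

If the `mpirun` path and the library path shown by `ldd` point into different installation prefixes (for example an older install left over from before the upgrade), that would be consistent with the mismatch suggested above, and rebuilding the application against the new installation may be worth trying. The deprecation notice in the posted log is a separate matter from the crash; per that notice, `-bynode` can be replaced with `--map-by node`.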
--
Jeff Squyres
***@cisco.com