Simone Pellegrini
2011-09-06 09:01:18 UTC
Dear all,
I am developing an MPI application which makes heavy use of MPI_Comm_spawn.
Usually everything works fine for the first hundred spawns, but after a while
the application exits with a curious message:
[arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
past end of buffer in file base/grpcomm_base_modex.c at line 349
[arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
past end of buffer in file grpcomm_bad_module.c at line 518
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_proc_set_arch failed
--> Returned "Data unpack would read past end of buffer" (-26)
instead of "Success" (0)
--------------------------------------------------------------------------
*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[arch-top:27712] Abort before MPI_INIT completed successfully; not able
to guarantee that all other processes were killed!
[arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
past end of buffer in file base/grpcomm_base_modex.c at line 349
[arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
past end of buffer in file grpcomm_bad_module.c at line 518
*** The MPI_Init_thread() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[arch-top:27714] Abort before MPI_INIT completed successfully; not able
to guarantee that all other processes were killed!
[arch-top:27226] 1 more process has sent help message help-mpi-runtime /
mpi_init:startup:internal-failure
[arch-top:27226] Set MCA parameter "orte_base_help_aggregate" to 0 to
see all help / error messages
Using MPI_Init instead of MPI_Init_thread does not help either; the same
error occurs.
Strangely, the error does not occur if I run the code with debug output
enabled (-mca plm_base_verbose 5 -mca rmaps_base_verbose 5).
I am using Open MPI 1.5.3.
cheers, Simone