Florian Lindner
2018-08-03 10:40:43 UTC
Hello,
I have this piece of code:
MPI_Comm icomm;
INFO << "Accepting connection on " << portName;
MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);
and sometimes (in roughly 1 of 5 runs) I get:
[helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 406
[helium:33883] *** An error occurred in MPI_Comm_accept
[helium:33883] *** reported by process [2141257729,0]
[helium:33883] *** on communicator MPI_COMM_SELF
[helium:33883] *** MPI_ERR_UNKNOWN: unknown error
[helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[helium:33883] *** and potentially your MPI job)
[helium:33883] [0] func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) [0x7fc1ad0ac6e3]
[helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) [0x7fc1af4955e5]
[helium:33883] [2] func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) [0x7fc1af487e72]
[helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) [0x7fc1af4874b5]
[helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) [0x7fc1af4a90e2]
[helium:33883] [5] func:./mpiports() [0x41e43d]
[helium:33883] [6] func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1ad7a1830]
[helium:33883] [7] func:./mpiports() [0x41b249]
Before that, I check the length of portName:
DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;
which both print 1024.
I am completely puzzled as to how I can get a buffer issue here, unless something is wrong with the std::string portName.
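One thing that could be checked in addition to portName.size(), as a sketch only: a std::string whose size() equals MPI_MAX_PORT_NAME may hold the whole buffer including padding after the terminator, while MPI_Comm_accept only sees what portName.c_str() yields up to the first '\0'. The helper names below (PortNameCheck, checkPortName, trimAtFirstNul) are hypothetical and not part of MPI, and whether this has anything to do with the error above is an assumption, not a diagnosis.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper type (not part of MPI): compares what the std::string
// holds with what a C API reading portName.c_str() would actually see.
struct PortNameCheck {
  std::size_t stringSize;   // portName.size(), counts embedded '\0' as well
  std::size_t cStringSize;  // strlen(portName.c_str()), stops at first '\0'
  bool hasEmbeddedNul;      // true if the two sizes differ
};

PortNameCheck checkPortName(const std::string &portName) {
  PortNameCheck r;
  r.stringSize = portName.size();
  r.cStringSize = std::strlen(portName.c_str());
  r.hasEmbeddedNul = (r.stringSize != r.cStringSize);
  return r;
}

// Returns a copy of portName cut at the first '\0', i.e. exactly the
// characters a function taking const char* would read.
std::string trimAtFirstNul(const std::string &portName) {
  return std::string(portName.c_str());
}
```

Logging checkPortName(portName).cStringSize next to portName.size() before the MPI_Comm_accept call would show whether the string that is actually passed is shorter than 1024 characters, or even empty in the failing runs.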
Any clues?
Launch command: mpirun -n 4 -mca opal_abort_print_stack 1
Open MPI 1.10.2 on Ubuntu 16.
Thanks,
Florian