Discussion:
[OMPI users] Comm_connect: Data unpack would read past end of buffer
Florian Lindner
2018-08-03 10:40:43 UTC
Hello,

I have this piece of code:

MPI_Comm icomm;
INFO << "Accepting connection on " << portName;
MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);
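
For context, the accept above is the server half of the usual open-port / accept / connect handshake. A minimal, self-contained sketch of both halves looks roughly like this (the file-based exchange of the port name and the "server" command-line switch are just placeholders for illustration, not what my actual code does):

#include <mpi.h>
#include <fstream>
#include <string>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    bool isServer = (argc > 1 && std::string(argv[1]) == "server");
    MPI_Comm icomm;

    if (isServer) {
        char portName[MPI_MAX_PORT_NAME];
        MPI_Open_port(MPI_INFO_NULL, portName);   // runtime generates the port string
        std::ofstream out("port.txt");            // naive, racy exchange; illustration only
        out << portName << std::endl;
        out.close();
        MPI_Comm_accept(portName, MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);
        MPI_Close_port(portName);
    } else {
        std::string portName;
        std::ifstream in("port.txt");
        std::getline(in, portName);
        MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);
    }

    MPI_Comm_disconnect(&icomm);
    MPI_Finalize();
    return 0;
}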

and sometimes (in roughly 1 of 5 runs) the MPI_Comm_accept call fails with:

[helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 406
[helium:33883] *** An error occurred in MPI_Comm_accept
[helium:33883] *** reported by process [2141257729,0]
[helium:33883] *** on communicator MPI_COMM_SELF
[helium:33883] *** MPI_ERR_UNKNOWN: unknown error
[helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[helium:33883] *** and potentially your MPI job)
[helium:33883] [0] func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) [0x7fc1ad0ac6e3]
[helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) [0x7fc1af4955e5]
[helium:33883] [2] func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) [0x7fc1af487e72]
[helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) [0x7fc1af4874b5]
[helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) [0x7fc1af4a90e2]
[helium:33883] [5] func:./mpiports() [0x41e43d]
[helium:33883] [6] func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1ad7a1830]
[helium:33883] [7] func:./mpiports() [0x41b249]


Before that, I check the length of portName:

DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;

which both return 1024.
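
(For what it's worth, the way I'd distinguish the std::string's container size from the length of the NUL-terminated port string inside it is something like the following sketch; inspectPort is just an illustrative helper, not part of my actual code:)

#include <cstring>
#include <iostream>
#include <string>

void inspectPort(const std::string& portName) {
    // size() counts every byte the std::string holds, including any
    // padding after the terminating '\0' if the string was sized to
    // MPI_MAX_PORT_NAME up front.
    std::cout << "container size:  " << portName.size() << std::endl;
    // strlen() stops at the first '\0', i.e. the length of the port
    // name as MPI itself would read it.
    std::cout << "C-string length: " << std::strlen(portName.c_str()) << std::endl;
}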

I am completely puzzled as to how I can get a buffer issue here, unless something is wrong with the std::string portName itself.

Any clues?

Launch command: mpirun -n 4 -mca opal_abort_print_stack 1
Open MPI 1.10.2 on Ubuntu 16.

Thanks,
Florian
Ralph H Castain
2018-08-03 15:06:53 UTC
The buffer being overrun has nothing to do with you - it's an internal buffer used as part of creating the connection, so this indicates a problem inside OMPI.

The 1.10 series is outside the support window, but if you want to stick with it you should at least update to the last release in that series - I believe that is 1.10.7.

The OMPI v2.x series had problems with its dynamics support, so you should skip that one. If you want to come all the way forward, take the OMPI v3.x series.
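
If you're unsure which Open MPI release your binary is actually picking up at runtime (e.g. distro packages vs. a hand-built install), something like this quick check will tell you (MPI_Get_library_version is MPI-3, which the 1.10 series already provides):

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len = 0;
    MPI_Get_library_version(version, &len);   // e.g. "Open MPI v1.10.2, ..."
    std::printf("%s\n", version);
    MPI_Finalize();
    return 0;
}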

Ralph