[OMPI users] EBADF (Bad file descriptor) on a simplest "Hello world" program
Dmitry N. Mikushin
2018-06-01 19:29:05 UTC
Dear all,

It looks like I have hit a weird issue that I have never encountered before.
While trying to run the simplest "Hello world" program, I get:

$ cat hello.c
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Finalize();

    return 0;
}
$ mpicc hello.c -o hello
$ mpirun -np 1 ./hello
--------------------------------------------------------------------------
WARNING: The accept(3) system call failed on a TCP socket. While this
should generally never happen on a well-configured HPC system, the
most common causes when it does occur are:

* The process ran out of file descriptors
* The operating system ran out of file descriptors
* The operating system ran out of memory

Your Open MPI job will likely hang until the failure resason is fixed
(e.g., more file descriptors and/or memory becomes available), and may
eventually timeout / abort.

Local host: M17xR4
Errno: 9 (Bad file descriptor)
Probable cause: Unknown cause; job will try to continue
--------------------------------------------------------------------------

Further tracing shows the following:

[pid 13498] accept(0, 0x7f2ec8000960, 0x7f2ee6740e7c) = -1 EBADF (Bad file descriptor)
[pid 13498] shutdown(0, SHUT_RDWR) = -1 EBADF (Bad file descriptor)
[pid 13498] close(0) = -1 EBADF (Bad file descriptor)
[pid 13498] open("/usr/share/openmpi/help-oob-tcp.txt", O_RDONLY) = 0
[pid 13498] ioctl(0, TCGETS, 0x7f2ee6740be0) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 13499] <... nanosleep resumed> NULL) = 0
[pid 13498] fstat(0, <unfinished ...>
[pid 13499] nanosleep({0, 100000}, <unfinished ...>
[pid 13498] <... fstat resumed> {st_mode=S_IFREG|0644, st_size=3025, ...}) = 0
[pid 13498] read(0, "# -*- text -*-\n#\n# Copyright (c)"..., 8192) = 3025
[pid 13498] read(0, "", 4096) = 0
[pid 13498] read(0, "", 8192) = 0
[pid 13498] ioctl(0, TCGETS, 0x7f2ee6740b40) = -1 ENOTTY (Inappropriate ioctl for device)
[pid 13498] close(0) = 0
[pid 13499] <... nanosleep resumed> NULL) = 0
[pid 13499] nanosleep({0, 100000}, <unfinished ...>
[pid 13498] write(1, "--------------------------------"...,
768--------------------------------------------------------------------------
WARNING: The accept(3) system call failed on a TCP socket. While this
should generally never happen on a well-configured HPC system, the
most common causes when it does occur are:

* The process ran out of file descriptors
* The operating system ran out of file descriptors
* The operating system ran out of memory

Your Open MPI job will likely hang until the failure resason is fixed
(e.g., more file descriptors and/or memory becomes available), and may
eventually timeout / abort.

Local host: M17xR4
Errno: 9 (Bad file descriptor)
Probable cause: Unknown cause; job will try to continue
--------------------------------------------------------------------------
) = 768
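
One detail in the trace above: open() returns 0 right after close(0) fails, which means descriptor 0 was not open in that process at that moment. A minimal sketch for checking the standard descriptors from inside a program, assuming a POSIX system. The helper names fd_is_open and report_std_fds are hypothetical and are not part of Open MPI:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Return 1 if fd refers to an open descriptor, 0 if fcntl() reports EBADF. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Write the state of stdin, stdout and stderr to the given stream. */
void report_std_fds(FILE *out)
{
    static const char *names[] = { "stdin", "stdout", "stderr" };

    for (int fd = 0; fd < 3; fd++)
        fprintf(out, "%s (fd %d): %s\n", names[fd], fd,
                fd_is_open(fd) ? "open" : "closed");
}
```

Calling report_std_fds(stderr) at the top of main(), before MPI_Init(), might show whether the process was started with one of the standard descriptors already closed.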

In fact, "Bad file descriptor" first occurs a bit earlier, here:

[pid 13499] open("/proc/self/fd", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 20
[pid 13499] fstat(20, {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
[pid 13499] getdents(20, /* 25 entries */, 32768) = 600
[pid 13499] close(3) = 0
[pid 13499] close(4) = 0
[pid 13499] close(5) = 0
[pid 13499] close(6) = 0
[pid 13499] close(7) = 0
[pid 13499] close(8) = 0
[pid 13499] close(9) = 0
[pid 13499] close(10) = 0
[pid 13499] close(11) = 0
[pid 13499] close(12) = 0
[pid 13499] close(13) = 0
[pid 13499] close(14) = 0
[pid 13499] close(15) = 0
[pid 13499] close(16) = 0
[pid 13499] close(17) = 0
[pid 13499] close(18) = 0
[pid 13499] close(19) = 0
[pid 13499] close(20) = 0
[pid 13499] getdents(20, 0x1cc04a0, 32768) = -1 EBADF (Bad file descriptor)
[pid 13499] close(20) = -1 EBADF (Bad file descriptor)
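
This second trace is consistent with a loop that lists /proc/self/fd and closes every entry, including the descriptor of the directory being listed (20 here), so the next getdents() on it fails with EBADF. A minimal sketch of such a loop that skips its own descriptor, assuming Linux with /proc mounted. The name close_fds_from is hypothetical, and this is an illustration of the pattern, not Open MPI's actual code:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every open descriptor >= lowest, except the one that backs the
 * directory stream used for the listing. Returns the number of descriptors
 * closed, or -1 if /proc/self/fd cannot be opened. */
int close_fds_from(int lowest)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self_fd = dirfd(dir);   /* descriptor behind the listing itself */
    int closed = 0;
    struct dirent *entry;

    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(entry->d_name, &end, 10);

        if (end == entry->d_name || *end != '\0')
            continue;           /* not a number: "." or ".." */
        if (fd < lowest || fd == self_fd)
            continue;           /* keep low descriptors and our own */
        if (close((int)fd) == 0)
            closed++;
    }

    closedir(dir);              /* releases self_fd exactly once */
    return closed;
}
```

With the self_fd check removed, the loop would close its own directory descriptor partway through, and both the following readdir() and the final closedir() would see EBADF, much like the getdents(20) and close(20) pair above.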

Any idea how to fix this? The system is Ubuntu 16.04:

Linux M17xR4 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Kind regards,
- Dmitry.
Dmitry N. Mikushin
2018-06-02 20:28:29 UTC
ping