Dave Turner
2016-12-14 03:57:40 UTC
[warn] Epoll ADD(4) on fd 1 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Operation not permitted
Gentoo with OpenMPI 2.0.1 (compiled from source) and SGE
ompi_info --all output attached
We recently did a maintenance upgrade on our cluster that included
moving to OpenMPI 2.0.1. Fortran programs now print the
epoll ADD error above at the start of a run, and the stdout file
stops updating until the end of the run, when all of the output is dumped at once.
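For reference, the epoll_ctl(2) man page lists EPERM ("Operation not permitted") when the target file descriptor does not support epoll, for example a regular file, which would match the warning on fd 1 if mpirun's stdout is a file. A minimal sketch to check that behavior in isolation, assuming Linux and Python 3's select.epoll; the function name epoll_add_result is hypothetical and the code only mimics the ADD with write interest from the warning, it is not what OpenMPI or libevent actually runs:

```python
import errno
import os
import select
import tempfile


def epoll_add_result(fd: int) -> int:
    """Register fd for write readiness (EPOLLOUT) in a fresh epoll set.

    Returns 0 if epoll_ctl(EPOLL_CTL_ADD) succeeds, otherwise its errno.
    """
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLOUT)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        ep.close()


if __name__ == "__main__":
    # A regular file does not support epoll, so the ADD is expected to fail.
    with tempfile.TemporaryFile() as f:
        print("regular file:", errno.errorcode.get(epoll_add_result(f.fileno()), "OK"))

    # A pipe supports epoll, so the same ADD is expected to succeed.
    r, w = os.pipe()
    try:
        print("pipe:", errno.errorcode.get(epoll_add_result(w), "OK"))
    finally:
        os.close(r)
        os.close(w)
```

If the regular-file case reports EPERM while the pipe case succeeds, the warning text itself is explained by fd 1 being a file; it does not by itself explain why the output is held until the end of the run.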
I've read about this problem, and it seems to be a file-lock
issue in which OpenMPI and SGE both try to lock the
same output file. We did not see this problem with
previous versions of OpenMPI.
We've tried compiling OpenMPI both with and without
--with-libevent=/usr, and I've tried compiling
with --disable-event-epoll and running with -mca opal_event_include poll.
Both of these were suggestions from a few years back, but
neither changes the behavior. I've also tried redirecting the output
manually:
mpirun -np 4 ./app > file.out
This just locks file.out instead, and again all of the output is
dumped at the end of the run.
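One quick way to confirm what mpirun's own fd 1 is in each case (a regular file with > file.out, and presumably also for the SGE job output file, versus a pipe or tty in an interactive shell) is to stat it. A minimal sketch, assuming Python 3 on Linux; describe_fd and the script name fdcheck.py are hypothetical names for illustration:

```python
import os
import stat
import sys


def describe_fd(fd: int) -> str:
    """Classify what an open file descriptor refers to, using fstat()."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISCHR(mode):
        return "character device (e.g. a tty)"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"


if __name__ == "__main__":
    # Report on stderr so the answer is visible even when stdout is redirected.
    print(f"fd 1 is a {describe_fd(sys.stdout.fileno())}", file=sys.stderr)
```

Running it directly as python3 fdcheck.py > file.out should report a regular file. Running it in the SGE job script just before mpirun (without a redirection) would show what the batch system hands to mpirun as stdout; running it under mpirun instead would only show what the launched ranks see, which may differ.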
We also do not have this issue with 1.10.4 installed.
Any suggestions? Has anyone else run into this problem?
Dave Turner
--
Work: ***@ksu.edu (785) 532-7791
2219 Engineering Hall, Manhattan KS 66506
Home: ***@gmail.com
cell: (785) 770-5929