Nicolas Joly
2017-03-03 13:36:27 UTC
Hi,
We just got hit by a problem with the sharedfp/lockedfile component under
v2.0.1 (should be identical with v2.0.2). We had 2 instances of an MPI
program running concurrently on the same input file and using the
MPI_File_read_shared() function ...
If the shared file pointer is maintained with the lockedfile
component, a "XXX.lockedfile" file is created next to the data
file. Unfortunately, this fixed name will collide when multiple
instances of a tool work on the same file ;)
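To make the collision concrete, here is a minimal sketch of the mechanism as I understand it. It is a toy model, not Open MPI's code: it assumes the sidecar file simply holds one 8-byte offset protected by a file lock, and the helper names (read_shared, two_jobs, demo) and the ".job1"/".job2" suffixes are made up for illustration only.

```python
import fcntl
import os
import struct
import tempfile


def read_shared(data_path, nbytes, sidecar_path):
    """Read nbytes at the offset stored in sidecar_path, then advance it."""
    fd = os.open(sidecar_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # one reader updates the offset at a time
        raw = os.pread(fd, 8, 0)
        # An empty (freshly created) sidecar means "start of file".
        offset = struct.unpack("<q", raw)[0] if len(raw) == 8 else 0
        with open(data_path, "rb") as data:
            data.seek(offset)
            chunk = data.read(nbytes)
        os.pwrite(fd, struct.pack("<q", offset + len(chunk)), 0)
        return chunk
    finally:
        os.close(fd)  # closing the descriptor also drops the lock


def two_jobs(data_path, sidecar1, sidecar2):
    """Alternate two 4-byte reads each from two independent 'jobs'."""
    job1, job2 = [], []
    for _ in range(2):
        job1.append(read_shared(data_path, 4, sidecar1))
        job2.append(read_shared(data_path, 4, sidecar2))
    return job1, job2


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "input.dat")
        with open(data_path, "wb") as f:
            f.write(b"AAAABBBBCCCCDDDD")
        fixed = data_path + ".lockedfile"
        # Both jobs use the same fixed sidecar name (the reported situation).
        shared = two_jobs(data_path, fixed, fixed)
        # Hypothetical contrast: each job gets its own sidecar name.
        separate = two_jobs(data_path, fixed + ".job1", fixed + ".job2")
        return shared, separate


if __name__ == "__main__":
    shared, separate = demo()
    print("same sidecar:     ", shared)
    print("separate sidecars:", separate)
```

With the same sidecar, the two jobs split the file between them (one gets AAAA and CCCC, the other BBBB and DDDD) instead of each reading it from the start, which is what happens when every job has its own sidecar.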
Running 2 instances of the following command line (source code
attached) on the same machine will show the problematic behaviour:

  mpirun -n 1 --mca sharedfp lockedfile ./shrread -v input.dat
Confirmed with lsof(8) output:

  ***@tars [~]> lsof input.dat.lockedfile
  COMMAND  PID  USER  FD   TYPE DEVICE SIZE/OFF NODE              NAME
  shrread  5876 njoly 21w  REG  0,30   8        13510798885996031 input.dat.lockedfile
  shrread  5884 njoly 21w  REG  0,30   8        13510798885996031 input.dat.lockedfile
Thanks in advance.
--
Nicolas Joly
Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.