Discussion:
[OMPI users] Segmentation fault (invalid mem ref) at MPI_StartAll (second call)
Paolo Pezzutto
2016-11-25 12:11:33 UTC
Dear all,

I am struggling with an invalid memory reference when calling SUB EXC_MPI
(MOD01), precisely at the MPI_StartAll call (see comment below).

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
! ********** file mod01.f90 ************ !
MODULE MOD01

implicit none
include 'mpif.h'
! alternatively
! use mpi
! implicit none
PRIVATE
! ...
INTERFACE exc_mpi
MODULE PROCEDURE exc_mpi
END INTERFACE
PUBLIC exc_mpi

CONTAINS

subroutine exc_mpi (X)
!! send and receive from procs PN0 <-> PN1 and PN0 <-> PN2
real, dimension (ni:ns, m, l), intent(inout) :: X

logical, save :: frstime=.true.
integer, save :: mpitype_sn, mpitype_sp, mpitype_rn, mpitype_rp
integer, save :: requests(4), reqcount
integer :: istatus(MPI_STATUS_SIZE,4), ierr

if (frstime) then
call exc_init()
frstime = .false.
end if
call MPI_StartAll(reqcount,requests,ierr) !! <-- segfault here
call MPI_WaitAll(reqcount,requests,istatus,ierr)
return

contains

subroutine exc_init

integer :: i0, ierrs(12), ktag

nrequests = 0
ierrs=0
ktag = 1

! find i0

if ( condition1 ) then
! send to PN2
call MPI_Type_Vector(m*l, messlengthup(PN2), ns-ni+1, MPI_REAL, &
                     mpitype_sn, ierrs(1))
call MPI_Type_Commit(mpitype_sn, ierrs(3))
call MPI_Send_Init(X(i0, 1, 1), 1, mpitype_sn, PN2-1, ktag, &
                   MPI_COMM_WORLD, requests(reqcount+1), ierrs(5))
! receive from PN2
call MPI_Type_Vector(m*l, messlengthdo(PN0), ns-ni+1, MPI_REAL, &
                     mpitype_rn, ierrs(2))
call MPI_Type_Commit(mpitype_rn,ierrs(4))
call MPI_Recv_Init(X(nend(irank)+1, 1, 1), 1, mpitype_rn, PN2-1, &
                   ktag+1, MPI_COMM_WORLD, requests(nrequests+2), ierrs(6))
nrequests = nrequests + 2
end if

if ( condition2 ) then
! send and rec PN0 <-> PN1
nrequests = nrequests + 2
end if

return
end subroutine exc_init

end subroutine exc_mpi

! ...

END MODULE MOD01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

The calls are coming from this other module in a separate file:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

! ********** file mod02.f90 ************ !
MODULE MOD02

use MOD01, only: exc_mpi

IMPLICIT NONE
include 'mpif.h'
! alternatively
! use mpi
! implicit none
PRIVATE

! ...

INTERFACE MYSUB
MODULE PROCEDURE MYSUB
END INTERFACE
PUBLIC MYSUB

CONTAINS

SUBROUTINE MYSUB (Y)

IMPLICIT NONE
REAL, INTENT(INOUT) :: Y(nl:nr, m, l) ! ni<=nl, nr>=ns
REAL, ALLOCATABLE, DIMENSION(:,:,:) :: Y0
!...
allocate ( Y0(ni-1:ns, 1:m, 1:l) )

DO i = 1, icount

Y0(nl:nr,:,:) = F3(:,:,:)
call exc_mpi ( Y0(ni:ns, :, :) ) ! <-- segfault here
call mpi_barrier (mpi_comm_world, ierr)
Y0(ni-1,:,:) = 0.
CALL SUB01

END DO
deallocate (Y0)
RETURN

CONTAINS

SUBROUTINE SUB01
!...
FRE: DO iterm = 1, m
DIR: DO iterl = 1, l
DO itern = nl, nr
! Y(itern, iterm, iterl) = some_lin_combination(Y0)
END DO
END DO DIR
END DO FRE

END SUBROUTINE SUB01

! ...
END SUBROUTINE MYSUB

END MODULE MOD02
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

A segmentation fault is raised at runtime when MAIN (actually a sub in a
module) calls MYSUB (in MOD02) for the second time, i.e. when MPI_StartAll
runs without re-initialisation. The segfault is an invalid memory reference,
but I swear the array bounds aren't changing.

The error is not systematic, in the sense that the program works if the job
is split over no more than a certain number of processes, say NPMAX, which
depends on the size of the decomposed array (the bigger the array, the
higher NPMAX). With more processes than NPMAX, the program segfaults.

The same issue arises with [gfortran+ompi] and [gfortran+mpich];
[ifort+mpich] does not always segfault, but one process might hang
indefinitely. So I bet it is not strictly an ompi issue, and I apologise for
posting here. It is not a single-version issue either: the behaviour is the
same on deb-jessie, ubuntu 14, and a personal 2.0.1 build (I can share
config.log if necessary).

The only thing the systems have in common is glibc (2.19, distro stable).
Actually, the backtrace of ifort+mpich lists libpthread.so. I have not tried
alternative C libraries, nor a newer glibc.

Intel Hyper-Threading is enabled on all three machines (one mini HPC cluster
and two PCs).

This error is not reported on "serious" architectures like NEC, Sun
(ifort+ompi), and IBM.

I am trying to find a possible MPI workaround for deb-based systems that
maintains as much efficiency as possible.

As can be seen, MOD02 passes the exchange procedure (MOD01) a sliced array
Y0, which is non-contiguous. But I should not have to worry, because
MPI_Type_Vector is expected to do the remapping job for me.
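
To spell out the layout I am relying on (same names as in exc_init; blen
here is a stand-in for the messlengthup/messlengthdo values): the first
dimension of X is the contiguous one, so a slab of blen rows from each of
the m*l columns is m*l blocks of blen REALs, each block separated by a
stride of ns-ni+1 elements. Something like:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
! sketch: one vector type covering the non-contiguous slab
! X(i0:i0+blen-1, :, :) of X(ni:ns, m, l)
call MPI_Type_Vector(m*l,     & ! count: one block per (j,k) column
                     blen,    & ! blocklength: rows per block
                     ns-ni+1, & ! stride: first-dimension extent
                     MPI_REAL, mpitype, ierr)
call MPI_Type_Commit(mpitype, ierr)
! a single transfer of one "mpitype" starting at X(i0,1,1) then
! covers the whole slab without manual packing
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@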

One way I can almost overcome the fault (NPMAX grows by one order of
magnitude) is to exchange the dimensions back and forth, but this slows the
execution down by approximately a factor of 2.
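
Concretely, that back-and-forth swap looks roughly like this (Xt is a
scratch array introduced here only for illustration):

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
! sketch: move the exchanged direction to the last (slowest) dimension,
! so the slab becomes contiguous, then permute back after the exchange
real, allocatable :: Xt(:,:,:)
allocate ( Xt(m, l, ni:ns) )
Xt = reshape(X, shape(Xt), order=[3,1,2]) ! Xt(j,k,i) = X(i,j,k)
! ... exchange the contiguous slabs Xt(:, :, i0:i0+blen-1) ...
X = reshape(Xt, shape(X), order=[2,3,1])  ! X(i,j,k) = Xt(j,k,i)
deallocate (Xt)
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@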

Initialising at each call (moving call exc_init() out of the if statement in
sub exc_mpi) does solve the problem, but it is totally inefficient since
MAIN (not listed) loops a lot.
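
That workaround is just the following (with the caveat that the previously
created types and requests should also be freed, or they leak):

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
! sketch: rebuild the persistent requests at every call
call exc_init()  ! unconditional, no "if (frstime)" guard
call MPI_StartAll(reqcount, requests, ierr)
call MPI_WaitAll(reqcount, requests, istatus, ierr)
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@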

I bet that permanently exchanging the X/Y/Y0 dimensions would solve it, but
I do not want to lose the efficiency of nested loops like those in SUB01
(the first dimension is much bigger than the others).

Indeed, MOD02 creates a temporary array when passing the slice. Creating
that temporary explicitly doesn't solve the issue.

Forcing allocation on the heap or on the stack doesn't solve it either.

Any hint out there?

Thanks for reading

P
George Bosilca
2016-11-25 20:09:37 UTC
At first glance, I would say you are confusing the two variables that count
your requests, reqcount and nrequests.
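
A minimal standalone sketch of the pattern (not your code; plain REAL
buffers instead of your vector types), with a single counter used both when
the persistent requests are built and when they are started:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
program persistent_demo
  use mpi
  implicit none
  integer :: rank, nprocs, ierr, i, reqcount
  integer :: requests(2), istatus(MPI_STATUS_SIZE, 2)
  real    :: sendbuf(10), recvbuf(10)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! build the persistent requests ONCE, counting with reqcount only
  reqcount = 0
  call MPI_Send_init(sendbuf, 10, MPI_REAL, mod(rank+1, nprocs), 1, &
                     MPI_COMM_WORLD, requests(reqcount+1), ierr)
  call MPI_Recv_init(recvbuf, 10, MPI_REAL, mod(rank-1+nprocs, nprocs), 1, &
                     MPI_COMM_WORLD, requests(reqcount+2), ierr)
  reqcount = reqcount + 2

  ! restart the same requests many times; reqcount always matches
  ! the number of valid entries in requests(:)
  do i = 1, 3
     sendbuf = real(rank * i)
     call MPI_Startall(reqcount, requests, ierr)
     call MPI_Waitall(reqcount, requests, istatus, ierr)
  end do

  call MPI_Request_free(requests(1), ierr)
  call MPI_Request_free(requests(2), ierr)
  call MPI_Finalize(ierr)
end program persistent_demo
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@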

George.
Post by Paolo Pezzutto
I am struggling with an invalid memory reference when calling SUB EXC_MPI
(MOD01), precisely at the MPI_StartAll call (see comment below).
[...]
call MPI_StartAll(reqcount,requests,ierr) !! <-- segfault here
[...]
nrequests = nrequests + 2
[...]