Discussion: [OMPI users] Rounding errors and MPI
Oscar Mojica
2017-01-16 13:28:05 UTC
Hello everybody

I'm having a problem with a parallel program written in Fortran. I have a 3D array which is divided in two along the third dimension, so that two processes perform some operations on a part of the cube using a subroutine. Each process also has the complete cube. Before each process calls the subroutine, I compare its sub-array with the corresponding part of the whole cube; these are the same. The subroutine simply performs point-wise operations in a loop, i.e.


do k = k1, k2
  do j = 1, nhx
    do it = 1, nt
      wave(it,j,k) = wave(it,j,k)*dt/(dx+dy)*(dhx+dhy)/(dx+dy)
    end do
  end do
end do


where wave is the 3D array and the other values are constants.


After leaving the subroutine I notice that the values calculated by process 1 differ from the values I get when the whole cube is passed to the subroutine but the subroutine works only on that process's part, i.e.


--- complete 2017-01-12 10:30:23.000000000 -0400
+++ half 2017-01-12 10:34:57.000000000 -0400
@@ -4132545,7 +4132545,7 @@
-2.5386049E-04
-2.9899486E-04
-3.4697619E-04
- -3.7867704E-04
+ -3.7867710E-04
0.0000000E+00
0.0000000E+00
0.0000000E+00



When I do this with more processes, the same thing happens on every process other than rank zero. I find it very strange. I am disabling optimization when compiling.

In the end the results are visually the same, but not numerically identical. I am working in single precision.


Any idea what may be going on? I do not know whether this is related to MPI.



Oscar Mojica
Geologist Ph.D. in Geophysics
SENAI CIMATEC Supercomputing Center
Lattes: http://lattes.cnpq.br/0796232840554652
Tim Prince via users
2017-01-16 13:45:03 UTC
You might try inserting parentheses to specify your preferred order of evaluation. If using ifort, you would also need -assume protect_parens.
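For example, with the order spelled out, the update from your posted loop would read something like this (a rough, untested sketch):

    ! strict left-to-right order; ifort only honors the parentheses
    ! when compiled with -assume protect_parens
    wave(it,j,k) = (((wave(it,j,k)*dt)/(dx+dy))*(dhx+dhy))/(dx+dy)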

Yann Jobic
2017-01-16 14:01:40 UTC
Hi,

Is there an overlapping section in the MPI part?

Otherwise, please check:
- the declared type of all the variables (consistency)
- correct initialization of the array "wave" (to zero)
- maybe use temporary variables like

    real :: size1, size2, factor
    size1  = dx + dy
    size2  = dhx + dhy
    factor = dt*size2/(size1**2)

and then, in the big loop:

    wave(it,j,k) = wave(it,j,k)*factor

The code will also run faster.
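Put into the loop you posted, that would look roughly like this (untested sketch; the declarations of dt, dx, dy, dhx and dhy are assumed to match your code):

    real :: size1, size2, factor

    size1  = dx + dy
    size2  = dhx + dhy
    factor = dt*size2/(size1**2)   ! computed once, outside the loop

    do k = k1, k2
      do j = 1, nhx
        do it = 1, nt
          wave(it,j,k) = wave(it,j,k)*factor
        end do
      end do
    end do

Note that folding everything into one factor also changes the rounding slightly compared with the original left-to-right evaluation, so a bit-for-bit comparison against the old build should allow for that.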

Yann

Oscar Mojica
2017-01-16 16:31:57 UTC
Thanks guys for your answers.


Actually, the optimization was not disabled, and that was the problem; compiling with -O0 solves it. Sorry.


Oscar Mojica
Geologist Ph.D. in Geophysics
SENAI CIMATEC Supercomputing Center
Lattes: http://lattes.cnpq.br/0796232840554652



Jeff Hammond
2017-01-18 19:38:21 UTC
Permalink
If compiling with -O0 solves the problem, then you should use -assume protect_parens and/or one of the options discussed in the PDF you will find at https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler. Disabling optimization is a heavy hammer that you don't want to use if you care about performance at all. If you are using Fortran and MPI, it seems likely that you care about performance.
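As a tiny self-contained illustration of why the floating-point model matters (made-up constants, untested sketch, not code from this thread): the same single-precision expression evaluated in two algebraically equivalent orders is not guaranteed to give bit-identical results.

    program reassoc_demo
      implicit none
      real :: w, dt, dx, dy, dhx, dhy, strict, refactored

      ! hypothetical single-precision values, chosen only for illustration
      w   = -3.7867704e-04
      dt  = 1.0e-3
      dx  = 12.5
      dy  = 12.5
      dhx = 7.3
      dhy = 7.3

      ! strict left-to-right order, as written in the original loop
      strict = (((w*dt)/(dx+dy))*(dhx+dhy))/(dx+dy)

      ! one reassociation an optimizing compiler is free to pick under a
      ! relaxed floating-point model
      refactored = w*(dt*(dhx+dhy)/((dx+dy)*(dx+dy)))

      print *, strict, refactored, strict - refactored
    end program reassoc_demo

Whether the two printed values actually differ depends on the compiler, the flags, and the constants; the point is only that both orders are mathematically equivalent but need not round the same way.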

Jeff

--
Jeff Hammond
***@gmail.com
http://jeffhammond.github.io/
Jason Maldonis
2017-01-18 20:07:08 UTC
Hi Oscar,

I have similar issues that I was never able to fully track down in my code, but I think you just identified the real problem. If you figure out the correct options, could you please let me know here?

Using the compiler optimizations is important for our code, but if we can solve this issue with a compile option, that would be huge!

Thank you for sharing this,
Jason

Jason Maldonis
Research Assistant of Professor Paul Voyles
Materials Science Grad Student
University of Wisconsin, Madison
1509 University Ave, Rm 202
Madison, WI 53706
***@wisc.edu

Oscar Mojica
2017-01-23 13:38:12 UTC
Jason


I am sorry for my late answer. For my particular case, following Jeff's advice gave me the results I wanted. By the way, thanks, Jeff, for the link; it was quite useful.

I compiled my serial and parallel programs using

-fp-model precise -fp-model source (Linux* or OS X*)

to improve the consistency and reproducibility of floating-point results while limiting the impact on performance. My program is quite large, so I preferred to use this option because there are certainly other problems that are not related only to reassociation.
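For reference, a compile line combining these options would look roughly like the following (the mpiifort wrapper and the source file name are just placeholders for illustration):

    mpiifort -O2 -fp-model precise -fp-model source wave_solver.f90 -o wave_solver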



Oscar Mojica
Geologist Ph.D. in Geophysics
SENAI CIMATEC Supercomputing Center
Lattes: http://lattes.cnpq.br/0796232840554652


