Discussion: [OMPI users] Strange benchmarks at large message sizes
Cooper Burns
2017-09-19 14:56:34 UTC
Hello,

I have been running some simple benchmarks and saw some strange behaviour.
All tests are done on 4 nodes with 24 cores each (a total of 96 MPI processes).

When I run MPI_Allreduce() I see the run time spike up (about 10x) when I
go from reducing a total of 4096 KB to 8192 KB. For example, when count is
2^21 (8192 KB of 4-byte ints):

MPI_Allreduce(send_buf, recv_buf, count, MPI_INT, MPI_SUM, MPI_COMM_WORLD)

is slower than:

MPI_Allreduce(send_buf, recv_buf, count/2, MPI_INT, MPI_SUM,
MPI_COMM_WORLD)
MPI_Allreduce(send_buf + count/2, recv_buf + count/2, count/2, MPI_INT,
MPI_SUM, MPI_COMM_WORLD)
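
For reference, a minimal self-contained sketch of the comparison being timed
could look like the following (the buffer setup, sizes and MPI_Wtime timing
here are illustrative assumptions, not the original benchmark code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 21;                  /* 2^21 ints = 8192 KB */
    int *send_buf = malloc(count * sizeof(int));
    int *recv_buf = malloc(count * sizeof(int));
    for (int i = 0; i < count; i++) send_buf[i] = 1;

    /* Variant 1: one reduction of the full buffer */
    double t0 = MPI_Wtime();
    MPI_Allreduce(send_buf, recv_buf, count, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    /* Variant 2: two reductions of half the buffer each */
    double t2 = MPI_Wtime();
    MPI_Allreduce(send_buf, recv_buf, count / 2, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(send_buf + count / 2, recv_buf + count / 2, count / 2,
                  MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    double t3 = MPI_Wtime();

    if (rank == 0)
        printf("full: %f s, split: %f s\n", t1 - t0, t3 - t2);

    free(send_buf);
    free(recv_buf);
    MPI_Finalize();
    return 0;
}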

Just wondering if anyone knows what the cause of this behaviour is.

Thanks!
Cooper


Cooper Burns
Senior Research Engineer
(608) 230-1551
convergecfd.com
Howard Pritchard
2017-09-19 20:44:02 UTC
Hello Cooper

Could you rerun your test with the following env. variable set

export OMPI_MCA_coll=self,basic,libnbc

and see if that helps?
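
Equivalently, the same selection can be passed on the mpirun command line
(./allreduce_bench below is just a placeholder for your benchmark binary):

# via the environment variable
export OMPI_MCA_coll=self,basic,libnbc
mpirun -np 96 ./allreduce_bench

# or directly as an MCA option
mpirun -np 96 --mca coll self,basic,libnbc ./allreduce_bench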

Also, what type of interconnect are you using - ethernet, IB, ...?

Howard
Post by Cooper Burns
When I run MPI_Allreduce() I see the run time spike up (about 10x) when I
go from reducing a total of 4096 KB to 8192 KB.
Just wondering if anyone knows what the cause of this behaviour is.
Cooper Burns
2017-09-21 14:10:04 UTC
OK, I tried that (sorry for the delay... network issues killed our cluster).

Setting the env variable you suggested changed the results, but all it did was
move the run time spike from between 4 MB and 8 MB to between 32 KB and 64 KB.

The nodes I'm running on *have* InfiniBand, but I think I am running over
Ethernet for these tests.

Any other ideas?

Thanks!
Cooper

Post by Howard Pritchard
Could you rerun your test with the following env. variable set
export OMPI_MCA_coll=self,basic,libnbc
and see if that helps?
Also, what type of interconnect are you using - ethernet, IB, ...?
Gilles Gouaillardet
2017-09-21 14:52:33 UTC
Unless you are using mxm, you can disable the tcp BTL with

mpirun --mca pml ob1 --mca btl ^tcp ...

coll/tuned selects an algorithm based on the communicator size and the message size. The spike could occur because an algorithm that is suboptimal on your cluster and with your job topology is selected.

Note that you can force a given algorithm, or redefine the algorithm selection rules.
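
For example, a way to experiment with this (the algorithm number and the
benchmark binary name below are placeholders, and the available algorithms
and their numbering depend on the Open MPI version):

# list the tuned allreduce algorithms known to your installation
ompi_info --param coll tuned --level 9 | grep allreduce_algorithm

# force one specific allreduce algorithm for a test run
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_allreduce_algorithm 4 \
       -np 96 ./allreduce_bench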

Cheers,

Gilles
Post by Cooper Burns
Setting the env variable you suggested changed the results, but all it did was
move the run time spike from between 4 MB and 8 MB to between 32 KB and 64 KB.
The nodes I'm running on *have* InfiniBand, but I think I am running over
Ethernet for these tests.
Any other ideas?