Konstantinos Konstantinidis
2017-10-17 06:30:32 UTC
I have implemented some algorithms in C++ whose performance is strongly
affected by the shuffling time among nodes, which is carried out with
broadcast calls. Up to now, I have been testing them by running something like
mpirun -mca btl ^openib -mca plm_rsh_no_tree_spawn 1 ./my_test
which I think makes MPI_Bcast work serially. Now I want to improve the
communication time, so I have configured the appropriate SSH access from
every node to every other node and enabled the binary tree implementation
of Open MPI's collective calls by running
mpirun -mca btl ^openib ./my_test
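From what I understand, the tuned collective component can also be forced to
use a specific broadcast algorithm explicitly, with something like the line
below (I have not verified against my installation that these MCA parameter
values are right, or that 5 really selects the binary tree variant):
mpirun -mca btl ^openib -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_bcast_algorithm 5 ./my_test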
My problem is that across various experiments with files of different
sizes, I see no improvement in transmission time, even though theoretically
I would expect a gain of approximately log(k)/(k-1), where k is the size of
the group within which the communication takes place.
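To make that expectation concrete: a serial broadcast from the root takes
k-1 sequential transmissions, while a binary tree completes in roughly
log2(k) rounds, so for k = 16 nodes I would expect the broadcast to take
about 4/15, i.e. roughly 27%, of the serial time.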
I compile the code with
mpic++ my_test.cc -o my_test
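In case it helps, a minimal reproducer of the kind of measurement I am doing
looks like this (a simplified sketch; the buffer size is a placeholder, not
my actual file sizes). The barrier after the broadcast is there so the timing
reflects the slowest rank, not just the root's send:

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int N = 1 << 20;                  // placeholder size, not my real file sizes
        std::vector<char> buf(N);

        MPI_Barrier(MPI_COMM_WORLD);            // start all ranks together
        double t0 = MPI_Wtime();
        MPI_Bcast(buf.data(), N, MPI_CHAR, 0, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);            // wait until every rank has the data
        double t1 = MPI_Wtime();

        if (rank == 0)
            std::printf("bcast of %d bytes: %f s\n", N, t1 - t0);

        MPI_Finalize();
        return 0;
    }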
All of the experiments are run on Amazon EC2 r3.large or m3.large machines.
I have also set various rate limits to avoid the bursty behavior of Amazon
EC2's transmission rate. The Open MPI installation I am using is described
in the attached txt file, which contains the output of ompi_info.
What could be going wrong here?