[...] specified in the MPI standard [...] around each MPI_Bcast() invocation, in order to get the total time spent in synchronization.
Post by Konstantinos Konstantinidis
I do not completely understand whether that involves changing some MPI code. I have no prior experience with that.
But if I get the idea, something like this could potentially work (assume that comm is the communicator of the group that communicates):
clock_t total_time = clock();
clock_t sync_time = 0;

for each transmission {
    sync_time = sync_time - clock();
    comm.Barrier();
    sync_time = sync_time + clock();

    comm.Bcast(...);
}

total_time = clock() - total_time;

// Total time
double t_time = double(total_time) / CLOCKS_PER_SEC;

// Synchronization time
double s_time = double(sync_time) / CLOCKS_PER_SEC;

// Actual data transmission time
double d_time = t_time - s_time;
I know that I have added a useless barrier call, but do you think that
this can work the way I think it will and at least give some idea of
the synchronization time?
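For reference, here is a self-contained sketch of the same idea written against the C bindings with MPI_Wtime() (which measures wall-clock time, whereas clock() measures CPU time); the buffer size, the five iterations, and the use of MPI_COMM_WORLD are placeholder assumptions rather than values from this thread:

/* bcast_timing.c: split total broadcast time into synchronization time
 * (the barrier) and the remaining data-transmission time.              */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;     /* stand-in for the group communicator */

    int rank;
    MPI_Comm_rank(comm, &rank);

    const int count = 1 << 20;          /* 1 Mi doubles; arbitrary size */
    double *buf = malloc(count * sizeof(double));

    double sync_time = 0.0;
    double total_time = MPI_Wtime();

    for (int i = 0; i < 5; i++) {       /* e.g. 5 broadcasts per group */
        sync_time -= MPI_Wtime();
        MPI_Barrier(comm);              /* pure synchronization */
        sync_time += MPI_Wtime();

        MPI_Bcast(buf, count, MPI_DOUBLE, 0, comm);
    }

    total_time = MPI_Wtime() - total_time;

    if (rank == 0)
        printf("total %.6f s, sync %.6f s, data %.6f s\n",
               total_time, sync_time, total_time - sync_time);

    free(buf);
    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with mpirun, this prints on rank 0 how much of the loop was spent waiting in the barrier versus in the broadcast itself.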
Barrett, I am also working on switching to m4.large instances and will
check if this helps.
Regards,
Kostas
Gilles suggested your best next course of action: time the
MPI_Bcast and MPI_Barrier calls and see if there's a non-linear
scaling effect as you increase group size.
You mention that you’re using m3.large instances; while this isn’t
the list for in-depth discussion about EC2 instances (the AWS
Forums are better for that), I’ll note that unless you’re tied to
m3 for organizational or reserved instance reasons, you’ll
probably be happier on another instance type. m3 was one of the
last instance families released which does not support Enhanced
Networking. There’s significantly more jitter and latency in the
m3 network stack compared to platforms which support Enhanced
Networking (including the m4 platform). If networking costs are
causing your scaling problems, the first step will be migrating
instance types.
Brian
On Oct 23, 2017, at 4:19 AM, Gilles Gouaillardet wrote:
Konstantinos,
A simple way is to rewrite MPI_Bcast() and insert a timer and a
PMPI_Barrier() before invoking the real PMPI_Bcast().
The time spent in PMPI_Barrier() can be seen as time NOT spent on actual
data transmission, and since all tasks are synchronized upon exit, the
time spent in PMPI_Bcast() can be seen as time spent on actual data
transmission. This is not perfect, but it is a pretty good approximation.
You can add extra timers so you end up with an idea of how much time
is spent in PMPI_Barrier() vs PMPI_Bcast().
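A minimal sketch of such a PMPI interposition (the counters and the report helper are illustrative assumptions, not something Gilles specified):

/* bcast_prof.c: interpose MPI_Bcast via the PMPI profiling interface.
 * Linking this file into the application times every MPI_Bcast call.  */
#include <mpi.h>
#include <stdio.h>

static double sync_seconds  = 0.0;  /* time spent waiting in the barrier  */
static double bcast_seconds = 0.0;  /* time spent in the actual broadcast */

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    PMPI_Barrier(comm);              /* absorb skew between ranks first */
    double t1 = MPI_Wtime();
    int rc = PMPI_Bcast(buffer, count, datatype, root, comm);
    double t2 = MPI_Wtime();

    sync_seconds  += t1 - t0;
    bcast_seconds += t2 - t1;
    return rc;
}

/* Call this (e.g. right before MPI_Finalize) to print per-rank totals. */
void bcast_prof_report(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: sync %.6f s, bcast %.6f s\n",
           rank, sync_seconds, bcast_seconds);
}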
Cheers,
Gilles
On Mon, Oct 23, 2017 at 4:16 PM, Konstantinos Konstantinidis wrote:
Post by Konstantinos Konstantinidis
In any case, do you think that the time NOT spent on actual data
transmission can impact the total time of the broadcast, especially when
there are so many groups that communicate (please refer to the numbers I
gave before if you want to get an idea)?
Also, is there any way to quantify this impact, i.e. to measure the time
not spent on actual data transmissions?
Kostas
On Fri, Oct 20, 2017 at 10:32 PM, Jeff Hammond wrote:
Post by Jeff Hammond
Broadcast is collective but not necessarily synchronous in the sense you
imply. If you broadcast a message size under the eager limit, the root may
return before any non-root processes enter the function. Data transfer may
happen asynchronously; rendezvous forces synchronization between any two
processes, but there may still be asynchrony between different levels of
the broadcast tree.
Jeff
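To see this effect concretely, one could run a toy program in which the non-root ranks deliberately arrive late at a small broadcast; the 64-byte message size is only a guess at something below the eager limit, which depends on the MPI library and transport:

/* eager_demo.c: with a message below the eager limit, the root may leave
 * MPI_Bcast long before the other ranks enter it.                        */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char small[64];                   /* likely below the eager limit */

    if (rank != 0)
        sleep(2);                     /* non-roots arrive late on purpose */

    double t0 = MPI_Wtime();
    MPI_Bcast(small, sizeof(small), MPI_CHAR, 0, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    /* Rank 0 will typically report close to 0 s here even though the other
     * ranks have not entered the call yet; a rendezvous-sized message would
     * instead make rank 0 wait for its children in the broadcast tree.     */
    printf("rank %d spent %.3f s in MPI_Bcast\n", rank, t1 - t0);

    MPI_Finalize();
    return 0;
}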
On Fri, Oct 20, 2017 at 3:27 PM, Konstantinos Konstantinidis wrote:
Post by Konstantinos Konstantinidis
Hi,
I am running some tests on Amazon EC2 and they require a lot of
communication among m3.large instances.
I would like to give you an idea of what kind of communication takes
place: 5 instances are [...] each group, [...] 4 instances in the group.
So within each group, exactly 5 broadcasts take place.
The problem is that if I increase the size of the group from 5 to 10 [...]
while, based on some theoretical results, this is not reasonable.
I want to check if one of the reasons that this is happening is due to
[...] call MPI_Bcast() [...] the machines in [...] actual data [...]
synchronization time? [...] attached file.
--
Jeff Hammond
http://jeffhammond.github.io/
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users