Discussion:
[OMPI users] Open MPI task scheduler
Jack Bryan
2010-06-20 17:49:12 UTC
Permalink
Hi, all:
I need to design a task scheduler (not PBS job scheduler) on Open MPI cluster.
I need to parallelize an algorithm so that a big problem is decomposed into small tasks that the scheduler distributes to worker nodes. After the tasks are solved, their results are returned to the manager node running the scheduler, which distributes more tasks based on the collected results.
I need to use C++ to design the scheduler.
I have searched online and I cannot find any scheduler available for this purpose.
Any help is appreciated.
thanks
Jack
June 19 2010
Matthieu Brucher
2010-06-20 18:13:14 UTC
Permalink
Hi Jack,

What you are seeking is the client/server pattern. Have one node act
as a server. It will create a list of tasks (or even a graph of tasks,
if you have dependencies), and then create clients that will connect to
the server with an RPC protocol. I've done this with a SOAP-over-TCP
protocol, where the severance of the TCP connection means that the client
is dead and that its task should be recycled; it's easy to do with
Boost.Asio and some Python/GCCXML scripts to automatically generate
your code (I've written a skeleton on my blog). You may even have
clients with different sizes or capabilities and tell the server what
each client can do, and then the server may dispatch appropriate
tickets to the clients.

Each client and the server can be an MPI process; you don't have to create
all clients inside one MPI process (you may use several if the
smallest resource your batch scheduler allocates is bigger than one of
your tasks). With a batch scheduler, it's better to keep your
tasks as small as possible so that you can balance the resources you
need.

Matthieu




--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Jack Bryan
2010-06-20 18:44:24 UTC
Permalink
Hi, Matthieu:
Thanks for your help.
Most of your ideas match what I want to do.
My scheduler should be able to be called from any C++ program, which can pass a list of tasks to the scheduler; the scheduler then distributes the tasks to other client nodes.
It may work like this:

while (still tasks available) {
    myScheduler.push(tasks);
    myScheduler.get(task results from client nodes);
}
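The loop above could be sketched as a minimal in-process class (a hypothetical `Scheduler` with `push`/`get`; the real version would dispatch tasks to worker nodes rather than run them locally):

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch of the interface described above: tasks go in,
// results come back. Here the "worker" runs in-process; in the real
// system get() would collect results returned by worker nodes.
class Scheduler {
public:
    void push(std::function<int()> task) { tasks_.push(std::move(task)); }

    // Drain the queue and return the results. A real scheduler would
    // send the tasks out over the network instead of calling them here.
    std::vector<int> get() {
        std::vector<int> results;
        while (!tasks_.empty()) {
            results.push_back(tasks_.front()());
            tasks_.pop();
        }
        return results;
    }

private:
    std::queue<std::function<int()>> tasks_;
};
```

The point of the sketch is only the calling convention; the distribution mechanism (MPI, RPC, ...) is hidden behind push/get.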

My cluster has 400 nodes with Open MPI. The tasks should be transferred by the MPI protocol.
I am not familiar with RPC.
If I use Boost.Asio and some Python/GCCXML scripts to generate the code, can it be called from a C++ program on an Open MPI cluster?
I cannot find the skeleton on your blog.
Would you please tell me where to find it?
I really appreciate your help.

Jack
June 20 2010

Matthieu Brucher
2010-06-20 19:00:06 UTC
Permalink
2010/6/20 Jack Bryan <***@hotmail.com>:
> Hi, Matthieu:
> Thanks for your help.
> Most of your ideas show that what I want to do.
> My scheduler should be able to be called from any C++ program, which can
> put
> a list of tasks to the scheduler and then the scheduler distributes the
> tasks to other client nodes.
> It may work like in this way:
> while(still tasks available) {
> myScheduler.push(tasks);
> myScheduler.get(tasks results from client nodes);
> }

Exactly. In your case, you want only one server, so you must find a
system so that every task can be serialized in the same form. The
easiest way to do so is to serialize your parameter set as an XML
fragment and add the type of task as another field.
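As an illustration of that suggestion (all names here are hypothetical, not from the thread), a parameter set could be flattened into an XML fragment with the task type carried as an attribute:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: serialize a parameter set as an XML fragment,
// with the task type as an attribute, as suggested above. A real
// implementation would also escape special characters in the values.
std::string serialize_task(const std::string& type,
                           const std::map<std::string, std::string>& params) {
    std::ostringstream xml;
    xml << "<task type=\"" << type << "\">";
    for (const auto& kv : params)
        xml << "<param name=\"" << kv.first << "\">" << kv.second << "</param>";
    xml << "</task>";
    return xml.str();
}
```

Because every task travels in the same envelope, the server can queue and dispatch tickets without knowing anything about the task types themselves.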

> My cluster has 400 nodes with Open MPI. The tasks should be transferred b y
> MPI protocol.

No, they should not ;) MPI can be used, but it is not the easiest way
to do so. You still have to serialize your ticket, and you have to use
some functions from MPI-2 (so perhaps not as portable as MPI-1
functions). Besides, it cannot be used from programs that are not
MPI-aware.

> I am not familiar with  RPC Protocol.

RPC is not a protocol per se; SOAP is. RPC stands for Remote Procedure
Call. Basically, your scheduler exposes several functions that
clients can call:
- add tickets
- retrieve a ticket
- mark a ticket as done
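Those three operations could be modeled as an in-memory ticket store (a hypothetical `TicketServer`, purely illustrative; in the real system each method would be a remote call on the server):

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Hypothetical in-memory model of the three RPC operations listed
// above. In the real system each call would be a remote invocation.
class TicketServer {
public:
    void add_ticket(const std::string& t) { pending_.push_back(t); }

    // Hands out the next pending ticket; returns "" when none is left.
    std::string retrieve_ticket() {
        if (pending_.empty()) return "";
        std::string t = pending_.front();
        pending_.pop_front();
        in_progress_.insert(t);
        return t;
    }

    void ticket_done(const std::string& t) { in_progress_.erase(t); }

    bool idle() const { return pending_.empty() && in_progress_.empty(); }

private:
    std::deque<std::string> pending_;
    std::set<std::string> in_progress_;
};
```

Keeping in-progress tickets separate from pending ones is what lets a server recycle a ticket when a client's connection drops, as described earlier in the thread.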

> If I use Boost.ASIO and some Python/GCCXML script to generate the code, it
> can be
> called from C++ program on Open MPI cluster ?

Yes. SOAP is just an XML way of representing a function call on the
server. You can use it with C++, Java, ... I use it
with Python to monitor how many tasks are remaining, for instance.

> I cannot find the skeletton on your blog.
> Would you please tell me where to find it ?

It's not complete, as some of the work is the property of my employer.
This is how I use GCCXML to generate the calling code:
http://matt.eifelle.com/2009/07/21/using-gccxml-to-automate-c-wrappers-creation/
You have some additional code to write, but this is the main idea.

> I really appreciate your help.

No sweat, I hope I can give you correct hints!

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Jack Bryan
2010-06-20 22:17:34 UTC
Permalink
Hi, thank you very much for your help.
What is the meaning of "must find a system so that every task can be serialized in the same form"? What is the meaning of "serialize"?
I have no experience of programming with python and XML.
I have studied your blog.
Where can I find a simple example to use the techniques you have said ?
For example, I have 5 tasks (print "hello world!").
I want to use 6 processors to do it in parallel.
One processor is the manager node, which distributes tasks; the other 5 processors do the printing jobs, and when they are done, they tell the manager node.

Boost.Asio is a cross-platform C++ library for network and low-level I/O programming. I have no experience using it. Will it take a long time to learn how to use it?
If the messages are transferred by SOAP over TCP, how does the manager node call it and push tasks into it?
Do I need to install SOAP+TCP on my cluster so that I can use it?

Any help is appreciated.
Jack
June 20 2010

jody
2010-06-21 06:41:23 UTC
Permalink
Hi

I think your problem can be solved easily at the MPI level.
Just have your manager execute a loop in which it waits for any message.
Define different message types by their MPI tags. Once a message
has been received, decide what to do by looking at the tag.

Here I assume that a worker with no job sends a message with the tag
TAG_TASK_REQUEST and then waits to receive a message from the master
with either a new task or the command to exit.
Once a worker has finished a task, it sends a message with the tag TAG_RESULT,
and then sends a message containing the result.
I also assume that new tasks can be sent from a different node by using
the tag TAG_NEW_TASK.

The main loop in the Master would be:

while (more_tasks) {
    MPI_Recv(&a, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
    switch (st.MPI_TAG) {
        case TAG_TASK_REQUEST:
            sendNextTask(st.MPI_SOURCE);
            break;
        case TAG_RESULT:
            collectResult(st.MPI_SOURCE);
            break;
        case TAG_NEW_TASK:
            putNewTaskOnQueue(st.MPI_SOURCE);
            break;
    }
}


In a worker:

while (go_on) {
    MPI_Send(&a, 1, MPI_INT, idMaster, TAG_TASK_REQUEST, MPI_COMM_WORLD);
    MPI_Recv(&TaskDef, 1, TaskType, idMaster, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
    if (st.MPI_TAG == TAG_STOP) {
        go_on = false;
    } else {
        result = workOnTask(TaskDef, TaskLen);
        MPI_Send(&a, 1, MPI_INT, idMaster, TAG_RESULT, MPI_COMM_WORLD);
        MPI_Send(result, 1, resultType, idMaster, TAG_RESULT_CONTENT, MPI_COMM_WORLD);
    }
}

I hope this helps
Jody

Matthieu Brucher
2010-06-21 07:14:18 UTC
Permalink
2010/6/21 Jack Bryan <***@hotmail.com>:
> Hi,
> thank you very much for your help.
> What is the meaning of " must find a system so that every task can be
> serialized in the same form." What is the meaning of "serize " ?

Serialization is the process of converting an object instance into a
text or binary stream, and of creating a new object instance from that
stream. It allows communication of data between processes. With plain
MPI you send one data type after another; with serialization, you send
everything in one big chunk.
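A minimal round-trip illustration of this idea (the `Task` type and function names are hypothetical; a text stream is used here, but binary works the same way):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical task description to be shipped between processes.
struct Task {
    int id;
    std::string name;  // single token here; real code would length-prefix it
};

// Pack the object into one stream ("one big chunk").
std::string serialize(const Task& t) {
    std::ostringstream out;
    out << t.id << ' ' << t.name;
    return out.str();
}

// Rebuild an equal instance from the stream.
Task deserialize(const std::string& s) {
    std::istringstream in(s);
    Task t;
    in >> t.id >> t.name;
    return t;
}
```

The receiving side needs no knowledge of how the chunk was produced, only of the agreed format, which is exactly what makes the single-envelope approach work across different programs.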

> I have no experience of programming with python and XML.

Python is not mandatory at all. I use it to automate the generation of
the wrappers/SOAP files, and to talk to the daemon. You can do this in
any language you are comfortable with.

> I have studied your blog.
> Where can I find a simple example to use the techniques you have said ?

If you are looking for RPC samples, you can ask Google with just SOAP
as the keyword; it will find several tutorials on how this works. As Jody
said, you may want something simpler if you can have all tasks in one
MPI process, but once you go on a big cluster with variable resources,
you will be stuck.

> For exmple, I have 5 task (print "hello world !").
> I want to use 6 processors to do it in parallel.
> One processr is the manager node who distributes tasks and other 5
> processors
> do the printing jobs and when they are done, they tell this to the manager
> noitde.

In this case, you have your daemon working in parallel with the batch
scheduler, and then each process asks the daemon for a new ticket. You
may add tickets by talking to the daemon directly, without having to
launch a new job.


> Boost.Asio is a cross-platform C++ library for network and low-level I/O
> programming. I have no experiences of using it. Will it take a long time to
> learn
> how to use it ?

The longest part will not be mastering Boost, but rather understanding
how to create your TCP server and how to serialize your parameters.

> If the messages are transferred by SOAP+TCP, how the manager node calls it
> and push task into it ?

You have to think of SOAP over TCP as just a simple function call that
hides everything. From the client node's point of view, it's a simple
function call, server.get_ticket(). The manager node will be talked to
by different kinds of programs: task programs, or a program that
pushes tickets. The latter is just another function call, like
this in C++:

std::vector<std::string> tickets;
daemon.connect(somewhere, port);
daemon.add_tickets(tickets);

> Do I need to install SOAP+TCP on my cluster so that I can use it ?

As Jody said, you can do things with MPI directly. I would not
recommend it, but it will give you a fast solution. You will
have to use some MPI-2 calls to create a socket on the daemon to talk
to it, and in fact you will end up building exactly what I proposed,
but less portably.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Bill Rankin
2010-06-20 20:04:26 UTC
Permalink
On Jun 20, 2010, at 1:49 PM, Jack Bryan wrote:

Hi, all:

I need to design a task scheduler (not PBS job scheduler) on Open MPI cluster.

Quick question - why *not* PBS?

Using shell scripts with the Job Array and Dependent Jobs features of PBS Pro (not sure about Maui/Torque or SGE), you can implement this in a fairly straightforward manner. It worked for the bioinformaticists using BLAST.

It just seems that the workflow you are describing is part and parcel of what any good workload management system is supposed to do, and do well.

Just a thought.

Good luck,

-bill
Jack Bryan
2010-06-20 21:18:58 UTC
Permalink
Thanks for your reply.
My task scheduler is at the application level, not the OS level.
PBS asks the OS to do the job scheduling.
My scheduler needs to be callable from any C++ program, which puts tasks into the scheduler; the scheduler then distributes the tasks to worker nodes.
After the tasks are done, the manager node collects the results.
It may work like this:

while (still tasks available) {
    myScheduler.push(tasks);
    myScheduler.get(task results from client nodes);
}

Any help is appreciated.
Jack
June 20 2010
