Discussion:
[OMPI users] running an MPI program between my PC and an ARM-architecture Raspberry Pi
peng jia
2018-04-01 02:57:15 UTC
Hello!

I would like to run some MPI code on a cluster made of a normal laptop
and an ARM-architecture Raspberry Pi, but without success: the system
does not respond, even though I installed Open MPI manually on both the
PC and the Raspberry Pi.

***@pc:~ $ mpiexec -H pc,raspberrypi001 hello_world

I would like to ask: is it possible to run such an MPI program in such
a heterogeneous cluster? If not, why not? I am really confused about
it; could you please teach me?


Best regards,
Peng
Jeff Squyres (jsquyres)
2018-04-02 17:19:19 UTC
I would like to run some MPI code on a cluster made of a normal laptop and an ARM-architecture Raspberry Pi, but without success: the system does not respond, even though I installed Open MPI manually on both the PC and the Raspberry Pi.
I would like to ask: is it possible to run such an MPI program in such a heterogeneous cluster? If not, why not? I am really confused about it; could you please teach me?
Heterogeneity is, at best, not well supported. I would consider it a topic for an advanced user, and even so, our heterogeneous support inside Open MPI is probably not well tested these days. I would not recommend mixing machines with different data sizes and/or representations in a single job. There are many issues that come up; the easiest to describe is: what should MPI do when a message of type X is sent between two machines where X is a different size? Should MPI truncate the data? Should it round? Should it error? ...? There's no good answer.

My advice: run your job exclusively on your Raspberry Pis *or* your cluster of laptops, and you should probably be OK.
--
Jeff Squyres
***@cisco.com
dpchoudh .
2018-04-02 17:39:30 UTC
Sorry for a pedantic follow-up:

Is this (heterogeneous cluster support) something that is specified by
the MPI standard (perhaps as an optional component)? Do people know if
MPICH, MVAPICH, Intel MPI, etc. support it? (I do realize this is an
Open MPI forum.)

The reason I ask is that I have a mini Linux lab of sorts that consists
of Linux running on many architectures, both 32- and 64-bit, and both
LE and BE. Some have advanced fabrics, but all have garden-variety
Ethernet. I mostly use it for software-porting work, but I'd love to
set it up as a test bench for testing Open MPI in a heterogeneous
environment and report issues, if that is something the developers
want to pursue.

Thanks
Durga

$man why dump woman?
man: too many arguments


Jeff Squyres (jsquyres)
2018-04-03 23:39:03 UTC
Post by dpchoudh .
Is this (heterogeneous cluster support) something that is specified by
the MPI standard (perhaps as an optional component)?
The MPI standard states that if you send a message, you should receive the same values at the receiver. E.g., if you sent int=3, you should receive int=3, even if one machine is big endian and the other machine is little endian.

It does not specify what happens when data sizes are different (e.g., if type X is 4 bytes on one side and 8 bytes on the other) -- there are no good answers on what to do there.
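
A minimal sketch of that first guarantee (plain MPI C, nothing Open
MPI-specific; it assumes a 2-process job):

/* The *value* of an int survives the trip even between big- and
 * little-endian hosts, because both sides describe the buffer as
 * MPI_INT. What is NOT defined is what happens if the C type behind
 * an MPI type has different widths on the two hosts. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 3;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        value = 0;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value); /* 3, BE or LE sender */
    }
    MPI_Finalize();
    return 0;
}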
Post by dpchoudh .
Do people know if
MPICH, MVAPICH, Intel MPI, etc. support it? (I do realize this is an
Open MPI forum.)
I don't know offhand. I know that this kind of support is very unpopular with MPI implementors because:

1. Nearly nobody uses it (we get *at most* one request a year to properly support BE<-->LE transformation).
2. It's difficult to implement BE<-->LE transformation properly without causing at least some performance loss and/or code complexity in the main datatype engine.
3. It is very difficult for MPI implementors to test properly (especially in automated environments).

#1 is probably the most important reason. If lots of people were asking for this, MPI implementors would take the time to figure out #2 and #3. But since almost no one asks for it, it gets pushed (waaaaaay) down on the priority list of things to implement.

Sorry -- just being brutally honest here. :-\
Post by dpchoudh .
The reason I ask is that I have a mini Linux lab of sort that consists
of Linux running on many architectures, both 32 and 64 bit and both LE
and BE. Some have advanced fabrics, but all have garden variety
Ethernet. I mostly use this for software porting work, but I'd love to
set it up as a test bench for testing OpenMPI in a heterogeneous
environment and report issues, if that is something that the
developers want to achieve.
Effectively, the current set of Open MPI developers has not committed resources to fixing, updating, and maintaining the BE<-->LE transformation in the Open MPI datatype engine. I don't think there are any sane answers for what to do when datatypes are different sizes.

However, that being said, Open MPI is an open source community -- if someone wants to contribute pull requests and/or testing to support this feature, that would be great!
--
Jeff Squyres
***@cisco.com
Gilles Gouaillardet
2018-04-04 00:24:25 UTC
Let me shed a different light on that.


Once in a while I run Open MPI between x86_64 and sparcv9, and it works
quite well as far as I am concerned.

Note this is on the master branch; I have never tried older or release
branches.


Note you likely need to configure Open MPI with --enable-heterogeneous
on both architectures.

If you are still unlucky, then I suggest you download and build the
3.1.0rc3 version (with --enable-debug --enable-heterogeneous)

and then

mpirun --mca oob_base_verbose 10 --mca pml_base_verbose 10 --host ... hostname

and then

mpirun --mca oob_base_verbose 10 --mca pml_base_verbose 10 --mca btl_base_verbose 10 --host ... mpi_helloworld

and either open a GitHub issue or post the logs on this mailing list.
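
(mpi_helloworld above can be any minimal MPI program; if you do not
have one handy, a sketch along these lines will do, built with
"mpicc -o mpi_helloworld mpi_helloworld.c" on both machines:)

/* Prints one line per rank so you can confirm that processes on both
 * hosts actually joined the job. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);
    printf("hello from rank %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}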


Cheers,


Gilles
George Reeke
2018-04-04 15:35:42 UTC
Dear colleagues,
FWIW, years ago I was looking at this problem and developed my
own solution (for C programs) with this structure:
--Be sure your code that works with ambiguous-length types like
'long' can handle different sizes. I use replacement unambiguous
typedef names like 'si32', 'ui64', etc. for the usual signed and
unsigned fixed-width integers (see the sketch at the end of this
message).
--Run your source code through a utility that analyzes a specified
set of variables, structures, and unions that will be used in
messages and builds tables giving their included types. Include
these tables in your makefiles.
--Replace malloc, calloc, realloc, and free with my own versions,
where you pass a type argument pointing into this table along
with the number of items, etc. There are separate memory pools for
items that will be passed often, rarely, or never, just to make
things more efficient.
--Do all these calls on the rank 0 processor at program startup and
call a special broadcast routine that sets up data structures on
all the other processors to manage the conversions.
--Replace mpi message passing and broadcast calls with new routines
that use the type information (stored by malloc, calloc, etc.) to
determine what variables to lengthen or shorten or swap on arrival
at the destination. Regular mpi message passing is used inside
these routines and can be used natively for variables that do not
ever need length changes or byte swapping (e.g., text). I have a
simple set of routines to gather statistics across nodes with sum,
max, etc. operations, but not too fancy. I do not have versions of
any of the mpi operations that collect or distribute matrices, etc.
--A little routine must be written for every union. This is called
from the package when a union is received to determine which
member is present so the right conversion can be done.
--There was a hook to handle IBM (hex exponent) vs IEEE floating
point, but the code never got written.
Because this is all very complicated and demanding on the
programmer, I am not making it publicly available, but will be
glad to send it privately to anyone who really thinks they can
use it and is willing to get their hands dirty.
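
For the first point, the typedef layer itself is the easy part on any
C99 system; it can simply wrap <stdint.h>. A sketch (the si*/ui* names
are mine; this particular mapping is one plausible version, not my
actual code):

/* Unambiguous fixed-width integer names layered over C99 <stdint.h>.
 * Pre-C99 compilers would need per-platform #ifdefs here instead. */
#include <stdint.h>

typedef int8_t  si8;  typedef uint8_t  ui8;
typedef int16_t si16; typedef uint16_t ui16;
typedef int32_t si32; typedef uint32_t ui32;
typedef int64_t si64; typedef uint64_t ui64;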
George Bosilca
2018-04-04 15:57:40 UTC
We can always build complicated solutions, but in some cases sane and
simple solutions exist. Let me clear up some of the misinformation in
this thread.

The MPI standard is clear about what kind of conversion is allowed and
how it should be done (for more info read Chapter 4): no type
conversion is allowed (don't send a long and expect a short); for
everything else, truncation to a sane value is the rule. This is
nothing new; the rules are similar to other data-conversion standards
such as XDR. Thus, if you send an MPI_LONG from a machine where long is
8 bytes to an MPI_LONG on a machine where it is 4 bytes, you will get a
valid number when possible, and otherwise [MAX|MIN]_LONG on the target
machine. For floating-point data the rules are more complicated due to
potential exponent and mantissa length mismatches, but in general, if
the data is representable on the target architecture, a sane value is
obtained. Otherwise, the data will be replaced with one of the
extremes. This also applies to file operations, as long as the correct
external32 type is used.

The datatype engine in Open MPI supports all these conversions, as long
as the source and target machines are correctly identified. This
identification is only enabled when Open MPI is compiled with support
for heterogeneous architectures (--enable-heterogeneous).
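
For example, a sketch of that rule in action, assuming rank 0 runs on
a host with 8-byte longs, rank 1 on a host with 4-byte longs, and both
binaries were built against an --enable-heterogeneous Open MPI:

/* Narrowing conversion: send an MPI_LONG whose value cannot be
 * represented on the receiver. Per the rule above, the receiver gets
 * a truncation to a sane value (its own LONG_MAX), not garbage.
 * Assumes rank 0 has 8-byte longs and rank 1 has 4-byte longs. */
#include <mpi.h>
#include <limits.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    long v = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        v = LONG_MAX; /* 2^63 - 1 on the 8-byte-long sender */
        MPI_Send(&v, 1, MPI_LONG, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&v, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* expect v == LONG_MAX here, i.e. 2^31 - 1 on this side */
        printf("rank 1 got %ld (local LONG_MAX = %ld)\n", v, LONG_MAX);
    }
    MPI_Finalize();
    return 0;
}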

George.
George Reeke
2018-04-04 16:52:10 UTC
Post by George Bosilca
We can always build complicated solutions, but in some cases sane and
simple solutions exist. Let me clear up some of the misinformation in
this thread.
Oh, well, when I wrote the stuff I described earlier, it was before
MPI existed, or at least before I had heard of it, and I had access
only to a proprietary message-passing library that just transmitted
byte strings. Clearly you are right about simplicity for most cases.
My solution might still occasionally be useful for its ability to
collect a whole tree of setup information (for example, for a
multi-layer neural network) into a "memory pool" using compile-time
code analysis, and then broadcast the whole thing with a single call
when changes occur at various stages of a computation. That was my
original purpose, and I still use it, after replacing the original
message-passing calls with the corresponding MPI calls, even though
all the processors are now equivalent Intel CPUs. Did I mention:
pointers within the data tree are maintained on each processor, with
likely different values. Oh, and if anybody wants this, you have to
accept GPL licensing.
George Reeke
