Discussion:
[OMPI users] MPI + system() call + Matlab MEX crashes
j***@gmail.com
2016-10-05 08:32:31 UTC
Hello,

I have an application in C++ (main.cpp) that is launched with multiple
processes via mpirun. The master process calls MATLAB via system('matlab
-nosplash -nodisplay -nojvm -nodesktop -r "interface"'), which executes a
simple script, interface.m, that calls mexFunction (mexsolve.cpp), from which
I try to set up communication with the rest of the processes launched at
the beginning together with the master process. When I run the application
as listed below on two different machines, I experience:

1) a crash at MPI_Init() in mexFunction() on a cluster machine with
Linux 4.4.0-22-generic

2) the error in MPI_Send() shown below on a local machine with
Linux 3.10.0-229.el7.x86_64:
[archimedes:31962] shmem: mmap: an error occurred while determining whether or not /tmp/openmpi-sessions-***@archimedes_0/58444/1/shared_mem_pool.archimedes could be created.
[archimedes:31962] create_and_attach: unable to create shared memory BTL coordinating structure :: size 134217728
[archimedes:31962] shmem: mmap: an error occurred while determining whether or not /tmp/openmpi-sessions-***@archimedes_0/58444/1/0/vader_segment.archimedes.0 could be created.
[archimedes][[58444,1],0][../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:800:mca_btl_tcp_endpoint_complete_connect] connect() to <MY_IP> failed: Connection refused (111)

I launch the application as follows:
mpirun --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 1 -np 2 -npernode 1 ./main

I have openmpi-2.0.1 configured with:
--prefix=${INSTALLDIR} --enable-mpi-fortran=all --with-pmi --disable-dlopen

For more details, the code is here: https://github.com/goghino/matlabMpiC

Thanks for any suggestions!

Juraj
Dmitry N. Mikushin
2016-10-05 09:41:12 UTC
Hi Juraj,

Although the MPI infrastructure may technically support forking, it is known
that not all system resources can correctly replicate themselves in a forked
process. For example, forking inside an MPI program with an active CUDA driver
will result in a crash.

Why not compile the MATLAB code down into a native library and link it with
the MPI application directly? For example, as described here:
https://www.mathworks.com/matlabcentral/answers/98867-how-do-i-create-a-c-shared-library-from-mex-files-using-the-matlab-compiler?requestedDomain=www.mathworks.com

Kind regards,
- Dmitry Mikushin.
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Gilles Gouaillardet
2016-10-06 00:03:55 UTC
Juraj,


If I understand correctly, the "master" task calls MPI_Init(), and then
fork()s and exec()s MATLAB.

In some cases (lack of hardware support), fork cannot work at all, but
let's assume it is fine for now.

Then, if I read between the lines, MATLAB calls mexFunction, which calls MPI_Init().

As far as I am concerned, that cannot work.

The blocker is that a child cannot call MPI_Init() if its parent already
called MPI_Init().

Fortunately, you have some options :-)
1) Start MATLAB from mpirun.
For example, if you want one master, two slaves and MATLAB, you can do
something like
mpirun -np 1 master : -np 1 matlab : -np 2 slave

2) MPI_Comm_spawn MATLAB.
The master can MPI_Comm_spawn() MATLAB, and then MATLAB can merge the parent
communicator and communicate with the master and the slaves.

3) Use the approach suggested by Dmitry.
/* this is specific to MATLAB, and I have no experience with it */
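As a rough illustration of how processes end up numbered under options 1 and 2, here is a small C++ sketch. It does not call MPI at all; it only models two assumptions: that mpirun hands out MPI_COMM_WORLD ranks consecutively, app context by app context, in command-line order, and that MPI_Intercomm_merge() orders the group that passes high = 0 before the group that passes high = 1. The names AppContext, role_for_rank and merged_rank are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// One entry per colon-separated app context on the mpirun command line,
// e.g. "mpirun -np 1 master : -np 1 matlab : -np 2 slave".
struct AppContext {
    std::string name;  // executable name as given to mpirun
    int nprocs;        // value passed to -np
};

// Option 1 (MPMD launch): assumes ranks in MPI_COMM_WORLD are assigned
// consecutively per app context, in command-line order.
// A real program would rather read the MPI_APPNUM attribute of
// MPI_COMM_WORLD and use it as the color for MPI_Comm_split().
std::string role_for_rank(const std::vector<AppContext>& apps, int rank) {
    int first = 0;  // first rank belonging to the current app context
    for (const AppContext& app : apps) {
        if (rank >= first && rank < first + app.nprocs) {
            return app.name;
        }
        first += app.nprocs;
    }
    throw std::out_of_range("rank exceeds total number of processes");
}

// Option 2 (MPI_Comm_spawn + MPI_Intercomm_merge): if the parent group
// passes high = 0 and the spawned group passes high = 1, the merged
// intracommunicator lists the parent group first.
int merged_rank(bool high, int local_rank, int low_group_size) {
    return high ? low_group_size + local_rank : local_rank;
}
```

With the command line from option 1, role_for_rank would map ranks 0, 1, 2 and 3 to master, matlab, slave and slave. In real code, relying on MPI_APPNUM (option 1) or on the high flag passed to MPI_Intercomm_merge() (option 2) is safer than hard-coding rank numbers.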

One last point: MPI_Init() can be invoked only once per task
(e.g. if your mexFunction does
MPI_Init(); work(); MPI_Finalize();
then it can be invoked only once per mpirun).
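A minimal sketch of the "initialize once, finalize once" pattern implied by this last point, written in plain C++ so it runs without MPI or MATLAB. The OnceInit class is hypothetical. In a real MEX file, the init callback would presumably check MPI_Initialized() before calling MPI_Init(), and finalization would be registered with mexAtExit() instead of running at the end of every call. This only avoids repeated MPI_Init() calls; it does not lift the restriction described above that a child cannot call MPI_Init() when its parent already did.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical guard: runs `init` at most once and `finalize` at most once,
// and refuses to initialize again after finalization (mirroring the MPI
// rule that MPI_Init() cannot be called again after MPI_Finalize()).
class OnceInit {
public:
    OnceInit(std::function<void()> init, std::function<void()> finalize)
        : init_(std::move(init)), finalize_(std::move(finalize)) {}

    // Call at the top of every mexFunction() invocation.
    void ensure_initialized() {
        if (finalized_) {
            throw std::logic_error("cannot initialize again after finalize");
        }
        if (!initialized_) {
            init_();
            initialized_ = true;
        }
    }

    // Call once when the MEX file is unloaded (e.g. from a handler
    // registered with mexAtExit()), not at the end of each invocation.
    void finalize() {
        if (initialized_ && !finalized_) {
            finalize_();
            finalized_ = true;
        }
    }

private:
    std::function<void()> init_;
    std::function<void()> finalize_;
    bool initialized_ = false;
    bool finalized_ = false;
};
```

Calling ensure_initialized() three times runs the init callback once; calling finalize() twice runs the finalize callback once, and any later ensure_initialized() throws.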

Cheers,

Gilles
Bennet Fauber
2016-10-06 00:38:25 UTC
MATLAB may have its own MPI installed. It definitely does if you have
the Parallel Computing Toolbox, and if you have that, it could be causing
problems. If you can, you might consider compiling your MATLAB
application into a standalone executable, then calling that from your own
program. That bypasses the MATLAB user interface and may prove more
tractable. See the documentation for mcc (part of MATLAB Compiler), if you
have that toolbox:

http://www.mathworks.com/help/compiler/mcc.html

-- bennet