Discussion:
[OMPI users] help understand unhelpful ORTE error message
Jeff Hammond
2015-11-19 17:44:20 UTC
Permalink
I have no idea what this is trying to tell me. Help?

***@nid00024:~/MPI/qoit/collectives> mpirun -n 2 ./driver.x 64
[nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418

I can run the same job with srun without incident:

***@nid00024:~/MPI/qoit/collectives> srun -n 2 ./driver.x 64
MPI was initialized.

This is on the NERSC Cori Cray XC40 system. I built Open-MPI git head from
source for OFI libfabric.

I have many other issues, which I will report later. As a spoiler, if I
cannot use your mpirun, I cannot set any of the MCA options there. Is
there a method to set MCA options with environment variables? I could not
find this documented anywhere.

In particular, is there a way to cause shm to not use the global
filesystem? I see this issue comes up a lot and I read the list archives,
but the warning message (
https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/help-mpi-common-sm.txt)
suggested that I could override it by setting TMP, TEMP or TEMPDIR, which I
did to no avail.

Thanks,

Jeff

--
Jeff Hammond
***@gmail.com
http://jeffhammond.github.io/
Martin Siegert
2015-11-19 17:51:21 UTC
Permalink
Hi Jeff,
Post by Jeff Hammond
In particular, is there a way to cause shm to not use the global
filesystem? I see this issue comes up a lot and I read the list archives,
but the warning message (
https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/help-mpi-common-sm.txt)
suggested that I could override it by setting TMP, TEMP or TEMPDIR, which I
did to no avail.
From my experience on edison: the one environment variable that does
work is TMPDIR - the one that is not listed in the error message :-)

Can't help you with your mpirun problem though ...

Cheers,
Martin
--
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
Simon Fraser University
Burnaby, British Columbia
Howard
2015-11-19 18:45:23 UTC
Permalink
Hi Jeff

How did you configure for Cori? You need to be using the slurm plm component for that system. I know this sounds like gibberish.

There should be a --with-slurm configure option to pick up this component.

Doesn't mpich have the option to use sysv memory? You may want to try that

Oh, for tuning params you can use env variables. For example, let's say rather than using the gni provider in the ofi mtl you want to try sockets. Then do

export OMPI_MCA_mtl_ofi_provider_include=sockets
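
As a minimal sketch of this mechanism: an MCA parameter that would be passed as "--mca <name> <value>" on the mpirun command line can instead be exported as OMPI_MCA_<name>, which also works when launching with srun. The set_mca helper below is hypothetical and exists only for illustration; the parameter names are the ones mentioned in this thread.

```shell
# Hypothetical helper (not part of Open MPI): export one MCA parameter
# as an OMPI_MCA_* environment variable.
set_mca() {
    # $1 = MCA parameter name, $2 = value
    export "OMPI_MCA_$1=$2"
}

set_mca mtl_ofi_provider_include sockets
set_mca plm slurm

# Show what a subsequently launched MPI process would inherit.
env | grep '^OMPI_MCA_' | sort
```

Any job started from the same shell afterwards (for example with srun) sees these settings without needing mpirun's command line.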

In the spirit of OMPI - may the force be with you.

Howard

Jeff Hammond
2015-11-19 23:59:22 UTC
Permalink
Post by Howard
How did you configure for Cori? You need to be using the slurm plm
component for that system. I know this sounds like gibberish.
../configure --with-libfabric=$HOME/OFI/install-ofi-gcc-gni-cori \
--enable-mca-static=mtl-ofi \
--enable-mca-no-build=btl-openib,btl-vader,btl-ugni,btl-tcp \
--enable-static --disable-shared --disable-dlopen \
--prefix=$HOME/MPI/install-ompi-ofi-gcc-gni-xpmem-cori \
--with-cray-pmi --with-alps --with-cray-xpmem --with-slurm \
--without-verbs --without-fca --without-mxm --without-ucx \
--without-portals4 --without-psm --without-psm2 \
--without-udreg --without-ugni --without-munge \
--without-sge --without-loadleveler --without-tm --without-lsf \
--without-pvfs2 --without-plfs \
--without-cuda --disable-oshmem \
--disable-mpi-fortran --disable-oshmem-fortran \
LDFLAGS="-L/opt/cray/ugni/default/lib64 -lugni \
-L/opt/cray/alps/default/lib64 -lalps -lalpslli -lalpsutil \
-ldl -lrt"


This is copied from
https://github.com/jeffhammond/HPCInfo/blob/master/ofi/README.md#open-mpi,
which I note in case you want to see what changes I've made at any point in
the future.
Post by Howard
There should be a with-slurm configure option to pick up this component.
Indeed there is.
Post by Howard
Doesn't mpich have the option to use sysv memory? You may want to try that
MPICH? Look, I may have earned my way onto Santa's naughty list more than
a few times, but at least I have the decency not to post MPICH questions to
the Open-MPI list ;-)

If there is a way to tell Open-MPI to use shm_open without filesystem
backing (if that is even possible) at configure time, I'd love to do that.
Post by Howard
Oh, for tuning params you can use env variables. For example, let's say
rather than using the gni provider in the ofi mtl you want to try sockets.
Then do
export OMPI_MCA_mtl_ofi_provider_include=sockets
Thanks. I'm glad that there is an option to set them this way.
Post by Howard
In the spirit of OMPI - may the force be with you.
All I will say here is that Open-MPI has a Vader BTL :-)
Post by Howard
Post by Martin Siegert
From my experience on edison: the one environment variable that does
work is TMPDIR - the one that is not listed in the error message :-)
That's great. I will try that now. Is there a Github issue open already
to fix that documentation? If not...
Post by Howard
Post by Martin Siegert
Can't help you with your mpirun problem though ...
No worries. I appreciate all the help I can get.
Thanks,

Jeff
--
Jeff Hammond
***@gmail.com
http://jeffhammond.github.io/
Howard Pritchard
2015-11-20 00:11:58 UTC
Permalink
Hi Jeff H.

Why don't you just try configuring with

./configure --prefix=my_favorite_install_dir \
    --with-libfabric=install_dir_for_libfabric
make -j 8 install

and see what happens?

Make sure before you configure that you have PrgEnv-gnu or PrgEnv-intel
module loaded.

Those were the configure/compiler options I used to do testing of ofi mtl
on cori.

Jeff S. - this thread has gotten intermingled with mpich setup as well,
hence the suggestion for the mpich shm mechanism.


Howard
Howard Pritchard
2015-11-20 00:17:45 UTC
Permalink
Hi Jeff,

I finally got an allocation on cori - it's one busy machine.

Anyway, using the ompi I'd built on edison with the above recommended
configure options, I was able to run using either srun or mpirun on cori,
provided that in the latter case I used

mpirun -np X -N Y --mca plm slurm ./my_favorite_app

I will make an adjustment to the alps plm launcher to disqualify itself if
the wlm_detect facility on the cray reports that srun is the launcher.
That's a minor fix and should make it into v2.x in a week or so. It will be
a runtime selection, so you only have to build ompi once for use on either
edison or cori.
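
A job script can also choose the launcher itself. The sketch below is a hypothetical heuristic, not the wlm_detect logic mentioned above: it assumes that a set SLURM_JOB_ID means the job is inside a Slurm allocation, and pick_plm is an illustrative name only.

```shell
# Hypothetical heuristic (not Open MPI's own logic): use the slurm plm
# inside a Slurm allocation, otherwise fall back to alps.
pick_plm() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo slurm
    else
        echo alps
    fi
}

plm=$(pick_plm)
# Print the command instead of running it, so the sketch is safe to try anywhere.
echo "would run: mpirun -np 2 --mca plm $plm ./my_favorite_app"
```

Exporting OMPI_MCA_plm with the chosen value would have the same effect as passing --mca plm on the command line.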

Howard
Jeff Hammond
2015-11-20 00:30:47 UTC
Permalink
Post by Howard Pritchard
Hi Jeff H.
Why don't you just try configuring with
./configure --prefix=my_favorite_install_dir \
    --with-libfabric=install_dir_for_libfabric
make -j 8 install
and see what happens?
That was the first thing I tried. However, it seemed to give me a
Verbs-oriented build, and Verbs is the Sith lord to us JedOFIs :-)

From aforementioned Wiki:

../configure \
--with-libfabric=$HOME/OFI/install-ofi-gcc-gni-cori \
--disable-shared \
--prefix=$HOME/MPI/install-ompi-ofi-gcc-gni-cori

Unfortunately, this (above) leads to an mpicc that indicates support for IB
Verbs, not OFI.
I will try again though just in case.
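
One way to check what a given build actually contains is to look at the component list that ompi_info prints. The sketch below assumes the usual "MCA <framework>: <component> (...)" line layout of ompi_info output; has_component is a hypothetical helper name, and the check is skipped when ompi_info is not on the PATH.

```shell
# Hypothetical helper: succeed if the ompi_info-style listing on stdin
# names component $2 in framework $1.
has_component() {
    grep -Eq "MCA $1: $2( |\$)"
}

if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | has_component mtl ofi && echo "ofi mtl is built in"
    ompi_info | has_component btl openib && echo "openib btl (Verbs) is built in"
else
    echo "ompi_info not found; nothing to check"
fi
```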
Post by Howard Pritchard
Make sure before you configure that you have PrgEnv-gnu or PrgEnv-intel
module loaded.
Yeah, I know better than to use the Cray compilers for such things (e.g.
https://github.com/jeffhammond/OpenPA/commit/965ca014ea3148ee5349e16d2cec1024271a7415
)
Post by Howard Pritchard
Those were the configure/compiler options I used to do testing of ofi mtl
on cori.
Jeff S. - this thread has gotten intermingled with mpich setup as well,
hence the suggestion for the mpich shm mechanism.
The first OSS implementation of MPI that I can use on Cray XC using OFI
gets a prize at the December MPI Forum.

Best,

Jeff
--
Jeff Hammond
***@gmail.com
http://jeffhammond.github.io/
Dave Love
2015-11-20 14:25:06 UTC
Permalink
Post by Jeff Hammond
Post by Howard
Doesn't mpich have the option to use sysv memory? You may want to try that
MPICH? Look, I may have earned my way onto Santa's naughty list more than
a few times, but at least I have the decency not to post MPICH questions to
the Open-MPI list ;-)
If there is a way to tell Open-MPI to use shm_open without filesystem
backing (if that is even possible) at configure time, I'd love to do that.
I'm not sure I understand what's required, but is this what you're after?

$ ompi_info --param shmem all -l 9|grep priority
MCA shmem: parameter "shmem_mmap_priority" (current value: "50", data source: default, level: 3 user/all, type: int)
MCA shmem: parameter "shmem_posix_priority" (current value: "40", data source: default, level: 3 user/all, type: int)
MCA shmem: parameter "shmem_sysv_priority" (current value: "30", data source: default, level: 3 user/all, type: int)
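
A minimal sketch of how these priorities could be used, assuming that the shmem component with the highest priority is the one selected: raising shmem_posix_priority above the mmap default of 50 should make the posix component win. The value 60 is an arbitrary choice for illustration.

```shell
# Assumption: the shmem component with the highest priority is selected,
# so 60 puts posix ahead of mmap (default 50).
export OMPI_MCA_shmem_posix_priority=60

# If ompi_info is installed, the listing should now report the new value
# with the environment as its data source.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param shmem all -l 9 | grep priority
fi
```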
Post by Jeff Hammond
Post by Howard
In the spirit OMPI - may the force be with you.
All I will say here is that Open-MPI has a Vader BTL :-)
Whatever that might mean.
Gilles Gouaillardet
2015-11-20 14:40:10 UTC
Permalink
Currently, ompi creates a file in the temporary directory and then mmaps it.
An obvious requirement is that the temporary directory must have enough free
space for that file
(this might be an issue on some diskless nodes).


A simple alternative could be to try /tmp and, if there is not enough
space, try /dev/shm
(unless the tmpdir has been set explicitly).

Any thoughts?
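
The proposed fallback can be sketched in shell as follows. This is a hypothetical illustration of the idea, not Open MPI code: pick_backing_dir is an invented name, the required size is passed in kilobytes, and free space is read from the "Available" column of POSIX df output.

```shell
# Hypothetical sketch of the proposed fallback; not Open MPI code.
# Prints the directory to use for a backing file of $1 kilobytes.
pick_backing_dir() {
    need_kb=$1
    # An explicitly set TMPDIR always wins, whatever its free space.
    if [ -n "${TMPDIR:-}" ]; then
        echo "$TMPDIR"
        return 0
    fi
    for dir in /tmp /dev/shm; do
        [ -d "$dir" ] || continue
        # Column 4 of "df -Pk" is the available space in 1K blocks.
        avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
        if [ "${avail_kb:-0}" -ge "$need_kb" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

pick_backing_dir 65536 || echo "no directory with enough free space" >&2
```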

Gilles

btw, we already use the force, thanks to the ob1 pml and the yoda spml
Dave Love
2015-11-24 14:31:50 UTC
Permalink
Post by Gilles Gouaillardet
Currently, ompi creates a file in the temporary directory and then mmaps it.
An obvious requirement is that the temporary directory must have enough free
space for that file
(this might be an issue on some diskless nodes).
A simple alternative could be to try /tmp and, if there is not enough
space, try /dev/shm
(unless the tmpdir has been set explicitly).
Any thoughts?
/tmp is already the default if TMPDIR et al aren't defined, isn't it?

While you may not have any choice but to use /dev/shm on a diskless node, it
doesn't seem a good thing to do by default for large maps. It wasn't
here.

[I've never been sure of the semantics of mmap over tmpfs.]

I think the important thing is clear explanation of any error, and
suggestions for workarounds. Presumably anyone operating diskless nodes
has made arrangements for this sort of thing.
Post by Gilles Gouaillardet
Gilles
btw, we already use the force, thanks to the ob1 pml and the yoda spml
I think that's assuming familiarity with something which leaves out some
people...
Jeff Squyres (jsquyres)
2015-11-30 16:05:37 UTC
Permalink
Post by Dave Love
Post by Gilles Gouaillardet
btw, we already use the force, thanks to the ob1 pml and the yoda spml
I think that's assuming familiarity with something which leaves out some
people...
FWIW, I agree: we use unhelpful names for components in Open MPI. What Gilles is specifically referring to here is that there are several Star Wars-based names of plugins in Open MPI. They mean something to us developers (they started off as a funny joke), but they mean little/nothing to end users.

I actually specifically called out this issue in the SC'15 Open MPI BOF:

[Image from the SC'15 Open MPI BOF; not preserved in this archive]

This is definitely an issue that is on the agenda for the face-to-face Open MPI developer's meeting in February (https://github.com/open-mpi/ompi/wiki/Meeting-2016-02).
--
Jeff Squyres
***@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Dave Love
2015-11-20 14:19:47 UTC
Permalink
Post by Martin Siegert
Post by Jeff Hammond
In particular, is there a way to cause shm to not use the global
filesystem? I see this issue comes up a lot and I read the list archives,
but the warning message (
https://github.com/hpc/cce-mpi-openmpi-1.6.4/blob/master/ompi/mca/common/sm/help-mpi-common-sm.txt)
suggested that I could override it by setting TMP, TEMP or TEMPDIR, which I
did to no avail.
[Why look at such an old version?]
Post by Martin Siegert
From my experience on edison: the one environment variable that does
work is TMPDIR - the one that is not listed in the error message :-)
It's a typo -- see routine opal_tmp_directory.

I don't know about other resource managers, but SGE sets TMPDIR to a
job-specific directory. OMPI creates mmap files there, inter alia,
unless told otherwise by MCA orte_tmpdir_base or something more
specific. [You probably don't want to follow our vendor's
orte_tmpdir_base=/dev/shm...]
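
A minimal sketch of pointing both knobs at a job-private directory, assuming /tmp is node-local on the system in question (substitute the site's local scratch path if it is not). The directory name pattern is arbitrary.

```shell
# Assumption: /tmp is node-local here; change local_base for your site.
local_base=/tmp
scratch=$(mktemp -d "$local_base/ompi-session.XXXXXX")

# TMPDIR is the variable reported to work in this thread;
# orte_tmpdir_base is the MCA parameter named above.
export TMPDIR="$scratch"
export OMPI_MCA_orte_tmpdir_base="$scratch"

echo "Open MPI session files will go under $scratch"
```

Removing the directory at the end of the job keeps node-local storage from filling up.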
Dave Love
2015-11-20 14:18:30 UTC
Permalink
[There must be someone better to answer this, but since I've seen it:]
Post by Jeff Hammond
I have no idea what this is trying to tell me. Help?
[nid00024:00482] [[46168,0],0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/plm/alps/plm_alps_module.c at line 418
That must be a system error message, presumably indicating why the
process couldn't be launched; it's not in the OMPI source.
Post by Jeff Hammond
MPI was initialized.
This is on the NERSC Cori Cray XC40 system. I build Open-MPI git head from
source for OFI libfabric.
I have many other issues, which I will report later. As a spoiler, if I
cannot use your mpirun, I cannot set any of the MCA options there. Is
there a method to set MCA options with environment variables? I could not
find this documented anywhere.
mpirun(1) documents the mechanisms under "Setting MCA Parameters",
unless it's changed since 1.8. [I have wondered why a file in cwd isn't
a possibility, only in $HOME.]
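
For the file-based mechanism, the per-user file is $HOME/.openmpi/mca-params.conf, with one "name = value" entry per line. The sketch below uses a hypothetical helper, add_mca_param, and writes into a throwaway directory so that it does not touch a real home directory; pass "$HOME" as the first argument to affect actual runs.

```shell
# Hypothetical helper: append "name = value" to the MCA parameter file
# under the directory given as $1.
add_mca_param() {
    mkdir -p "$1/.openmpi"
    printf '%s = %s\n' "$2" "$3" >> "$1/.openmpi/mca-params.conf"
}

# Demonstrated against a temporary directory standing in for $HOME.
demo_home=$(mktemp -d)
add_mca_param "$demo_home" plm slurm
add_mca_param "$demo_home" orte_tmpdir_base /tmp
cat "$demo_home/.openmpi/mca-params.conf"
```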