Discussion:
[OMPI users] disable slurm/munge from mpirun
Michael Di Domenico
2017-06-22 14:16:02 UTC
Is it possible to disable slurm/munge/psm/pmi(x) from the mpirun
command line or (better) using environment variables?

I'd like to use the installed version of Open MPI I have on a
workstation, but it's linked against Slurm from one of my clusters.

MPI/Slurm works just fine on the cluster, but when I run it on a
workstation I get the errors below:

mca_base_component_repository_open: unable to open mca_sec_munge:
libmunge missing
ORTE_ERROR_LOG Not found in file ess_hnp_module.c at line 648
opal_pmix_base_select failed
returned value not found (-13) instead of orte_success

There's probably a magical incantation of MCA parameters, but I'm not
adept enough at determining what they are.
John Hearns via users
2017-06-22 14:28:26 UTC
Michael, try
--mca plm_rsh_agent ssh

I've been fooling with this myself recently, in the context of a PBS cluster.
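A minimal sketch of the two equivalent ways to pass this setting, assuming a POSIX shell and that `mpirun` from the copied Open MPI install is on `PATH`. Open MPI reads any MCA parameter `<name>` from an environment variable named `OMPI_MCA_<name>`, which also covers the "(better) using environment variables" part of the question. `hostname` is only a placeholder workload.

```shell
#!/bin/sh
# Command-line form (as suggested above):
#   mpirun --mca plm_rsh_agent ssh -np 2 hostname
# Environment form: the same MCA parameter name, prefixed with OMPI_MCA_.
export OMPI_MCA_plm_rsh_agent=ssh

# Launch only when mpirun is actually available, so the sketch is harmless elsewhere.
if command -v mpirun >/dev/null 2>&1; then
  mpirun -np 2 hostname
fi
```

Putting the `export` line in a shell profile on the workstation would make it apply to every `mpirun` without retyping the flag.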
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
r***@open-mpi.org
2017-06-22 14:35:34 UTC
You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge" to your environment.
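A sketch of that suggestion as shell exports, assuming a POSIX shell; `./my_app` is a hypothetical program name. A leading `^` tells Open MPI to exclude the listed component(s) of a framework rather than select them, so `sec=^munge` skips the munge security component, while `plm=rsh` selects the rsh/ssh launcher instead of the Slurm one. Quoting `'^munge'` is a precaution: in some old Bourne shells `^` is a synonym for `|`.

```shell
#!/bin/sh
# Select the rsh/ssh launcher instead of the Slurm launcher.
export OMPI_MCA_plm=rsh
# "^" negates the list: use every sec component except munge.
export OMPI_MCA_sec='^munge'

# ./my_app is a placeholder; only run when both mpirun and the program exist.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_app ]; then
  mpirun -np 2 ./my_app
fi
```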
Michael Di Domenico
2017-06-22 14:43:00 UTC
That took care of one of the errors, but I left out a second error when retyping the output:

mca_base_component_repository_open: unable to open mca_pmix_pmix112:
libmunge missing

And the opal_pmix_base_select error is still there (which is what's
actually halting my job).
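One way to see which copied components still depend on cluster-only libraries such as libmunge is to run `ldd` over the component DSOs. This is a sketch assuming a Linux workstation with `ldd`, and that the copied install keeps Open MPI's usual layout with components under `<prefix>/lib/openmpi`; `OMPI_PREFIX` and its default `/opt/openmpi` are hypothetical.

```shell
#!/bin/sh
# Hypothetical location of the copied Open MPI install; override via OMPI_PREFIX.
OMPI_PREFIX="${OMPI_PREFIX:-/opt/openmpi}"
# Open MPI normally installs its MCA component DSOs in <prefix>/lib/openmpi.
COMPONENT_DIR="$OMPI_PREFIX/lib/openmpi"

missing=""
for so in "$COMPONENT_DIR"/mca_*.so; do
  [ -e "$so" ] || continue   # no match: the glob stays literal, so skip it
  # ldd prints "libfoo.so => not found" for each unresolved dependency.
  if ldd "$so" 2>/dev/null | grep -q 'not found'; then
    missing="$missing $(basename "$so")"
    echo "$so:"
    ldd "$so" | grep 'not found'
  fi
done
echo "components with unresolved libraries:${missing:- none}"
```

Given the errors in this thread, one would presumably expect `mca_sec_munge.so` and `mca_pmix_pmix112.so` to be listed with libmunge "not found". A component excluded with `^` is never opened, but a component the run actually needs (like the PMIx one here) cannot be excluded that way.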
John Hearns via users
2017-06-22 14:43:20 UTC
Having had some problems with ssh launching (a few minutes ago) I can
confirm that this works:

--mca plm_rsh_agent "ssh -v"

Stupidly, I thought there was a major problem, when it turned out I simply could
not ssh into a host... ahem.
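Since the culprit here turned out to be plain ssh access, a quick non-interactive check before blaming `mpirun` may save time. A sketch assuming OpenSSH; `HOSTS` is a hypothetical space-separated host list that defaults to `localhost`. It also shows the environment form of the quoted, multi-word agent value.

```shell
#!/bin/sh
# Hypothetical host list; replace with the nodes you intend to launch on.
HOSTS="${HOSTS:-localhost}"
failed=""
for h in $HOSTS; do
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # which is roughly what a non-interactive launch would run into as well.
  if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
    failed="$failed $h"
  fi
done
echo "unreachable:${failed:- none}"

# A multi-word agent must reach Open MPI as one value:
#   mpirun --mca plm_rsh_agent "ssh -v" -np 2 hostname
export OMPI_MCA_plm_rsh_agent="ssh -v"
```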
Michael Di Domenico
2017-06-22 15:04:06 UTC
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users
Post by John Hearns via users
Having had some problems with ssh launching (a few minutes ago) I can
--mca plm_rsh_agent "ssh -v"
This doesn't do anything for me.

If I set OMPI_MCA_sec=^munge,

I can clear the mca_sec_munge error,

but the mca_pmix_pmix112 and opal_pmix_base_select errors still
exist. The plm_rsh_agent switch/env var doesn't seem to affect that
error.

Down the road, I may still need the rsh_agent flag, but I think we're
still before that point in the sequence of events.
r***@open-mpi.org
2017-06-22 16:41:04 UTC
I gather you are using OMPI 2.x, yes? And you configured it --with-pmi=<slurm-pmi-lib>, then moved the executables/libs to your workstation?

I suppose I could state the obvious and say “don’t do that - just rebuild it”, and I fear that (after checking the 2.x code) you really have no choice. OMPI v3.0 will have a way around the problem, but not the 2.x series.
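Following that advice, a sketch of rebuilding on the workstation itself, assuming an unpacked Open MPI source tree in the current directory. `--without-slurm` and `--without-pmi` are the negated forms of Open MPI's `--with-slurm` and `--with-pmi` configure options; double-check them (and any munge-related option) against `./configure --help` for the exact version. Configuring on the workstation presumably also means auto-detected optional libraries such as munge are only used if they are actually installed there. `PREFIX` and its default are hypothetical.

```shell
#!/bin/sh
# Hypothetical install prefix for the workstation-only build (assumed to contain no spaces).
PREFIX="${PREFIX:-$HOME/opt/openmpi-workstation}"
# Skip Slurm launcher/allocator support and Slurm's PMI library.
CONFIGURE_FLAGS="--prefix=$PREFIX --without-slurm --without-pmi"

# Only build when run from an Open MPI source tree.
if [ -x ./configure ] && grep -q 'Open MPI' ./configure 2>/dev/null; then
  # CONFIGURE_FLAGS is intentionally unquoted so it splits into separate options.
  ./configure $CONFIGURE_FLAGS && make -j4 && make install
fi
```

Installing to a separate prefix keeps the cluster-linked copy untouched, so each machine can put its own `bin` directory first on `PATH`.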
Michael Di Domenico
2017-06-23 11:41:14 UTC
Post by r***@open-mpi.org
I gather you are using OMPI 2.x, yes? And you configured it --with-pmi=<slurm-pmi-lib>, then moved the executables/libs to your workstation?
Correct.
Post by r***@open-mpi.org
I suppose I could state the obvious and say “don’t do that - just rebuild it”
Correct... but bummer... so much for being lazy...
Post by r***@open-mpi.org
and I fear that (after checking the 2.x code) you really have no choice. OMPI v3.0 will have a way around the problem, but not the 2.x series.