Discussion:
[OMPI users] Cannot run MPI code on multiple cores with PBS
Castellana Michele
2018-10-03 19:02:13 UTC
Dear all,
I am having trouble running an MPI code across multiple cores on a new computer cluster, which uses PBS. Here is a minimal example, where I want to run two MPI processes, each on a different node. The PBS script is

#!/bin/bash
#PBS -l walltime=00:01:00
#PBS -l mem=1gb
#PBS -l nodes=2:ppn=1
#PBS -q batch
#PBS -N test
mpirun -np 2 ./code.o

and when I submit it with

$ qsub script.sh

I get the following message in the PBS error file

$ cat test.e1234
[shbli040:08879] mca_base_component_repository_open: unable to open mca_plm_tm: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory (ignored)
[shbli040:08879] mca_base_component_repository_open: unable to open mca_oob_ud: libibverbs.so.1: cannot open shared object file: No such file or directory (ignored)
[shbli040:08879] mca_base_component_repository_open: unable to open mca_ras_tm: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
./code.o

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

The PBS version is

$ qstat --version
Version: 6.1.2

and here is some additional information on the MPI version

$ mpicc -v
Using built-in specs.
COLLECT_GCC=/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
[…]
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)

Do you guys know what may be the issue here?

Thank you
Best,
Ralph H Castain
2018-10-03 19:33:19 UTC
Did you configure OMPI --with-tm=<path-to-PBS-libs>? It looks like we didn’t build PBS support and so we only see one node with a single slot allocated to it.
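For example, a rebuild with TM support typically looks something like the sketch below; PBS_HOME is a placeholder, and you should point it at whatever directory on your site actually contains the TM headers and libraries:

```shell
# Sketch only: rebuild Open MPI against the PBS TM (task manager) library.
# PBS_HOME is a placeholder -- substitute your site's PBS installation root,
# i.e. the directory that contains include/tm.h and the TM libraries.
PBS_HOME=/opt/pbs
./configure --prefix="$HOME/openmpi" --with-tm="$PBS_HOME"
make -j4
make install
```

After reinstalling, running ompi_info and grepping for "tm" should show the tm components under plm and ras.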
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users
Ralph H Castain
2018-10-03 19:41:48 UTC
Actually, I see that you do have the tm components built, but they cannot be loaded because you are missing libcrypto from your LD_LIBRARY_PATH
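As a sketch (the /usr/lib64 directory below is only an example; check where libcrypto actually lives on your nodes):

```shell
# See which libcrypto versions the runtime linker already knows about
ldconfig -p | grep libcrypto || echo "no libcrypto registered with the loader"
# Then, in your job script, prepend the directory that holds the library.
# /usr/lib64 is an example -- substitute the directory you found above.
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
```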
Castellana Michele
2018-10-03 20:24:05 UTC
Dear Ralph,
Thank you for your reply. Do you know where I could find libcrypto.so.0.9.8?

Best,
Jeff Squyres (jsquyres) via users
2018-10-03 21:00:46 UTC
It's probably in your Linux distro somewhere -- I'd guess you're missing a package (e.g., an RPM or a deb) out on your compute nodes...?
--
Jeff Squyres
***@cisco.com
Castellana Michele
2018-10-03 21:30:54 UTC
Thank you, I found some libcrypto files in /usr/lib indeed:

$ ls libcry*
libcrypt-2.17.so libcrypto.so.10 libcrypto.so.1.0.2k libcrypt.so.1

but I could not find libcrypto.so.0.9.8. Here (https://www.howopensource.com/2011/08/utserver-error-while-loading-shared-libraries-libssl-so-0-9-8-libcrypto-so-0-9-8-solved/) they suggest creating a symbolic link, but if I do, I still get an error from MPI. Is there another way around this?

Best,

Castellana Michele
2018-10-03 22:57:34 UTC
I fixed it: the correct file was in /lib64, not in /lib.

Thank you for your help.
John Hearns via users
2018-10-04 08:30:47 UTC
Michele, one tip: log into a compute node with ssh as your own username. If you use the Modules environment, load the modules you use in the job script, then use the ldd utility to check whether all the libraries needed by the code.o executable can be loaded.

Actually, you are better off submitting a short batch job which does not run mpirun but runs ldd instead. A proper batch job will duplicate the environment you wish to run in.

ldd ./code.o

By the way, is the batch system PBSPro or OpenPBS? Version 6 seems a bit old.
Can you say which version of Red Hat or CentOS this cluster is installed with?
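For instance, the short ldd batch job I mean might look like this (the resource lines are copied from your original script; PBS_O_WORKDIR is the directory you submitted from):

```shell
#!/bin/bash
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -q batch
#PBS -N ldd-test
# Run from the submission directory, where code.o lives
cd "${PBS_O_WORKDIR:-.}"
# Print every shared library code.o needs; look for "not found" lines
ldd ./code.o
```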



Gilles Gouaillardet
2018-10-04 07:51:19 UTC
In this case, some Open MPI plugins are missing third-party libraries, so you would have to run ldd on all the plugins (i.e. the .so files) located in <prefix>/lib/openmpi in order to surface any issues.
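Something along these lines (the prefix is a placeholder; use the directory your Open MPI was installed into):

```shell
# Report any Open MPI plugin whose shared-library dependencies are unresolved.
# OMPI_PREFIX is a placeholder for your actual installation prefix.
OMPI_PREFIX=${OMPI_PREFIX:-/usr/lib64/openmpi}
for plugin in "$OMPI_PREFIX"/lib/openmpi/*.so; do
    [ -e "$plugin" ] || continue
    # ldd prints "not found" for every dependency the loader cannot resolve
    missing=$(ldd "$plugin" 2>/dev/null | grep 'not found')
    if [ -n "$missing" ]; then
        echo "$plugin:"
        echo "$missing"
    fi
done
```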

Cheers,

Gilles

Castellana Michele
2018-10-04 09:09:12 UTC
Dear John,
Thank you for your reply. I have tried

ldd mpirun ./code.o

but I get an error message; I do not know the proper syntax for the ldd command. Here is the information about the Linux version:

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Could you please tell me how to check whether the batch system is PBSPro or OpenPBS?

Best,




On Oct 4, 2018, at 10:30 AM, John Hearns via users <***@lists.open-mpi.org> wrote:

Michele one tip: log into a compute node using ssh and as your own username.
If you use the Modules envirnonment then load the modules you use in
the job script
then use the ldd utility to check if you can load all the libraries
in the code.io executable

Actually you are better to submit a short batch job which does not use
mpirun but uses ldd
A proper batch job will duplicate the environment you wish to run in.

ldd ./code.io

By the way, is the batch system PBSPro or OpenPBS? Version 6 seems a bit old.
Can you say what version of Redhat or CentOS this cluster is installed with?



On Thu, 4 Oct 2018 at 00:02, Castellana Michele
<***@curie.fr> wrote:

I fixed it, the correct file was in /lib64, not in /lib.

Thank you for your help.

On Oct 3, 2018, at 11:30 PM, Castellana Michele <***@curie.fr> wrote:

Thank you, I found some libcrypto files in /usr/lib indeed:

$ ls libcry*
libcrypt-2.17.so libcrypto.so.10 libcrypto.so.1.0.2k libcrypt.so.1

but I could not find libcrypto.so.0.9.8. Here they suggest to create a hyperlink, but if I do I still get an error from MPI. Is there another way around this?

Best,

On Oct 3, 2018, at 11:00 PM, Jeff Squyres (jsquyres) via users <***@lists.open-mpi.org> wrote:

It's probably in your Linux distro somewhere -- I'd guess you're missing a package (e.g., an RPM or a deb) out on your compute nodes...?


On Oct 3, 2018, at 4:24 PM, Castellana Michele <***@curie.fr> wrote:

Dear Ralph,
Thank you for your reply. Do you know where I could find libcrypto.so.0.9.8 ?

Best,

On Oct 3, 2018, at 9:41 PM, Ralph H Castain <***@open-mpi.org> wrote:

Actually, I see that you do have the tm components built, but they cannot be loaded because you are missing libcrypto from your LD_LIBRARY_PATH


On Oct 3, 2018, at 12:33 PM, Ralph H Castain <***@open-mpi.org> wrote:

Did you configure OMPI —with-tm=<path-to-PBS-libs>? It looks like we didn’t build PBS support and so we only see one node with a single slot allocated to it.


_______________________________________________
users mailing list
***@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
John Hearns via users
2018-10-04 13:12:59 UTC
Permalink
Michele, the command is ldd ./code.io
I just Googled - ldd means List Dynamic Dependencies

To find out the PBS batch system type - that is a good question!
Try this: qstat --version



On Thu, 4 Oct 2018 at 10:12, Castellana Michele
Post by Castellana Michele
Dear John,
Thank you for your reply. I have tried
ldd mpirun ./code.o
but I get an error message; I do not know the proper syntax for the ldd command. Here is the information about the Linux version
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Could you please tell me how to check whether the batch system is PBSPro or OpenPBS?
Best,
Michele, one tip: log into a compute node using ssh as your own username.
If you use the Modules environment then load the modules you use in
the job script
then use the ldd utility to check if you can load all the libraries
in the code.io executable
Jeff Squyres (jsquyres) via users
2018-10-04 18:47:23 UTC
Permalink
Note that what Gilles said is correct: it's not just the dependent libraries of libmpi.so (and friends) that matter -- it's also the dependent libraries of all of Open MPI's plugins that matter.

You can run "ldd *.so" in the lib directory where you installed Open MPI, but you'll also need to "ldd *.so" in the lib/openmpi directory -- that's where Open MPI installs its plugins.

I suspect that if you run "ldd lib/openmpi/mca_plm_tm.so" on the head node, you'll see all the dependent libraries listed. But if you run the same command on your back-end compute nodes, it might say "not found" for some of the libraries.
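
A sketch that automates this check over the whole plugin directory; the install prefix /data/users/xx/openmpi is taken from the ldd output elsewhere in this thread and may differ on your system:

```shell
# report every Open MPI plugin whose dependencies the dynamic linker
# cannot resolve on this node
check_plugins() {
    for f in "$1"/*.so; do
        [ -e "$f" ] || continue          # directory may be empty or missing
        missing=$(ldd "$f" 2>/dev/null | grep 'not found')
        if [ -n "$missing" ]; then
            printf '%s:\n%s\n' "$f" "$missing"
        fi
    done
    return 0
}
# run it against the Open MPI plugin directory (assumed path)
check_plugins /data/users/xx/openmpi/lib/openmpi
```

Running it on the head node and on a compute node and comparing the output pinpoints exactly which libraries the compute nodes lack.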
--
Jeff Squyres
***@cisco.com
Castellana Michele
2018-10-09 15:03:41 UTC
Permalink
Dear John,
Thank you for your reply. Here is the output of ldd

$ ldd ./code.io
linux-vdso.so.1 => (0x00007ffcc759f000)
liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x00007fbc1c613000)
libgsl.so.0 => /usr/lib64/libgsl.so.0 (0x00007fbc1c1ea000)
libgslcblas.so.0 => /usr/lib64/libgslcblas.so.0 (0x00007fbc1bfad000)
libmpi.so.40 => /data/users/xx/openmpi/lib/libmpi.so.40 (0x00007fbc1bcad000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fbc1b9a6000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00007fbc1b6a4000)
libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007fbc1b48e000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007fbc1b272000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00007fbc1aea5000)
libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007fbc1ac4c000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fbc1a92a000)
libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007fbc19cdd000)
libopen-rte.so.40 => /data/users/xx/openmpi/lib/libopen-rte.so.40 (0x00007fbc19a2d000)
libopen-pal.so.40 => /data/users/xx/openmpi/lib/libopen-pal.so.40 (0x00007fbc19733000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007fbc1952f000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00007fbc19327000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x00007fbc19124000)
libz.so.1 => /usr/lib64/libz.so.1 (0x00007fbc18f0e000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbc1cd70000)
libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007fbc18cd2000)

and the one for the PBS version

$ qstat --version
Version: 6.1.2
Commit: 661e092552de43a785c15d39a3634a541d86898e

After I created the symbolic links libcrypto.so.0.9.8 and libssl.so.0.9.8, I still have one error message left from MPI:

mca_base_component_repository_open: unable to open mca_btl_openib: libibverbs.so.1: cannot open shared object file: No such file or directory (ignored)

Please let me know if you have any suggestions.

Best,
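
That last message is only a warning: the openib component is skipped because libibverbs is absent. If the cluster has no InfiniBand, one option (a sketch, not a definitive fix) is to exclude that component explicitly so the message disappears:

```shell
# tell Open MPI not to try the InfiniBand byte-transfer layer at all;
# communication then falls back to TCP and shared memory
mpirun --mca btl ^openib -np 2 ./code.o
```

If the cluster does have InfiniBand, the right fix is instead to install libibverbs on the compute nodes.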


John Hearns via users
2018-10-09 15:53:36 UTC
Permalink
Michele, as others have said, libibverbs.so.1 is not in your library path.
Can you ask the person who manages your cluster where libibverbs is
located on the compute nodes?
Also try to run ibv_devinfo

Castellana Michele
2018-10-10 21:02:09 UTC
Permalink
Dear John,
I see, thank you for your reply. Unfortunately the cluster support is of poor quality, and it would take a while to get this information from them. Is there any way I can check this myself? Also, it looks like ibv_devinfo does not exist on the cluster

$ ibv_devinfo
-bash: ibv_devinfo: command not found

Best,
Michele
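
You can search for the library yourself without admin help; a sketch, where the directory list is an assumption covering the usual locations:

```shell
# look for a shared library by name under the standard system library
# directories; no root access is needed, find skips what it cannot read
find_lib() {
    found=$(find /usr/lib /usr/lib64 /lib /lib64 -name "${1}*" 2>/dev/null)
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
    else
        echo "no ${1} found in the standard directories" >&2
        return 1
    fi
}
find_lib libibverbs.so || true
```

An empty result on a compute node would confirm that libibverbs (and hence InfiniBand userspace support) is simply not installed there.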


On Oct 9, 2018, at 5:53 PM, John Hearns <***@googlemail.com<mailto:***@googlemail.com>> wrote:

Michele, as other have said libibverbs.so.1 is not in your library path.
Can you ask the person who manages yoru cluster where libibverbs is
located on the compute nodes?
Also try to run ibv_devinfo

On Tue, 9 Oct 2018 at 16:03, Castellana Michele
<***@curie.fr<mailto:***@curie.fr>> wrote:

Dear John,
Thank you for your reply. Here is the output of ldd

$ ldd ./code.io<http://code.io>
linux-vdso.so.1 => (0x00007ffcc759f000)
liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x00007fbc1c613000)
libgsl.so.0 => /usr/lib64/libgsl.so.0 (0x00007fbc1c1ea000)
libgslcblas.so.0 => /usr/lib64/libgslcblas.so.0 (0x00007fbc1bfad000)
libmpi.so.40 => /data/users/xx/openmpi/lib/libmpi.so.40 (0x00007fbc1bcad000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fbc1b9a6000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00007fbc1b6a4000)
libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007fbc1b48e000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007fbc1b272000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00007fbc1aea5000)
libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007fbc1ac4c000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007fbc1a92a000)
libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007fbc19cdd000)
libopen-rte.so.40 => /data/users/xx/openmpi/lib/libopen-rte.so.40 (0x00007fbc19a2d000)
libopen-pal.so.40 => /data/users/xx/openmpi/lib/libopen-pal.so.40 (0x00007fbc19733000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007fbc1952f000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00007fbc19327000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x00007fbc19124000)
libz.so.1 => /usr/lib64/libz.so.1 (0x00007fbc18f0e000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbc1cd70000)
libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007fbc18cd2000)

and the one for the PBS version

$ qstat --version
Version: 6.1.2
Commit: 661e092552de43a785c15d39a3634a541d86898e

After I created symbolic links for libcrypto.so.0.9.8 and libssl.so.0.9.8, I still have one error message left from MPI:

mca_base_component_repository_open: unable to open mca_btl_openib: libibverbs.so.1: cannot open shared object file: No such file or directory (ignored)
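
The failing component is itself a shared object that ldd can inspect; any "not found" line names a runtime library it cannot resolve. A sketch (the plugin path is an assumption based on the install prefix shown in the ldd output above):

```shell
# List unresolved dependencies of the openib plugin.
plugin=/data/users/xx/openmpi/lib/openmpi/mca_btl_openib.so
ldd "$plugin" 2>/dev/null | grep "not found" || echo "no unresolved deps (or plugin missing)"
```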

Please let me know if you have any suggestions.

Best,


On Oct 4, 2018, at 3:12 PM, John Hearns via users <***@lists.open-mpi.org> wrote:

Michele, the command is ldd ./code.io
I just Googled it - ldd stands for List Dynamic Dependencies

To find out the PBS batch system type - that is a good question!
Try this: qstat --version



On Thu, 4 Oct 2018 at 10:12, Castellana Michele
<***@curie.fr> wrote:


Dear John,
Thank you for your reply. I have tried

ldd mpirun ./code.o

but I get an error message; I do not know the proper syntax for the ldd command. Here is the information about the Linux version

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Could you please tell me how to check whether the batch system is PBSPro or OpenPBS?

Best,




On Oct 4, 2018, at 10:30 AM, John Hearns via users <***@lists.open-mpi.org> wrote:

Michele, one tip: log into a compute node using ssh, as your own username.
If you use the Modules environment, load the modules you use in
the job script,
then use the ldd utility to check whether all the libraries
in the code.io executable can be loaded.

Actually, you are better off submitting a short batch job which does not use
mpirun but runs ldd instead.
A proper batch job will duplicate the environment you wish to run in.

ldd ./code.io

By the way, is the batch system PBSPro or OpenPBS? Version 6 seems a bit old.
Can you say what version of Redhat or CentOS this cluster is installed with?
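
A rough way to tell the flavors apart yourself (a sketch; the marker files below are typical default locations, not guaranteed). Torque's qstat prints a "Version/Commit" pair like the one shown later in this thread, while PBS Pro reports a "pbs_version" string and installs /etc/pbs.conf:

```shell
# Torque and PBS Pro identify themselves differently:
qstat --version 2>/dev/null || echo "qstat not on PATH here"
[ -e /etc/pbs.conf ]     && echo "looks like PBS Pro"    # PBS Pro config file
[ -d /var/spool/torque ] && echo "looks like Torque"     # default TORQUE_HOME
true  # neither marker being present is also possible
```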



On Thu, 4 Oct 2018 at 00:02, Castellana Michele
<***@curie.fr> wrote:

I fixed it; the correct file was in /lib64, not in /lib.

Thank you for your help.

On Oct 3, 2018, at 11:30 PM, Castellana Michele <***@curie.fr> wrote:

Thank you, I found some libcrypto files in /usr/lib indeed:

$ ls libcry*
libcrypt-2.17.so libcrypto.so.10 libcrypto.so.1.0.2k libcrypt.so.1

but I could not find libcrypto.so.0.9.8. Here they suggest creating a symbolic link, but if I do so I still get an error from MPI. Is there another way around this?
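
For the record, a no-root variant of the symlink workaround looks like this. It is a risky sketch: OpenSSL 0.9.8 and 1.0.2 are not ABI-compatible in general, so rebuilding Open MPI against the system OpenSSL is the clean fix, and the libssl filename below is an assumption mirroring the libcrypto one:

```shell
# Workaround only: expose the old sonames from a private lib dir instead of
# touching /usr/lib64 (no root needed).
mkdir -p "$HOME/compatlibs"
ln -sf /usr/lib64/libcrypto.so.1.0.2k "$HOME/compatlibs/libcrypto.so.0.9.8"
ln -sf /usr/lib64/libssl.so.1.0.2k    "$HOME/compatlibs/libssl.so.0.9.8"
export LD_LIBRARY_PATH="$HOME/compatlibs:$LD_LIBRARY_PATH"
```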

Best,

On Oct 3, 2018, at 11:00 PM, Jeff Squyres (jsquyres) via users <***@lists.open-mpi.org> wrote:

It's probably in your Linux distro somewhere -- I'd guess you're missing a package (e.g., an RPM or a deb) out on your compute nodes...?


On Oct 3, 2018, at 4:24 PM, Castellana Michele <***@curie.fr> wrote:

Dear Ralph,
Thank you for your reply. Do you know where I could find libcrypto.so.0.9.8 ?

Best,

On Oct 3, 2018, at 9:41 PM, Ralph H Castain <***@open-mpi.org> wrote:

Actually, I see that you do have the tm components built, but they cannot be loaded because you are missing libcrypto from your LD_LIBRARY_PATH


On Oct 3, 2018, at 12:33 PM, Ralph H Castain <***@open-mpi.org> wrote:

Did you configure OMPI —with-tm=<path-to-PBS-libs>? It looks like we didn’t build PBS support and so we only see one node with a single slot allocated to it.
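
You can check for the tm (Torque/PBS) plugins yourself with ompi_info, which ships with Open MPI; a sketch (the Torque install path in the rebuild step is an assumption for your site):

```shell
# Do the PBS plugins exist in this build?
ompi_info 2>/dev/null | grep -E ' (plm|ras): tm' || echo "no tm components found"

# If missing, rebuild from the Open MPI source tree, pointing configure at
# the Torque install prefix:
#   ./configure --prefix=$HOME/openmpi --with-tm=/usr/local/torque
#   make -j4 && make install
```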


On Oct 3, 2018, at 12:02 PM, Castellana Michele <***@curie.fr> wrote:

Dear all,
I am having trouble running an MPI code across multiple cores on a new computer cluster, which uses PBS. Here is a minimal example, where I want to run two MPI processes, each on a different node. The PBS script is

#!/bin/bash
#PBS -l walltime=00:01:00
#PBS -l mem=1gb
#PBS -l nodes=2:ppn=1
#PBS -q batch
#PBS -N test
mpirun -np 2 ./code.o

and when I submit it with

$qsub script.sh

I get the following message in the PBS error file

$ cat test.e1234
[shbli040:08879] mca_base_component_repository_open: unable to open mca_plm_tm: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory (ignored)
[shbli040:08879] mca_base_component_repository_open: unable to open mca_oob_ud: libibverbs.so.1: cannot open shared object file: No such file or directory (ignored)
[shbli040:08879] mca_base_component_repository_open: unable to open mca_ras_tm: libcrypto.so.0.9.8: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
./code.o

Either request fewer slots for your application, or make more slots available
for use.
—————————————————————————————————————

The PBS version is

$ qstat --version
Version: 6.1.2

and here is some additional information on the MPI version

$ mpicc -v
Using built-in specs.
COLLECT_GCC=/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
[…]
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)

Do you guys know what may be the issue here?

Thank you
Best,







_______________________________________________
users mailing list
***@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Bennet Fauber
2018-10-10 23:47:54 UTC
Permalink
There is a Linux utility program `locate` that may be installed on
your system. You could try

$ locate ibv_devinfo

For example, mine returns

$ locate ibv_devinfo
/usr/bin/ibv_devinfo
/usr/share/man/man1/ibv_devinfo.1.gz

That should find it if it is on local disk and not in a network
filesystem, and if the locate database is relatively complete.

I hope that helps, -- bennet

Jeff Squyres (jsquyres) via users
2018-10-11 14:35:58 UTC
Permalink
If you don't have ibv_devinfo installed on your compute nodes, then you likely don't have the verbs package installed at all on your compute nodes. That's why you're getting errors about not finding libibverbs.so.

Specifically:

- It sounds like Open MPI was able to find libibverbs.so when it was built. So whatever node you were on when you configured/compiled/installed Open MPI, that node had libibverbs.so (and friends) installed properly, Open MPI found them during configure/make, and therefore it built/installed support for verbs.

- But then you're running that installed Open MPI on nodes where libibverbs.so potentially is not available (e.g., that package was not installed), so Open MPI fails to load the verbs-based plugins (because they need libibverbs.so), and therefore Open MPI emits warnings about that.

The same may well be true for the crypto libraries.
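
A quick way to confirm this diagnosis is a short batch job (no mpirun) that reports library availability on an allocated node; a sketch reusing the resource requests from the original script:

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=1
#PBS -l walltime=00:01:00
#PBS -N libcheck
# Report whether the verbs and crypto runtimes are visible on this node.
hostname
ldconfig -p 2>/dev/null | grep -E 'libibverbs|libcrypto' \
  || echo "verbs/crypto libs not visible on $(hostname)"
```

Note that without mpirun the script runs only on the first allocated node; Torque's pbsdsh (if available) can repeat it on the others.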
--
Jeff Squyres
***@cisco.com
