Hi Gilles
Thank you for your prompt response.
Here is some information about the system:
Ubuntu 16.04 server
Linux-4.4.0-75-generic-x86_64-with-Ubuntu-16.04-xenial
On an HP ProLiant DL320R05 Generation 5: 4 GB RAM, 4x120 GB RAID-1 HDD, 2
Ethernet ports (10/100/1000)
HP StorageWorks 70 Modular Smart Array with 14x120 GB HDD (RAID-5)
44 HP ProLiant BL465c server blades, each with two AMD Opteron Model 2218
CPUs (2.6 GHz, 2 MB, 95 W), 4 GB RAM, 2 NC370i Multifunction Gigabit Server
Adapters, and a 120 GB HDD
The users' area is shared with the nodes.
The ssh and Torque 6.0.2 services work fine.
Torque and Open MPI 2.1.0 were installed from tarballs. Open MPI 2.1.0 was
configured with --prefix=/storage/exp_soft/tuc; after make and make install,
its binaries, libraries, and include files are located under
/storage/exp_soft/tuc.
/storage is a shared file system for all the nodes of the cluster.
$PATH:
/storage/exp_soft/tuc/bin
/storage/exp_soft/tuc/sbin
/storage/exp_soft/tuc/torque/bin
/storage/exp_soft/tuc/torque/sbin
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/snap/bin
LD_LIBRARY_PATH=/storage/exp_soft/tuc/lib
C_INCLUDE_PATH=/storage/exp_soft/tuc/include
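For reference, this is roughly how the environment is set up in the login
scripts (a sketch based on the paths above; the exact file it lives in,
e.g. ~/.bashrc, is an assumption):

```shell
# Sketch of the environment described above, assuming the tarball
# install prefix /storage/exp_soft/tuc from this message. Because
# /storage is shared, the same settings take effect on every node.
export PATH=/storage/exp_soft/tuc/bin:/storage/exp_soft/tuc/sbin:/storage/exp_soft/tuc/torque/bin:/storage/exp_soft/tuc/torque/sbin:$PATH
export LD_LIBRARY_PATH=/storage/exp_soft/tuc/lib:${LD_LIBRARY_PATH:-}
export C_INCLUDE_PATH=/storage/exp_soft/tuc/include
```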
I also use JupyterHub (with the cluster tab enabled) as a user interface to
the cluster. After the installation of Python and some dependencies (I am
not sure exactly which), MPICH and Open MPI are also installed in the system
directories.
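Since both the tarball Open MPI under /storage/exp_soft/tuc and the
distribution MPICH/Open MPI packages in the system directories are present,
a quick sanity check is to confirm which mpirun (and which libmpi) a shell
actually resolves first — a sketch using only standard shell tools, guarded
so it is a no-op where no MPI is installed:

```shell
# Check which MPI implementation the shell picks up; mixing a tarball
# install with distro packages is a common source of opal_init failures.
MPIRUN_PATH=$(command -v mpirun || echo "not-on-PATH")
echo "mpirun resolves to: $MPIRUN_PATH"
# If a real mpirun was found, show which libmpi it links against
if [ -x "$MPIRUN_PATH" ] && command -v ldd >/dev/null 2>&1; then
    ldd "$MPIRUN_PATH" | grep -i libmpi || true
fi
```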
----------------------------------------------------------------------------
mpirun --allow-run-as-root --mca shmem_base_verbose 100 ...
[se01.grid.tuc.gr:19607] mca: base: components_register: registering framework shmem components
[se01.grid.tuc.gr:19607] mca: base: components_register: found loaded component sysv
[se01.grid.tuc.gr:19607] mca: base: components_register: component sysv register function successful
[se01.grid.tuc.gr:19607] mca: base: components_register: found loaded component posix
[se01.grid.tuc.gr:19607] mca: base: components_register: component posix register function successful
[se01.grid.tuc.gr:19607] mca: base: components_register: found loaded component mmap
[se01.grid.tuc.gr:19607] mca: base: components_register: component mmap register function successful
[se01.grid.tuc.gr:19607] mca: base: components_open: opening shmem components
[se01.grid.tuc.gr:19607] mca: base: components_open: found loaded component sysv
[se01.grid.tuc.gr:19607] mca: base: components_open: component sysv open function successful
[se01.grid.tuc.gr:19607] mca: base: components_open: found loaded component posix
[se01.grid.tuc.gr:19607] mca: base: components_open: component posix open function successful
[se01.grid.tuc.gr:19607] mca: base: components_open: found loaded component mmap
[se01.grid.tuc.gr:19607] mca: base: components_open: component mmap open function successful
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: Auto-selecting shmem components
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Querying component (run-time) [sysv]
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Query of component [sysv] set priority to 30
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Querying component (run-time) [posix]
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Query of component [posix] set priority to 40
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Querying component (run-time) [mmap]
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Query of component [mmap] set priority to 50
[se01.grid.tuc.gr:19607] shmem: base: runtime_query: (shmem) Selected component [mmap]
[se01.grid.tuc.gr:19607] mca: base: close: unloading component sysv
[se01.grid.tuc.gr:19607] mca: base: close: unloading component posix
[se01.grid.tuc.gr:19607] shmem: base: best_runnable_component_name: Searching for best runnable component.
[se01.grid.tuc.gr:19607] shmem: base: best_runnable_component_name: Found best runnable component: (mmap).
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).
Node: se01
Executable: ...
--------------------------------------------------------------------------
2 total processes failed to start
[se01.grid.tuc.gr:19607] mca: base: close: component mmap closed
[se01.grid.tuc.gr:19607] mca: base: close: unloading component mmap
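Note that in the output above the shmem selection actually succeeds (mmap is
selected); the reported failure is that mpirun cannot find the executable.
One thing I can try is building on the shared filesystem and launching with
an absolute path, so every node resolves the same binary — a sketch with
hypothetical paths (hello.c, the output name), guarded so it is a no-op on
machines without an MPI toolchain:

```shell
# Hypothetical paths: compile the demo onto the shared FS and pass
# mpirun an absolute path so remote nodes can find the same binary.
EXE=/storage/exp_soft/tuc/hello
if command -v mpicc >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1; then
    mpicc hello.c -o "$EXE"
    mpirun -np 2 "$EXE"
fi
```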
jb
-----Original Message-----
From: users [mailto:users-***@lists.open-mpi.org] On Behalf Of
***@rist.or.jp
Sent: Monday, May 15, 2017 1:47 PM
To: Open MPI Users <***@lists.open-mpi.org>
Subject: Re: [OMPI users] (no subject)
Ioannis,
### What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git
branch name and hash, etc.)
### Describe how Open MPI was installed (e.g., from a source/
distribution tarball, from a git clone, from an operating system
distribution package, etc.)
### Please describe the system on which you are running
* Operating system/version:
* Computer hardware:
* Network type:
also, what if you
mpirun --mca shmem_base_verbose 100 ...
Cheers,
Gilles
----- Original Message -----
Post by Ioannis Botsis
Hi
I am trying to run the following simple demo on a cluster of two nodes
----------------------------------------------------------------------------
Post by Ioannis Botsis
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(NULL, NULL);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
    return 0;
}
----------------------------------------------------------------------------
Post by Ioannis Botsis
I always get the message
----------------------------------------------------------------------------
Post by Ioannis Botsis
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

opal_shmem_base_select failed
--> Returned value -1 instead of OPAL_SUCCESS
----------------------------------------------------------------------------
Post by Ioannis Botsis
Any hint?
Ioannis Botsis
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users