Discussion:
[OMPI users] "No objects of the specified type were found on at least one node"
Gilles Gouaillardet
2017-03-09 11:30:13 UTC
Can you run
lstopo
on your machine and post the output?

Can you also try
mpirun --map-by socket --bind-to socket ...
and see if it helps?

Cheers,

Gilles
Post by Angel de Vicente
Hi,
Post by Gilles Gouaillardet
which version of ompi are you running ?
2.0.1
Post by Gilles Gouaillardet
this error can occur on systems with no NUMA object (e.g. single socket with hwloc < 2)
as a workaround, you can
mpirun --map-by socket ...
with --map-by socket I get exactly the same issue (both on the login node and the compute node)
I will upgrade to 2.0.2 and see if this changes something.
Thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
---------------------------------------------------------------------------------------------
WARNING: For more information on privacy and fulfilment of the Law concerning the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en
_______________________________________________
users mailing list
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Angel de Vicente
2017-03-09 12:40:56 UTC
Hi,
Post by Gilles Gouaillardet
Can you run
lstopo
in your machine, and post the output ?
There is no lstopo on my machine. This is part of hwloc, right?
Post by Gilles Gouaillardet
can you also try
mpirun --map-by socket --bind-to socket ...
and see if it helps ?
same issue.


Perhaps I need to compile hwloc as well??
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
g***@rist.or.jp
2017-03-09 12:51:58 UTC
Yes, lstopo is part of hwloc.

By default, Open MPI uses an embedded version of hwloc 1.11.2,
so I suggest you install the full hwloc with the same version.

Cheers,

Gilles

Angel de Vicente
2017-03-09 14:28:09 UTC
Hi again,

thanks for your help. I installed the latest OpenMPI (2.0.2).

lstopo output:

,----
| lstopo --version
| lstopo 1.11.2
|
| lstopo
| Machine (7861MB)
| L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
| L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
| L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
| L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
| HostBridge L#0
| PCIBridge
| PCI 1014:028c
| Block L#0 "sda"
| PCI 14c1:8043
| Net L#1 "myri0"
| PCIBridge
| PCI 14e4:166b
| Net L#2 "eth0"
| PCI 14e4:166b
| Net L#3 "eth1"
| PCIBridge
| PCI 1002:515e
`----

I started with GCC 6.3.0, compiled OpenMPI 2.0.2 with it, and then HDF5
1.10.0-patch1 with it. Our code then compiles OK with it, and it runs OK
without "mpirun":

,----
| ./mancha3D
| __ __ _____
| /'\_/`\ /\ \ /'__`\ /\ _ `\
| /\ \ __ ___ ___\ \ \___ __ /\_\L\ \\ \ \/\ \
| \ \ \__\ \ /'__`\ /' _ `\ /'___\ \ _ `\ /'__`\\/_/_\_<_\ \ \ \ \
| \ \ \_/\ \/\ \L\.\_/\ \/\ \/\ \__/\ \ \ \ \/\ \L\.\_/\ \L\ \\ \ \_\ \
| \ \_\\ \_\ \__/.\_\ \_\ \_\ \____\\ \_\ \_\ \__/.\_\ \____/ \ \____/
| \/_/ \/_/\/__/\/_/\/_/\/_/\/____/ \/_/\/_/\/__/\/_/\/___/ \/___/
|
| ./mancha3D should be given the name of a control file as argument.
`----




But it complains as before when run with mpirun

,----
| mpirun --map-by socket --bind-to socket -np 1 ./mancha3D
| --------------------------------------------------------------------------
| No objects of the specified type were found on at least one node:
|
| Type: Package
| Node: login1
|
| The map cannot be done as specified.
| --------------------------------------------------------------------------
`----


If I submit it directly with srun, then the code runs, but not in
parallel, and two individual copies of the code are started:

,----
| srun -n 2 ./mancha3D
| __ __ _____
| /'\_/`\ /\ \ /'__`\ /\ _ `\
| /\ \ __ ___ ___\ \ \___ __ /\_\L\ \\ \ \/\ \
| \ \ \__\ \ /'__`\ /' _ `\ /'___\ \ _ `\ /'__`\\/_/_\_<_\ \ \ \ \
| \ \ \_/\ \/\ \L\.\_/\ \/\ \/\ \__/\ \ \ \ \/\ \L\.\_/\ \L\ \\ \ \_\ \
| \ \_\\ \_\ \__/.\_\ \_\ \_\ \____\\ \_\ \_\ \__/.\_\ \____/ \ \____/
| \/_/ \/_/\/__/\/_/\/_/\/_/\/____/ \/_/\/_/\/__/\/_/\/___/ \/___/
|
| should be given the name of a control file as argument.
| __ __ _____
| /'\_/`\ /\ \ /'__`\ /\ _ `\
| /\ \ __ ___ ___\ \ \___ __ /\_\L\ \\ \ \/\ \
| \ \ \__\ \ /'__`\ /' _ `\ /'___\ \ _ `\ /'__`\\/_/_\_<_\ \ \ \ \
| \ \ \_/\ \/\ \L\.\_/\ \/\ \/\ \__/\ \ \ \ \/\ \L\.\_/\ \L\ \\ \ \_\ \
| \ \_\\ \_\ \__/.\_\ \_\ \_\ \____\\ \_\ \_\ \__/.\_\ \____/ \ \____/
| \/_/ \/_/\/__/\/_/\/_/\/_/\/____/ \/_/\/_/\/__/\/_/\/___/ \/___/
|
| should be given the name of a control file as argument.
`----



Any ideas are welcome. Many thanks,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/
Brice Goglin
2017-03-09 15:04:16 UTC
What's this machine made of? (processor, etc)
What kernel are you running ?

Getting no "socket" or "package" at all is quite rare these days.

Brice
Angel de Vicente
2017-03-09 15:12:01 UTC
Can this help? If you think any other information could be relevant, let me
know.

Cheers,
Ángel

cat /proc/cpuinfo
processor : 0
cpu : PPC970MP, altivec supported
clock : 2297.700000MHz
revision : 1.1 (pvr 0044 0101)

[4 processors]

timebase : 14318000
machine : CHRP IBM,8844-Z0C

uname -a
Linux login1 2.6.16.60-perfctr-0.42.4-ppc64 #1 SMP Fri Aug 21 15:25:15 CEST 2009 ppc64 ppc64 ppc64 GNU/Linux

lsb_release -a
Distributor ID: SUSE LINUX
Description: SUSE Linux Enterprise Server 10 (ppc)
Release: 10
Brice Goglin
2017-03-09 15:19:03 UTC
OK, that's a very old kernel on a very old POWER processor. It's
expected that hwloc doesn't get much topology information there, and
so it's also expected that Open MPI cannot apply most binding policies.

Brice
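
The lstopo output posted earlier in the thread lists Machine, cache, Core and PU objects but no Package line, which matches the "Type: Package" error from mpirun. As a rough illustration, a small Python helper could report which object types appear in lstopo's text output. This is only a sketch: the function names and the list of type names are assumptions made for this example, and it relies on lstopo joining single-child levels with " + " as in the output above.

```python
# Object-type names as they appear at the start of each field in lstopo's
# text output (an assumed, non-exhaustive list for this sketch).
# "Socket" is the name hwloc used for "Package" before release 1.11.
KNOWN_TYPES = {"Machine", "NUMANode", "Package", "Socket",
               "L3", "L2", "L1d", "L1i", "Core", "PU"}


def object_types(lstopo_text):
    """Return the known object-type names that occur in lstopo text output."""
    found = set()
    for line in lstopo_text.splitlines():
        # Drop a leading "|" left over from the mail client's box quoting.
        line = line.strip().lstrip("|").strip()
        # lstopo prints single-child levels on one line, joined by " + ".
        for field in line.split(" + "):
            tokens = field.split()
            if tokens and tokens[0] in KNOWN_TYPES:
                found.add(tokens[0])
    return found


def has_package(lstopo_text):
    """True if the output contains a Package (or older Socket) object."""
    return bool(object_types(lstopo_text) & {"Package", "Socket"})
```

Calling has_package() on the text of the lstopo box from this thread would return False, since none of its lines starts with Package or Socket.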
Gilles Gouaillardet
2017-03-10 00:16:33 UTC
Angel,

I suggest you get an XML topology with

lstopo --of xml

on both your "exotic" POWER platform and a more standard and recent one.

Then you can manually edit the XML topology and add the missing objects.

Finally, you can pass this to Open MPI like this:

mpirun --mca hwloc_base_topo_file mytopo.xml ...


Cheers,


Gilles
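
The manual XML edit suggested above could be sketched in Python roughly as follows. This is a hedged illustration and not a tested recipe: it assumes the hwloc 1.x XML layout (a `<topology>` root holding nested `<object type="...">` elements), assumes that I/O objects carry the types "Bridge", "PCIDev" and "OSDev", and simply copies the Machine's cpuset attributes onto a single new Package with os_index 0. The function names are hypothetical, and whether hwloc and Open MPI accept the edited file has not been verified here.

```python
import xml.etree.ElementTree as ET

PACKAGE_NAMES = {"Package", "Socket"}       # "Socket" is the pre-1.11 name
CPUSET_ATTRS = ("cpuset", "complete_cpuset", "online_cpuset", "allowed_cpuset")
IO_TYPES = {"Bridge", "PCIDev", "OSDev"}    # assumed I/O object type names


def xml_has_package(root):
    """True if any object in the topology is a Package (or Socket)."""
    return any(o.get("type") in PACKAGE_NAMES for o in root.iter("object"))


def add_package(root):
    """Wrap the CPU-side children of Machine in one Package object.

    Modifies the tree in place. Returns True if a Package was added and
    False if one already existed or there was nothing to wrap.
    """
    machine = root.find("object[@type='Machine']")
    if machine is None:
        raise ValueError("no Machine object directly under the root element")
    if xml_has_package(root):
        return False

    children = list(machine)
    cpu_side = [c for c in children
                if c.tag == "object" and c.get("type") not in IO_TYPES]
    if not cpu_side:
        return False

    # The new Package covers the same CPUs as the whole Machine.
    package = ET.Element("object", {"type": "Package", "os_index": "0"})
    for name in CPUSET_ATTRS:
        if machine.get(name) is not None:
            package.set(name, machine.get(name))

    # Put the Package where the first CPU-side child used to be.
    insert_at = children.index(cpu_side[0])
    for child in cpu_side:
        machine.remove(child)
        package.append(child)
    machine.insert(insert_at, package)
    return True
```

A possible use would be to parse the exported file with ET.parse("mytopo.xml"), call add_package() on its root, write the tree back out, and compare the result against the XML from the more recent machine before passing it to mpirun.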
Angel de Vicente
2017-03-13 08:29:24 UTC
Post by Brice Goglin
Ok, that's a very old kernel on a very old POWER processor, it's
expected that hwloc doesn't get much topology information, and it's
then expected that OpenMPI cannot apply most binding policies.
Just in case it can add anything, I tried with an older OpenMPI version
(1.10.6), and I cannot get it to work either, but the message is
different:

,----
| --------------------------------------------------------------------------
| No objects of the specified type were found on at least one node:
|
| Type: Socket
| Node: s01c1b08
|
| The map cannot be done as specified.
| --------------------------------------------------------------------------
`----
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/