Can you send the full verbose output with "--mca btl_base_verbose 100"?
Post by carlos aguni
Hi Gilles.
Thank you for your reply! :)
I'm now using a compiled version of Open MPI 3.0.2 and everything seems to work fine now:
c01
c02
c03
c02
c01
c01
c03
Which is expected.
Now when I run MPI_Comm_spawn it prints a warning message suggesting it picked the wrong IP.
Check the command below; I'll highlight some of the verbose output.
Hello world from processor c01, rank 0 out of 2 processors
Im the spawned rank 0
Hello world from processor c03, rank 1 out of 2 processors
[[35996,2],0][btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from c03 to: c01 Unable to connect to the peer 10.0.0.1 on port 1024: Network is unreachable
[c03:06355] pml_ob1_sendreq.c:235 FATAL
[c01:05462] [[36010,0],0] oob:tcp:init adding 10.0.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.16.0.1 to our list of V4 connections
[c01:05462] [[36010,0],0] oob:tcp:init adding 172.21.1.136 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 192.168.0.1 to our list of V4 connections
[c03:06225] [[36010,0],1] oob:tcp:init adding 172.16.0.2 to our list of V4 connections
Is there a way to suppress it?
c01
  ens8  10.0.0.1/24
  ens9  172.16.0.1/24
  eth0  172.21.1.136/24
c02
  eth0  10.0.0.2/24
c03
  ens8  192.168.0.1/24
  eth1  172.16.0.2/24
c04
  eth0  192.168.0.2/24
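For what it's worth, the reachability problem in the log above can be read straight off this interface list. A stdlib-only sketch (host names and addresses copied from this thread; the `shared_subnets` helper is mine, not anything Open MPI provides):

```python
# Pairwise direct connectivity implied by the interface list above.
# Data is copied from this thread; this only illustrates the topology.
from ipaddress import ip_interface

hosts = {
    "c01": ["10.0.0.1/24", "172.16.0.1/24", "172.21.1.136/24"],
    "c02": ["10.0.0.2/24"],
    "c03": ["192.168.0.1/24", "172.16.0.2/24"],
    "c04": ["192.168.0.2/24"],
}

def shared_subnets(a, b):
    """Subnets that hosts a and b can both reach directly."""
    nets_a = {ip_interface(i).network for i in hosts[a]}
    nets_b = {ip_interface(i).network for i in hosts[b]}
    return sorted(str(n) for n in nets_a & nets_b)

# c03 has no interface on 10.0.0.0/24, so when the BTL on c03 tries
# c01's 10.0.0.1 address it gets "Network is unreachable".
print(shared_subnets("c01", "c03"))  # the only common subnet is 172.16.0.0/24
print(shared_subnets("c02", "c03"))  # nothing in common at all
```

This also shows why a single global include/exclude list is hard here: the only c01-c03 network is 172.16.0.0/24, but c02 and c04 are not on it.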
Regards,
Carlos.
Carlos,
Open MPI 3.0.2 has been released, and it contains several bug fixes, so I do
encourage you to upgrade and try again.
If it still does not work, can you please run
mpirun --mca oob_base_verbose 10 ...
and then compress and post the output?
Out of curiosity, would
mpirun --mca routed_radix 1 ...
work in your environment?
Once we can analyze the logs, we should be able to figure out what is going wrong.
Cheers,
Gilles
Just realized my email wasn't sent to the archive.
Hi!
Thank you all for your replies, Jeff, Gilles, and rhc.
Thank you Jeff and rhc for clarifying some of Open MPI's internals for me.
Post by r***@open-mpi.org
FWIW: we never send interface names to other hosts - just dot addresses.
Post by r***@open-mpi.org
Should have clarified - when you specify an interface name for the MCA param, then it is the interface name that is transferred, as that is the value of the MCA param. However, once we determine our address, we only transfer dot addresses between ourselves.
If only dot addresses are sent to the hosts, then why doesn't Open MPI use the default route, like `ip route get <other host IP>`, instead of choosing a random one? Is this expected behaviour? Can it be changed?
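As an aside, the kernel's source-address choice for a given destination can be queried from user space without shelling out to `ip route get`: a connected UDP socket exposes it via `getsockname()`. A sketch (the helper name is mine; this is only an illustration of the routing lookup being discussed, not something Open MPI does):

```python
# Ask the kernel which local source address it would use to reach `dest`,
# roughly what `ip route get <dest>` reports in its "src" field.
# Connecting a UDP socket sends no packets, so this is side-effect free.
import socket

def source_ip_for(dest: str, port: int = 9) -> str:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest, port))
        return s.getsockname()[0]

print(source_ip_for("127.0.0.1"))  # 127.0.0.1
```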
Sorry, as Gilles pointed out, I forgot to mention which Open MPI version I was using: Open MPI 3.0.0 with GCC 7.3.0 from OpenHPC, on CentOS 7.5.
Post by r***@open-mpi.org
mpirun --mca oob_tcp_if_exclude 192.168.100.0/24 ...
I cannot just exclude that interface, because afterwards I want to add another computer that is on a different network. And this is where things get messy :( I cannot simply include and exclude networks, because I have different machines on different networks.
        compute01            compute02          compute03
ens3    192.168.100.104/24   10.0.0.227/24      192.168.100.105/24
ens8    10.0.0.228/24        172.21.1.128/24    ---
ens9    172.21.1.155/24      ---                ---
So from compute01 I'm MPI_Comm_spawning another process on compute02 and compute03.
With both MPI_Comm_spawn and
`mpirun -n 3 -host compute01,compute02,compute03 hostname`
I hit the same problem. For example:
`mpirun --oversubscribe --allow-run-as-root -n 3 --mca oob_tcp_if_include 10.0.0.0/24,192.168.100.0/24 -host compute01,compute02,compute03 hostname`
WARNING: An invalid value was given for oob_tcp_if_include. This
value will be ignored.
...
Message: Did not find interface matching this subnet
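For what it's worth, checking those two include subnets against the interface table above with Python's stdlib `ipaddress` (addresses copied from this thread; the loop is only an illustration, not what Open MPI does internally) shows that every node matches at least one subnet, but no single subnet covers all three nodes:

```python
# Which of the --mca oob_tcp_if_include subnets match each node's
# interfaces? Data is copied from the table in this thread.
from ipaddress import ip_address, ip_network

hosts = {
    "compute01": ["192.168.100.104", "10.0.0.228", "172.21.1.155"],
    "compute02": ["10.0.0.227", "172.21.1.128"],
    "compute03": ["192.168.100.105"],
}
includes = [ip_network("10.0.0.0/24"), ip_network("192.168.100.0/24")]

for host, addrs in hosts.items():
    matched = [str(n) for n in includes
               if any(ip_address(a) in n for a in addrs)]
    print(host, matched)
```

One reading, then, is that the warning fires because a listed subnet has no matching interface on a particular node (192.168.100.0/24 on compute02, 10.0.0.0/24 on compute03), not because no subnet matched at all.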
This would all work if it were to use the system's internals like
`ip route`.
Best regards,
Carlos.
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users