Discussion:
[OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers
k***@gmail.com
2016-06-24 09:40:42 UTC
Hi all!

I am trying to build a cluster for MPI jobs using OpenVZ containers (https://openvz.org/Main_Page).
I have been using OpenVZ + Open MPI successfully for many years, but I cannot make it work with
Open MPI 1.10.x.
I have a server with OpenVZ support enabled. The output of its ifconfig:

[***@server]$ ifconfig

eth0 Link encap:Ethernet HWaddr **************************
inet addr:10.0.50.35 Bcast:10.0.50.255 Mask:255.255.255.0
inet6 addr: fe80::ec4:7aff:feb0:cf7e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6117448 errors:103 dropped:0 overruns:0 frame:56
TX packets:765411 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3608033195 (3.3 GiB) TX bytes:70005631 (66.7 MiB)
Memory:fb120000-fb13ffff

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:52 errors:0 dropped:0 overruns:0 frame:0
TX packets:52 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3788 (3.6 KiB) TX bytes:3788 (3.6 KiB)

venet0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet6 addr: fe80::1/128 Scope:Link
UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1
RX packets:486052 errors:0 dropped:0 overruns:0 frame:0
TX packets:805540 errors:0 dropped:17 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:26815645 (25.5 MiB) TX bytes:1186438623 (1.1 GiB)

There are two openvz containers running on that server:
[***@server ~]# vzlist -a
CTID NPROC STATUS IP_ADDR HOSTNAME
110 16 running 10.0.50.40 ct110.domain.org
111 11 running 10.0.50.41 ct111.domain.org

On one of the containers I built Open MPI 1.10.3 with the following commands:
$ ./configure --prefix=/opt/openmpi/1.10.3 CXX=g++ --with-cuda=/usr/local/cuda CC=gcc CFLAGS=-m64 CXXFLAGS=-m64 2>&1|tee ~/openmpi-1.10.3_v1.log

$ make -j20

[root]$ make install

So Open MPI was installed in /opt/openmpi/1.10.3/. The second container is an exact clone of the
first one.
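
(Note that I did not configure with --enable-orterun-prefix-by-default; I rely on the remote PATH and
LD_LIBRARY_PATH instead, as shown further below. The alternative would be, roughly:

$ ./configure --prefix=/opt/openmpi/1.10.3 --enable-orterun-prefix-by-default CXX=g++ --with-cuda=/usr/local/cuda CC=gcc CFLAGS=-m64 CXXFLAGS=-m64

which makes mpirun pass the install prefix to the remote orted automatically.)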

Passwordless SSH was enabled between the two containers:
[***@ct110 ~]$ ssh 10.0.50.41
Last login: Fri Jun 24 16:49:03 2016 from 10.0.50.40

[***@ct111 ~]$ ssh 10.0.50.40
Last login: Fri Jun 24 16:37:35 2016 from 10.0.50.41
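
(The keys were set up in the usual way, something along the lines of:

$ ssh-keygen -t rsa
$ ssh-copy-id 10.0.50.41

and the same in the other direction.)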

But a simple test via MPI does not work:
mpirun -np 1 -host 10.0.50.41 hostname
[ct111.domain.org:00899] [[13749,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------

Although the environment on the host with IP address 10.0.50.41 seems OK:
[***@ct110 ~]$ ssh 10.0.50.41 env|grep PATH
LD_LIBRARY_PATH=:/usr/local/cuda/lib64:/opt/openmpi/1.10.3/lib
PATH=/usr/local/bin:/bin:/usr/bin:/home/user/bin:/usr/local/cuda/bin:/opt/openmpi/1.10.3/bin

The ifconfig output from inside the containers:
[***@ct110 /]# ifconfig
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:38 errors:0 dropped:0 overruns:0 frame:0
TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4559 (4.4 KiB) TX bytes:4559 (4.4 KiB)

venet0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:127.0.0.1 P-t-P:127.0.0.1 Bcast:0.0.0.0 Mask:255.255.255.255
UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1
RX packets:772 errors:0 dropped:0 overruns:0 frame:0
TX packets:853 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:112128 (109.5 KiB) TX bytes:122092 (119.2 KiB)

venet0:0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.50.40 P-t-P:10.0.50.40 Bcast:10.0.50.40 Mask:255.255.255.255
UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1

[***@ct111 /]# ifconfig
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:24 errors:0 dropped:0 overruns:0 frame:0
TX packets:24 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1200 (1.1 KiB) TX bytes:1200 (1.1 KiB)

venet0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:127.0.0.1 P-t-P:127.0.0.1 Bcast:0.0.0.0 Mask:255.255.255.255
UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1
RX packets:855 errors:0 dropped:0 overruns:0 frame:0
TX packets:774 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:122212 (119.3 KiB) TX bytes:112304 (109.6 KiB)

venet0:0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.0.50.41 P-t-P:10.0.50.41 Bcast:10.0.50.41 Mask:255.255.255.255
UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1

I get exactly the same error if I restrict the network interface to venet0:0:
[***@ct110 ~]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 -np 1 -host 10.0.50.41 hostname
[ct111.domain.org:00945] [[13704,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
[ ... snip....]

However, I can successfully run hostname and the hello.bin executable on the same container that I
submit from:
[***@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 -np 1 -host 10.0.50.40 hostname
ct110.domain.org
[***@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 -np 1 -host 10.0.50.40 ./hello.bin
Hello world! from processor 0 (name=ct110.domain.org ) out of 1
wall clock time = 0.000002
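
(hello.bin is just the usual MPI hello world; its source is roughly the following, built with
mpicc hello.c -o hello.bin:)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    double t0;

    MPI_Init(&argc, &argv);
    t0 = MPI_Wtime();                       /* start the wall clock */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(name, &len);     /* node name, e.g. ct110.domain.org */
    printf("Hello world! from processor %d (name=%s ) out of %d\n", rank, name, size);
    printf("wall clock time = %f\n", MPI_Wtime() - t0);
    MPI_Finalize();
    return 0;
}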

Iptables is off on both containers.
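(Verified with, e.g.:

# service iptables status
# iptables -L -n

on each container.)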

I would assume that I have run into bug #3339 (https://svn.open-mpi.org/trac/ompi/ticket/3339), but I
have another cluster based on OpenVZ containers with Open MPI 1.6.5 that has worked perfectly for
several years.

I would appreciate any help on that issue.

Best regards,
Nikolay.
Jeff Squyres (jsquyres)
2016-06-24 10:43:50 UTC
Nikolay --

Thanks for all the detail! That helps a tremendous amount.

Open MPI actually uses IP networks in *two* ways:

1. for command and control
2. for MPI communications

Your use of btl_tcp_if_include regulates #2, but not #1 -- you need to add another MCA param to regulate #1. Try this:

mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 ...

See if that works.
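
If that does the trick and you don't want to type the two params on every mpirun, they can also go in
the per-user MCA params file -- roughly like this (the interface name is just taken from your setup):

# $HOME/.openmpi/mca-params.conf
btl_tcp_if_include = venet0:0
oob_tcp_if_include = venet0:0

mpirun and the MPI processes will pick that file up automatically.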
Post by k***@gmail.com
[ ...original message quoted in full -- snipped... ]
--
Jeff Squyres
***@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
k***@gmail.com
2016-06-24 11:26:00 UTC
Post by Jeff Squyres (jsquyres)
Nikolay --
Thanks for all the detail! That helps a tremendous amount.
1. for command and control
2. for MPI communications
mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 ...
See if that works.
Jeff, thanks a lot for such a prompt reply, detailed explanation, and suggestion! But unfortunately
the error is still the same:

[***@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -np 1 -host 10.0.50.41 hostname
[ct111.domain.org:01054] [[12888,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
[...snip...]
Jeff Squyres (jsquyres)
2016-06-24 13:08:07 UTC
-----
diff --git a/opal/mca/if/posix_ipv4/if_posix.c b/opal/mca/if/posix_ipv4/if_posix.c
index 6f75533..ed447e7 100644
--- a/opal/mca/if/posix_ipv4/if_posix.c
+++ b/opal/mca/if/posix_ipv4/if_posix.c
@@ -221,6 +221,15 @@ static int if_posix_open(void)
strncpy(intf->if_name, ifr->ifr_name, sizeof(intf->if_name) - 1);
intf->if_flags = ifr->ifr_flags;
+ // JMS Hackaround for OpenVZ
+ if (strcmp(intf->if_name, "venet0") == 0) {
+ opal_output_verbose(1, opal_if_base_framework.framework_output,
+ "OpenVZ hack:%s:%d: skipping interface venet0",
+ __FILE__, __LINE__);
+ OBJ_RELEASE(intf);
+ continue;
+ }
+
/* every new address gets its own internal if_index */
intf->if_index = opal_list_get_size(&opal_if_list)+1;
-----

Can you try this and see if it works for you?

If so, we might need to do something a bit more methodical / deliberate to make Open MPI work on openvz.
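
(To test it, save the diff and apply it at the top of the 1.10.3 source tree, roughly:

$ cd openmpi-1.10.3
$ patch -p1 < venet0-hack.diff    # any filename works
$ make && make install

then do the same rebuild on the second container.)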
k***@gmail.com
2016-06-24 14:31:27 UTC
Jeff, it works now! Thank you so much!

[***@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt hostname
ct110
ct111
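
For reference, mpi_hosts.txt just lists the two containers, one host per line -- something like:

10.0.50.40
10.0.50.41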

[***@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt ./hello.bin
Hello world! from processor 0 (name=ct110 ) out of 2
wall clock time = 0.000001
Hello world! from processor 1 (name=ct111 ) out of 2
wall clock time = 0.000002

It is not even necessary to specify venet0:0:
[***@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun -npernode 1 -np 2 --hostfile mpi_hosts.txt ./hello.bin
Hello world! from processor 0 (name=ct110 ) out of 2
wall clock time = 0.000002
Hello world! from processor 1 (name=ct111 ) out of 2
wall clock time = 0.000001

Thanks a lot indeed!
Post by Jeff Squyres (jsquyres)
[ ...snip... ]
-----
diff --git a/opal/mca/if/posix_ipv4/if_posix.c b/opal/mca/if/posix_ipv4/if_posix.c
index 6f75533..ed447e7 100644
--- a/opal/mca/if/posix_ipv4/if_posix.c
+++ b/opal/mca/if/posix_ipv4/if_posix.c
@@ -221,6 +221,15 @@ static int if_posix_open(void)
strncpy(intf->if_name, ifr->ifr_name, sizeof(intf->if_name) - 1);
intf->if_flags = ifr->ifr_flags;
+ // JMS Hackaround for OpenVZ
+ if (strcmp(intf->if_name, "venet0") == 0) {
+ opal_output_verbose(1, opal_if_base_framework.framework_output,
+ "OpenVZ hack:%s:%d: skipping interface venet0",
+ __FILE__, __LINE__);
+ OBJ_RELEASE(intf);
+ continue;
+ }
+
/* every new address gets its own internal if_index */
intf->if_index = opal_list_get_size(&opal_if_list)+1;
-----
Can you try this and see if it works for you?
If so, we might need to do something a bit more methodical / deliberate to make Open MPI work on openvz.