Discussion:
[OMPI users] Open MPI config issue- Intel x86_64 to aarch64
Brendan Myers
2018-04-16 20:04:52 UTC
Hello All,



I am not able to complete a simple mpirun command between two nodes over
TCP:

mpirun -np 2 -host 10.20.1.122,10.20.1.231 --mca btl tcp hostname

It returns the error:

ORTE_ERROR_LOG: Data unpack would read past end of buffer in file
base/plm_base_launch_support.c at line 1232



I am able to run this command locally on each machine if I remove the remote
server from the host list. I am also able to ssh/scp between the two systems
using the same interfaces given as hosts in the above command, so key exchange
is working correctly.



I do not believe there is an Open MPI version mismatch: both are at Open MPI
3.0.0, installed from the same tarball, configured with no parameters, and
compiled with no flags.
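
For anyone wanting to double-check the version claim on their own pair of nodes, here is a minimal sketch of how the comparison could be scripted. The helper names extract_version and same_version are made up for illustration, and the ssh lines are left commented out because they assume password-less ssh to both hosts. Note that a non-interactive ssh session may pick up a different PATH than a login shell, so it may find a different install than the one you expect.

```shell
#!/bin/sh
# Sketch only: compare the Open MPI version strings reported by two hosts.
# extract_version and same_version are hypothetical helper names.

# Print the first X.Y.Z-style version number found in the given string.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed only when both strings contain the same, non-empty version number.
same_version() {
    a=$(extract_version "$1")
    b=$(extract_version "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example usage (assumes ssh access to both hosts from the post):
# v1=$(ssh 10.20.1.122 mpirun --version)
# v2=$(ssh 10.20.1.231 mpirun --version)
# if same_version "$v1" "$v2"; then echo "versions match"; else echo "versions differ"; fi
```

A matching version number only rules out one cause; it says nothing about how each build was configured.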



The main differences between the systems are:

1) one (EulerOS) is aarch64 and the other (sm-node-22) is Intel x86_64.

2) EulerOS is running a customized CentOS 7.3 with kernel 4.14.10

3) sm-node-22 is running CentOS 7.4 with kernel 3.10.0-693.el7.x86_64
(development + creative workstation packages)



I have attached all of the outputs that I thought would be helpful
including:

1) ompi_info for both systems

2) OS info for both systems

3) CPU info for both systems

4) Outputs of configure for both systems (large text files)



This setup is an attempt to mirror a reportedly working layout used by a
vendor attending our plugfest.

If I have overlooked any information that would be helpful, please let me
know what you would like and how to get it. We appreciate any
assistance.





Thank you,



Brendan T. W. Myers

***@soft-forge.com <mailto:***@soft-forge.com>

Software Forge Inc
