Discussion:
[OMPI users] Received eager message(s) from an unknown process error on KNL
Gallardo, Esthela
2017-04-28 00:21:17 UTC
Hello,

I am currently running a couple of benchmarks on two Intel Xeon Phi 7250 second-generation KNL MIC compute nodes using Open MPI 2.1.0. While trying to run the osu_bcast benchmark with 8 MPI tasks (4 on each node), I noticed the following error in my output:

Received eager message(s) ptype=0x1 opcode=0xcc from an unknown process (err=49)

I have tried running the benchmark in the following ways:
mpirun -np 8 ./osu_bcast
mpirun -np 8 -hostfile hosti --npernode 4 ./osu_bcast
mpirun -np 8 -hostfile hosti --npernode 4 --mca mtl psm2 ./osu_bcast

However, none of these variations changes the error message, which appears at the end of the output. Note that the error does not seem to affect the results of the benchmark, so it may be occurring in MPI_Finalize.
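One rough way to test the MPI_Finalize hypothesis is to capture the combined output and check whether the message only shows up after the last result row. The sketch below is an illustration, not part of the original report: the function name `psm2_msg_position` and the log file name are hypothetical, and the pattern for a result row (a message size followed by a latency value) is an assumption about the osu_bcast output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify where the PSM2 message appears in a captured log.
# Assumption: benchmark result rows look like "<size> <latency>" (two numeric columns).
psm2_msg_position() {
    awk '
        # Remember the line number of the first PSM2 message.
        /Received eager message\(s\)/ { if (!first_msg) first_msg = NR; next }
        # Remember the line number of the last benchmark result row.
        /^[0-9]+[ \t]+[0-9.]+[ \t]*$/ { last_row = NR }
        END {
            if (!first_msg)                print "no-message"
            else if (first_msg > last_row) print "after-results"
            else                           print "interleaved"
        }
    ' "$1"
}

# Example usage (bcast.log is a hypothetical file name):
#   mpirun -np 8 -hostfile hosti --npernode 4 ./osu_bcast > bcast.log 2>&1
#   psm2_msg_position bcast.log
```

An "after-results" answer is only suggestive, since mpirun forwards stdout and stderr from several ranks and their relative order in the log is not guaranteed.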

Also, in order to try to avoid getting this error, I tried to build the library with both of these configurations:
./configure --prefix=<path_to_build_folder> CC=icc CXX=icpc FC=ifort CFLAGS="-xCORE-AVX2 -axMIC-AVX512" CXXFLAGS="-xCORE-AVX2 -axMIC-AVX512" FFLAGS="-xCORE-AVX2 -axMIC-AVX512" LDFLAGS="-xCORE-AVX2 -axMIC-AVX512"

./configure --prefix=<path_to_build_folder> --enable-orterun-prefix-by-default --with-cma=yes --with-psm2 CC=icc CXX=icpc FC=ifort --disable-shared --enable-static --without-slurm

However, neither build prevented the error. Has anyone encountered this issue before, and what can be done to get rid of the error message?

Thank you,

Esthela Gallardo
George Bosilca
2017-04-28 02:46:11 UTC
Esthela,

This error message is generated internally by the PSM2 library, so you will
not be able to get rid of it simply by recompiling Open MPI.

George.


Cabral, Matias A
2017-04-28 19:16:36 UTC
Hi Esthela,

As George mentions, this is indeed libpsm2 printing this error. Opcode=0xCC is a disconnect retry. There are a few scenarios that could be happening, but to simplify: it is a message for an already disconnected endpoint arriving late. What version of Intel Omni-Path Software or libpsm2 do you have on your system? We have not seen this error since the release of IFS 10.3.0. I suggest updating and testing again.

https://downloadcenter.intel.com/download/26567/Intel-Omni-Path-Fabric-Software-Including-Intel-Omni-Path-Host-Fabric-Interface-Driver-?v=t
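To confirm that every occurrence in a run carries the same opcode (0xcc, the disconnect retry described above), the fields can be pulled out of the message text and tallied. This is a minimal sketch that assumes the message format is exactly the one quoted in this thread; the function name `psm2_fields` and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print "ptype opcode err"
# for each PSM2 "eager message" line. Lines that do not match are dropped.
psm2_fields() {
    sed -n 's/.*ptype=\(0x[0-9a-fA-F]*\) opcode=\(0x[0-9a-fA-F]*\).*(err=\([0-9]*\)).*/\1 \2 \3/p'
}

# Example usage (bcast.log is a hypothetical file name):
#   psm2_fields < bcast.log | sort | uniq -c
```

If the tally shows opcodes other than 0xcc, the late-disconnect explanation may not cover all of the messages, which would be worth mentioning when reporting the problem.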

Thanks,

_MAC

Gallardo, Esthela
2017-05-02 18:33:24 UTC
Hi,

Thank you so much for your help. I don’t have the permissions to update the software on the system I am using, but I will let the administrators know about the release.


Esthela Gallardo

