Discussion:
[OMPI users] Failed to create a queue pair (QP) error
Ilchenko Evgeniy
2017-03-26 03:33:09 UTC
Permalink
Hi all!

I installed the older version, Open MPI 1.8, and got a different error. For the command

mpirun -np 1 prog

I get the following output:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:
http://www.open-mpi.org/faq/?category=openfabrics#ib-..

Local host: node107
Registerable memory: 32768 MiB
Total memory: 65459 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
hello from 0
hello from 1
[node107:48993] 1 more process has sent help message help-mpi-btl-openib.txt / reg mem limit low
[node107:48993] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Other installed software (the Intel MPI Library) works fine, without any errors,
and uses all 64 GB of memory.

For Open MPI I don't use any batch manager (Torque, Slurm, etc.); I work on a
single node, which I reach with the command

ssh node107

For the command

cat /etc/security/limits.conf

I get the following output:

...
* soft rss 2000000
* soft stack 2000000
* hard stack unlimited
* soft data unlimited
* hard data unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft nproc 10000
* hard nproc 10000
* soft nofile 10000
* hard nofile 10000
* hard cpu unlimited
* soft cpu unlimited
...

For the command

cat /sys/module/mlx4_core/parameters/log_num_mtt

I get the output:

0

Command:

cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

output:

3

Command:

getconf PAGESIZE

output:

4096

With these parameters and the formula

max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE

max_reg_mem = 32768 bytes (32 KiB), not the 32768 MiB stated in the Open MPI warning.
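The arithmetic above can be checked with a short shell snippet, using the values reported earlier in this thread:

```shell
# Values read earlier in this thread
log_num_mtt=0
log_mtts_per_seg=3
page_size=4096

# max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
max_reg_mem=$(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
echo "$max_reg_mem"   # 32768 bytes (32 KiB), nowhere near 32768 MiB
```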

I think the cause of the errors in the different versions (1.8 and 2.1) is
the same...

What is the reason for this?

What programs or settings could restrict registrable memory for Open MPI?
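As an aside (not part of the original post): the values in /etc/security/limits.conf only take effect if PAM applies them to the login session, so it is worth checking the effective limits in the same shell that launches mpirun:

```shell
# Effective locked-memory limit for this session; "unlimited" is
# what the memlock lines in limits.conf intend to grant.
ulimit -l

# All effective limits, for comparison with limits.conf
ulimit -a
```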
Gilles Gouaillardet
2017-03-26 04:51:35 UTC
Permalink
Iirc, there used to be a bug in Open MPI leading to such a false positive,
but I cannot remember the details.
I recommend you use at least the latest 1.10 (which is really a 1.8 plus a few
more features and several bug fixes).
Another option is to simply increase one of the mtt parameters by 1 and see if it helps.

Cheers,

Gilles
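Gilles's suggestion above can be sketched as follows (the target value and the modprobe config path are assumptions, not from his post):

```shell
# Sketch: choose log_num_mtt so the formula covers physical RAM
# (here 64 GiB; log_mtts_per_seg=3 and 4 KiB pages as in this thread).
log_mtts_per_seg=3
page_size=4096
log_num_mtt=21   # 2^21 * 2^3 * 2^12 bytes = 64 GiB
echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))

# The parameter would then be set via a modprobe config, e.g.:
#   echo "options mlx4_core log_num_mtt=21" > /etc/modprobe.d/mlx4_core.conf
# followed by reloading mlx4_core (or a reboot).
```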
Ilchenko Evgeniy
2017-03-29 17:04:39 UTC
Permalink
Hi!

I installed Open MPI version 1.10.6,
but I get a different problem.

I built Open MPI with the Java bindings (--enable-mpi-java),
but a segmentation fault occurs randomly,
even for programs without any communication
(just MPI.Init and MPI.Finalize).

My test program is attached.
It fails with a segfault on a random iteration (usually 100-300),
even for a single MPI process (mpirun -np 1).
I don't pass any arguments to mpirun or to java (only the path to the Java class).

Any ideas?
Ilchenko Evgeniy
2017-03-31 14:37:14 UTC
Permalink
Hi all!

Is there a version of Open MPI without
the Java-bindings problem (the random segfaults)?
