Discussion:
[OMPI users] OSHMEM: shmem_ptr always returns NULL
marcin.krotkiewski
2018-04-18 08:01:43 UTC
Hi,

I'm running the example below from the Open MPI documentation:

#include <mpp/shmem.h>
#include <stdio.h>

int main(void)
{
  static int bigd[100];
  int *ptr;
  int i;
  shmem_init();
  if (shmem_my_pe() == 0) {
    /* initialize PE 1’s bigd array */
    ptr = shmem_ptr(bigd, 1);
    if(!ptr){
      fprintf(stderr, "get external pointer failed!\n");
      shmem_global_exit(-1);
    }
    for (i=0; i<100; i++)
      *ptr++ = i+1;
  }
  shmem_barrier_all();
  if (shmem_my_pe() == 1) {
    printf("bigd on PE 1 is:\n");
    for (i=0; i<100; i++)
      printf(" %d\n",bigd[i]);
    printf("\n");
  }
  shmem_finalize();
  return 0;
}

but shmem_ptr always returns NULL for me. I tried with OpenMPI versions
from 2.0.1 up to 3.1.0rc4, compiled with HPCX 2.1, running on a
ConnectX-4 system. This is the command line:

$ shmemrun -mca spml ucx -mca spml_base_verbose 100 -np 2 -map-by node -report-bindings ./a.out

[c11-1:36505] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
[BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../..]
[c11-2:105580] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]:
[BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../..]
[c11-1:36522] mca: base: components_register: registering framework spml
components
[c11-1:36522] mca: base: components_register: found loaded component ucx
[c11-1:36522] mca: base: components_register: component ucx register
function successful
[c11-1:36522] mca: base: components_open: opening spml components
[c11-1:36522] mca: base: components_open: found loaded component ucx
[c11-2:105590] mca: base: components_register: registering framework
spml components
[c11-2:105590] mca: base: components_register: found loaded component ucx
[c11-2:105590] mca: base: components_register: component ucx register
function successful
[c11-2:105590] mca: base: components_open: opening spml components
[c11-2:105590] mca: base: components_open: found loaded component ucx
[c11-1:36522] mca: base: components_open: component ucx open function
successful
[c11-2:105590] mca: base: components_open: component ucx open function
successful
[c11-1:36522] base/spml_base_select.c:107 - mca_spml_base_select()
select: initializing spml component ucx
[c11-1:36522] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
in ucx, my priority is 21
[c11-2:105590] base/spml_base_select.c:107 - mca_spml_base_select()
select: initializing spml component ucx
[c11-2:105590] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
in ucx, my priority is 21
[c11-1:36522] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
*** ucx initialized ****
[c11-1:36522] base/spml_base_select.c:119 - mca_spml_base_select()
select: init returned priority 21
[c11-1:36522] base/spml_base_select.c:160 - mca_spml_base_select()
selected ucx best priority 21
[c11-1:36522] base/spml_base_select.c:194 - mca_spml_base_select()
select: component ucx selected
[c11-1:36522] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
[c11-2:105590] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
*** ucx initialized ****
[c11-2:105590] base/spml_base_select.c:119 - mca_spml_base_select()
select: init returned priority 21
[c11-2:105590] base/spml_base_select.c:160 - mca_spml_base_select()
selected ucx best priority 21
[c11-2:105590] base/spml_base_select.c:194 - mca_spml_base_select()
select: component ucx selected
[c11-2:105590] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
[c11-1:36522] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS ***
[c11-2:105590] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS ***
shared_mr flags are not supported
shared_mr flags are not supported
get external pointer failed!


So it looks like everything is fine - maybe except the 'shared_mr flags
are not supported' message.

Does anyone have an idea why I get NULL? The same happens if I start both
ranks on the same compute node, and if I use a shmem_malloc'ed pointer
instead of a static array.

Thank you,

Marcin
Joshua Ladd
2018-06-01 19:16:48 UTC
Hi, Marcin

Sorry for the late response (somehow this one got lost in the clutter). We
added support for shmem_ptr in the UCX SPML in Open MPI 3.0. However, in
order to use it, you must install the Knem kernel module
(https://github.com/hjelmn/xpmem).

Best,

Josh

_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users
Joshua Ladd
2018-06-01 19:19:00 UTC
Correction: I meant the xpmem kernel module, not Knem.
Jeff Hammond
2018-06-02 03:07:04 UTC
Why do you need kernel support for interprocess shared memory? Just
allocate the symmetric heap as shared memory. Sure, this does not cover
other symmetric variables (such as static arrays), but shmem_ptr can detect
that and return NULL for those cases.

shmem_ptr should behave similarly to MPI_Win_shared_query...

Jeff
--
Jeff Hammond
***@gmail.com
http://jeffhammond.github.io/