Discussion: [OMPI users] MPI-3 RMA on Cray XC40
Nathan Hjelm
2018-05-10 01:24:24 UTC
Thanks for confirming that it works for you as well. I have a PR open on v3.1.x that brings osc/rdma up to date with master. I will also be bringing in some code that greatly improves multi-threaded RMA performance on Aries systems (at least in benchmarks; see github.com/hpc/rma-mt). That will not make it into v3.1.x but will be in v4.0.0.

-Nathan
Nathan,
Thank you, I can confirm that it works as expected with master on our system. I will stick to this version then until 3.1.1 is out.
Joseph
Looks like it doesn't fail with master so at some point I fixed this bug. The current plan is to bring all the master changes into v3.1.1. This includes a number of bug fixes.
-Nathan
Nathan,
Thanks for looking into that. My test program is attached.
Best
Joseph
I will take a look today. Can you send me your test program?
-Nathan
All,
I have been experimenting with Open MPI 3.1.0 on our Cray XC40 (Haswell-based nodes, Aries interconnect) for multi-threaded MPI RMA. Unfortunately, a simple (single-threaded) test case in which two processes perform an MPI_Rget+MPI_Wait hangs when running on two nodes. It succeeds if both processes run on a single node.
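In essence, the test does nothing more than the following (a minimal sketch of the MPI_Rget+MPI_Wait pattern just described, not the attached test program itself; the MPI_Win_lock_all passive-target epoch is an assumption):
```c
/* Minimal sketch of the two-process MPI_Rget+MPI_Wait pattern described
 * above (illustrative only, not the attached test program; the
 * MPI_Win_lock_all passive-target epoch is an assumption). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* one int of window memory per process */
    int *baseptr;
    MPI_Win win;
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &baseptr, &win);
    *baseptr = rank;
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_lock_all(0, win);

    /* fetch the value exposed by the right neighbor */
    int target = (rank + 1) % size;
    int result = -1;
    MPI_Request req;
    MPI_Rget(&result, 1, MPI_INT, target, 0, 1, MPI_INT, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* this is where the inter-node run hangs */

    printf("[%d] got %d from rank %d\n", rank, result, target);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```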
```
# this seems necessary to avoid a linker error during build
export CRAYPE_LINK_TYPE=dynamic
module swap PrgEnv-cray PrgEnv-intel
module sw craype-haswell craype-sandybridge
module unload craype-hugepages16M
module unload cray-mpich
```
```
mpirun --mca btl_base_verbose 100 --mca btl ^tcp -n 2 -N 1 ./mpi_test_loop
[nid03060:36184] mca: base: components_register: registering framework btl components
[nid03060:36184] mca: base: components_register: found loaded component self
[nid03060:36184] mca: base: components_register: component self register function successful
[nid03060:36184] mca: base: components_register: found loaded component sm
[nid03061:36208] mca: base: components_register: registering framework btl components
[nid03061:36208] mca: base: components_register: found loaded component self
[nid03060:36184] mca: base: components_register: found loaded component ugni
[nid03061:36208] mca: base: components_register: component self register function successful
[nid03061:36208] mca: base: components_register: found loaded component sm
[nid03061:36208] mca: base: components_register: found loaded component ugni
[nid03060:36184] mca: base: components_register: component ugni register function successful
[nid03060:36184] mca: base: components_register: found loaded component vader
[nid03061:36208] mca: base: components_register: component ugni register function successful
[nid03061:36208] mca: base: components_register: found loaded component vader
[nid03060:36184] mca: base: components_register: component vader register function successful
[nid03060:36184] mca: base: components_open: opening btl components
[nid03060:36184] mca: base: components_open: found loaded component self
[nid03060:36184] mca: base: components_open: component self open function successful
[nid03060:36184] mca: base: components_open: found loaded component ugni
[nid03060:36184] mca: base: components_open: component ugni open function successful
[nid03060:36184] mca: base: components_open: found loaded component vader
[nid03060:36184] mca: base: components_open: component vader open function successful
[nid03060:36184] select: initializing btl component self
[nid03060:36184] select: init of component self returned success
[nid03060:36184] select: initializing btl component ugni
[nid03061:36208] mca: base: components_register: component vader register function successful
[nid03061:36208] mca: base: components_open: opening btl components
[nid03061:36208] mca: base: components_open: found loaded component self
[nid03061:36208] mca: base: components_open: component self open function successful
[nid03061:36208] mca: base: components_open: found loaded component ugni
[nid03061:36208] mca: base: components_open: component ugni open function successful
[nid03061:36208] mca: base: components_open: found loaded component vader
[nid03061:36208] mca: base: components_open: component vader open function successful
[nid03061:36208] select: initializing btl component self
[nid03061:36208] select: init of component self returned success
[nid03061:36208] select: initializing btl component ugni
[nid03061:36208] select: init of component ugni returned success
[nid03061:36208] select: initializing btl component vader
[nid03061:36208] select: init of component vader returned failure
[nid03061:36208] mca: base: close: component vader closed
[nid03061:36208] mca: base: close: unloading component vader
[nid03060:36184] select: init of component ugni returned success
[nid03060:36184] select: initializing btl component vader
[nid03060:36184] select: init of component vader returned failure
[nid03060:36184] mca: base: close: component vader closed
[nid03060:36184] mca: base: close: unloading component vader
[nid03061:36208] mca: bml: Using self btl for send to [[54630,1],1] on node nid03061
[nid03060:36184] mca: bml: Using self btl for send to [[54630,1],0] on node nid03060
[nid03061:36208] mca: bml: Using ugni btl for send to [[54630,1],0] on node (null)
[nid03060:36184] mca: bml: Using ugni btl for send to [[54630,1],1] on node (null)
```
It looks like the UGNI btl is initialized correctly but then fails to resolve the node to communicate with (note the "(null)" node name in the last two lines). Is there a way to get more information? There doesn't seem to be an MCA parameter to specifically increase the verbosity of the UGNI btl.
Any help would be appreciated!
Cheers
Joseph
<config.log.tgz>
Joseph Schuchart
2018-05-11 13:22:58 UTC
Nathan,

That is good news! Are the improvements that are scheduled for 4.0.0
already stable enough to be tested? I'd be interested in trying them to
see whether and how they affect our use-cases.

Also, thanks for pointing me to the RMA-MT benchmark suite; I wasn't
aware of that project. I looked at the latency benchmarks and found that
request-based transfer completion (using MPI_Rget+MPI_Wait/MPI_Test) is
not covered. Would it make sense to add these cases, or are there already
plans to do so? The overhead of transfers with individual synchronization
from multiple threads is of particular interest for my use case. I'd be
happy to contribute such cases to the RMA-MT benchmarks if that would be
useful.
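Roughly, the per-thread measurement I have in mind looks like the following sketch (this is not RMA-MT code; the thread count, iteration count, and the lock_all epoch are arbitrary choices for illustration):

```c
/* Sketch of a multi-threaded MPI_Rget+MPI_Wait latency loop: each thread
 * issues its own request and completes it individually (not RMA-MT code;
 * all parameters are illustrative). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NITER 1000

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) MPI_Abort(MPI_COMM_WORLD, 1);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *baseptr;
    MPI_Win win;
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &baseptr, &win);
    *baseptr = rank;
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_lock_all(0, win);

    int target = (rank + 1) % size;

    #pragma omp parallel
    {
        int buf;
        double start = omp_get_wtime();
        for (int i = 0; i < NITER; ++i) {
            MPI_Request req;
            /* individual (request-based) completion per thread */
            MPI_Rget(&buf, 1, MPI_INT, target, 0, 1, MPI_INT, win, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        double lat_us = (omp_get_wtime() - start) / NITER * 1e6;
        printf("[rank %d, thread %d] avg Rget+Wait latency: %.2f us\n",
               rank, omp_get_thread_num(), lat_us);
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```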

Thanks
Joseph
Joseph Schuchart
2018-05-17 09:50:48 UTC
Nathan,

I am trying to track down some memory corruption that leads to crashes
in my application on the Cray system when using Open MPI (git-6093f2d).
Valgrind reports a number of invalid reads and writes inside Open MPI
when running the benchmark that I sent you earlier.

There are plenty of invalid reads in MPI_Init and MPI_Win_allocate.
Valgrind also reports some invalid writes during communication:

```
==42751== Invalid write of size 8
==42751== at 0x94C647D: GNII_POST_FMA_GET (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751== by 0x94C8D74: GNI_PostFma (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751== by 0x10FA21D0: mca_btl_ugni_get (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_btl_ugni.so)
==42751== by 0x134AF6C5: ompi_osc_get_data_blocking (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751== by 0x134D0CC4: ompi_osc_rdma_peer_lookup (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751== by 0x134B4A1F: ompi_osc_rdma_rget (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751== by 0x46C1D52: PMPI_Rget (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751== by 0x20001EA9: main (in /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
==42751== Address 0x2aaaaabc0000 is not stack'd, malloc'd or (recently) free'd

==42751== Invalid write of size 8
==42751== at 0x94D76BC: GNII_SmsgSend (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751== by 0x94D9D5C: GNI_SmsgSendWTag (in /opt/cray/ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari/lib64/libugni.so.0.6.0)
==42751== by 0x10F9D9E6: mca_btl_ugni_sendi (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_btl_ugni.so)
==42751== by 0x11BE5DDF: mca_pml_ob1_isend (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_pml_ob1.so)
==42751== by 0x1201DC40: NBC_Progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
==42751== by 0x1201DC91: NBC_Progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
==42751== by 0x1201C692: ompi_coll_libnbc_progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_coll_libnbc.so)
==42751== by 0x631A503: opal_progress (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x632111C: ompi_sync_wait_mt (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x4669A4C: ompi_comm_nextcid (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751== by 0x4667ECC: ompi_comm_dup_with_info (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751== by 0x134C15AE: ompi_osc_rdma_component_select (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_osc_rdma.so)
==42751== Address 0x2aaaaabaf000 is not stack'd, malloc'd or (recently) free'd
```

And some write-after-free during MPI_Finalize:

```
==42751== Invalid write of size 8
==42751== at 0x6316E64: opal_rb_tree_delete (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x1076BA03: mca_mpool_hugepage_seg_free (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751== by 0x1015EB33: mca_allocator_bucket_cleanup (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_allocator_bucket.so)
==42751== by 0x1015DF5C: mca_allocator_bucket_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_allocator_bucket.so)
==42751== by 0x1076BAE6: mca_mpool_hugepage_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751== by 0x1076C202: mca_mpool_hugepage_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751== by 0x633CED9: mca_base_component_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x633CE01: mca_base_components_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x63C6F31: mca_mpool_base_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x634AEF7: mca_base_framework_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x4687B6A: ompi_mpi_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751== by 0x20001F35: main (in /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
==42751== Address 0xa3aa348 is 16,440 bytes inside a block of size 16,568 free'd
==42751== at 0x4428CDA: free (vg_replace_malloc.c:530)
==42751== by 0x630FED2: opal_free_list_destruct (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x63160C1: opal_rb_tree_destruct (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x1076BACE: mca_mpool_hugepage_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751== by 0x1076C202: mca_mpool_hugepage_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/openmpi/mca_mpool_hugepage.so)
==42751== by 0x633CED9: mca_base_component_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x633CE01: mca_base_components_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x63C6F31: mca_mpool_base_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x634AEF7: mca_base_framework_close (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libopen-pal.so.0.0.0)
==42751== by 0x4687B6A: ompi_mpi_finalize (in /zhome/academic/HLRS/hlrs/hpcjschu/opt-cray/openmpi-6093f2d-intel/lib/libmpi.so.0.0.0)
==42751== by 0x20001F35: main (in /zhome/academic/HLRS/hlrs/hpcjschu/src/test/mpi_test_loop)
```

I'm not sure whether the invalid writes (and reads) during
initialization and communication are caused by Open MPI or by uGNI
itself, nor whether they are critical (the addresses seem to be
"special"). The write-after-free in MPI_Finalize seems suspicious,
though. I cannot say whether it causes the memory corruption I am
seeing, but I thought I would report it. I will dig further to try to
figure out what causes the crashes (they are not deterministically
reproducible, unfortunately).

Cheers,
Joseph
Nathan Hjelm
2018-05-17 13:49:43 UTC
The invalid writes in uGNI are nothing to worry about; I suggest adding any GNI_ call to a suppression file. The RB tree invalid write does look like a bug. I will take a look and see what might be causing it.
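A Memcheck suppression along these lines should cover the uGNI frames (just a sketch; the name is arbitrary, and you should match the error kinds against what Valgrind actually reports):
```
# ugni.supp -- suppress reports whose innermost frame is inside libugni
# (used as: valgrind --suppressions=ugni.supp ./mpi_test_loop)
{
   ugni-invalid-rw-8
   Memcheck:Addr8
   obj:*/libugni.so*
}
```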

BTW, you can add --with-valgrind(=DIR) to configure. This will suppress some uninitialized value errors with btl/vader and other components. It won’t help with btl/ugni right now though.
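For example (the install prefix and Valgrind path below are just placeholders):
```
./configure --prefix=$HOME/opt/openmpi-valgrind --with-valgrind=/path/to/valgrind-install ...
```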

-Nathan