Discussion:
[OMPI users] problem
Ankita m
2018-05-09 15:39:52 UTC
Yes, because previously I was using Intel MPI, and at that time the program
ran perfectly. Now when I use Open MPI it produces these error files,
though I am not quite sure. I just thought that if the issue is with
Open MPI, I could get some help here.

On Wed, May 9, 2018 at 6:47 PM, Gilles Gouaillardet wrote:
Ankita,
Do you have any reason to suspect the root cause of the crash is Open MPI ?
Cheers,
Gilles
The MPI "Hello World" program is also working.
Please see the error file attached below. It is from a different program.
On Wed, May 9, 2018 at 4:10 PM, John Hearns via users wrote:
Ankita, it looks like your program is not launching correctly.
Define two hosts in a machinefile, then use
mpirun -np 2 -machinefile machinefile date
i.e. can you use mpirun just to run the command 'date'?
Secondly, compile and try to run an MPI 'Hello World' program.
I am using Open MPI version 3.1.0 for my program, and the compiler is mpicc.
It is a parallel program that uses multiple nodes with 16 cores in each
node, but it is not working and generates an error file. I have attached
the error file below.
Can anyone please tell me what the issue actually is?
_______________________________________________
users mailing list
https://lists.open-mpi.org/mailman/listinfo/users
Ankita m
2018-05-10 09:47:00 UTC
OK, thank you so much, sir.

On Wed, May 9, 2018 at 11:13 PM, Jeff Squyres (jsquyres) wrote:
It looks like you're getting a segv when calling MPI_Comm_rank().
This is quite unusual: MPI_Comm_rank() is just a local lookup and return
of an integer. If MPI_Comm_rank() is seg faulting, it usually indicates
that there's some other kind of memory error in the application, and the
seg fault you're seeing is just a symptom, not the real problem. It
may have worked with Intel MPI by chance, or for some reason Intel MPI has
a different memory pattern than Open MPI and it didn't happen to trigger
this exact problem.
You might want to run your application through a memory-checking debugger.
--
Jeff Squyres
dpchoudh
2018-05-10 12:53:28 UTC
What Jeff is suggesting is probably valgrind. However, in my
experience, which is much less than that of most Open MPI developers, a
simple code inspection is often adequate. Here are the steps:

1. If you don't already have it, build a debug version of your code.
If you are using gcc, you'd add -g to CFLAGS in your makefile for C
programs (adding -g3 and taking out any -O flags is better).
2. Have your shell generate a core dump when the crash happens.
3. Launch gdb with the debug image and core file.

I have had near 100% luck in detecting the sources of SEGV-type crashes
using the steps above, but your mileage may vary. If you are not
familiar with gdb, you may be able to enlist someone local who is.


We learn from history that we never learn from history.