Discussion:
[OMPI users] users Digest, Vol 3729, Issue 2
g***@buaa.edu.cn
2017-03-03 04:01:38 UTC
Hi Jeff:
Thanks for your suggestions.
1. I executed "find / -name 'core*'" on each node; no coredump file was found. There are no coredump files under /home, /cores, or the working directory either.
2. I changed core_pattern as you advised; still no coredump file appeared.
3. I am not using any resource scheduler, just ssh.
Finally, I tried adding a setrlimit(2) call to my code, and it worked! I got the coredump file I wanted, but I don't know why.
If I don't want to modify my code, how can I configure things so that coredumps are produced?
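For reference, what I added is essentially the following (a minimal sketch, not my exact code; the function name is illustrative):

    #include <sys/resource.h>   /* setrlimit(2), struct rlimit, RLIMIT_CORE */
    #include <stdio.h>          /* perror */

    /* Called at the top of main(), before anything can crash. */
    static void enable_coredumps(void)
    {
        struct rlimit rl;
        rl.rlim_cur = RLIM_INFINITY;  /* soft limit: unlimited core size */
        rl.rlim_max = RLIM_INFINITY;  /* hard limit: unlimited core size */
        if (setrlimit(RLIMIT_CORE, &rl) != 0)
            perror("setrlimit(RLIMIT_CORE)");
    }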
Here is the output of "ulimit -a":
--------------------------------------------------------------
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256511
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
------------------------------------------------------
Regards!

Eric

From: users-request
Date: 2017-03-03 03:00
To: users
Subject: users Digest, Vol 3729, Issue 2


Today's Topics:

1. coredump about MPI (***@buaa.edu.cn)
2. Re: coredump about MPI (Jeff Squyres (jsquyres))


----------------------------------------------------------------------

Message: 1
Date: Thu, 2 Mar 2017 22:19:51 +0800
From: "***@buaa.edu.cn" <***@buaa.edu.cn>
To: users <***@lists.open-mpi.org>
Subject: [OMPI users] coredump about MPI
Message-ID: <***@buaa.edu.cn>
Content-Type: text/plain; charset="us-ascii"

Hi developers and users,
I have a question about coredumps from MPI programs. I have two nodes. When the program is run on a single node (either one by itself), I get the corefile correctly (to force a coredump, the program contains a deliberate divide-by-zero operation).
But when I run the program across the two nodes and the illegal operation happens on a node other than the one where the "mpirun" command was executed, no coredump file is produced.
I have checked "ulimit -c" and so on, but still cannot figure it out.
Thanks a lot for your help, and best regards!
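The test program is essentially like the following (a minimal sketch, not the exact code; host names in the comment are illustrative):

    /* crash.c - rank 1 deliberately divides by zero to force a coredump.
       Build: mpicc crash.c -o crash
       Run:   mpirun -np 2 --host node1,node2 ./crash */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        volatile int zero = 0;  /* volatile so the division is not optimized away */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 1)
            printf("%d\n", 1 / zero);  /* raises SIGFPE on rank 1 */
        MPI_Finalize();
        return 0;
    }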

-------------------------------------
Eric

------------------------------

Message: 2
Date: Thu, 2 Mar 2017 15:34:56 +0000
From: "Jeff Squyres (jsquyres)" <***@cisco.com>
To: "Open MPI User's List" <***@lists.open-mpi.org>
Subject: Re: [OMPI users] coredump about MPI
Message-ID: <F84163BA-87C9-4747-8BA2-***@cisco.com>
Content-Type: text/plain; charset="us-ascii"

A few suggestions:

1. Look for the core files in directories where you might not expect them:
- your $HOME (particularly if your $HOME is not a networked filesystem)
- in /cores
- in the pwd where the executable was launched on that machine

2. If multiple processes will be writing core files to the same directory, make sure that they don't write to the same filename (you'll likely end up with a single corrupt corefile). For example, on Linux, you can (as root) run echo "core.%e-%t-%p" > /proc/sys/kernel/core_pattern to get a unique corefile for each process and host (this is what I use on my development cluster; see the notes after point 3 on the pattern specifiers).

3. If you are launching via a resource scheduler (e.g., SLURM, Torque, etc.), the scheduler may be resetting the corefile limit back down to zero before launching your job. If this is what is happening, it may be a little tricky to override this because the scheduler will likely do it *on each node*, and therefore you likely need to override it *in each MPI process* (via setrlimit(2)).
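
If modifying the code is not an option, one possible workaround (a sketch; host names and the binary name are illustrative) is to launch through a thin shell wrapper that raises the limit before the application starts:

    mpirun -np 2 --host node1,node2 sh -c 'ulimit -c unlimited; exec ./your_app'

Since ssh-launched processes inherit their limits from the non-interactive login shell on each node, adding "ulimit -c unlimited" to a startup file that non-interactive shells read (e.g., ~/.bashrc for bash) on every node may also work. To check the limit that remotely launched processes actually see:

    mpirun --host node1,node2 sh -c 'ulimit -c'

On the core_pattern specifiers in point 2: %e expands to the executable name, %t to the time of the dump, and %p to the PID (see core(5) for the full list). Writing to /proc does not survive a reboot; to make the setting persistent, put an equivalent kernel.core_pattern line in /etc/sysctl.conf or set it with sysctl -w.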
--
Jeff Squyres
***@cisco.com



Gilles Gouaillardet
2017-03-03 04:24:08 UTC
Hi,

There is likely something wrong in Open MPI (I will follow up on the devel ML).

Meanwhile, you can run:

    mpirun --mca opal_set_max_sys_limits core:unlimited ...
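For example (host names and the executable are placeholders):

    mpirun --mca opal_set_max_sys_limits core:unlimited -np 2 --host node1,node2 ./a.out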


Cheers,
Gilles