Receive a signal before a process is killed by the OOM killer / cgroups


Solution 1

With cgroup v1, you can register for a notification that fires when a cgroup's memory usage crosses a threshold. In principle, setting the threshold at a suitable point below the actual limit would let you send a signal or take other action before the limit is hit.

See:

https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
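
According to that document, the registration itself goes through eventfd(2) and a write to cgroup.event_control, which plain Bash cannot do on its own. As a rough stand-in, the same threshold idea can be approximated by polling. The sketch below is polling, not the kernel notification mechanism; the function names `usage_exceeds` and `watch_cgroup` and the directory argument are made up for illustration, and the demo at the end runs against fake files in a temporary directory rather than a real cgroup.

```shell
#!/usr/bin/env bash

# usage_exceeds DIR PERCENT
# Succeeds if memory.usage_in_bytes is at or above PERCENT% of
# memory.limit_in_bytes (cgroup v1 file names) found in DIR.
usage_exceeds() {
    local dir=$1 percent=$2 usage limit
    usage=$(<"$dir/memory.usage_in_bytes")
    limit=$(<"$dir/memory.limit_in_bytes")
    # Divide first so an "unlimited" limit (a huge number) cannot overflow.
    (( usage >= limit / 100 * percent ))
}

# watch_cgroup DIR PERCENT PID
# Polls once per second while PID is alive and sends it SIGUSR1
# the first time the threshold is crossed.
watch_cgroup() {
    local dir=$1 percent=$2 pid=$3
    while kill -0 "$pid" 2>/dev/null; do
        if usage_exceeds "$dir" "$percent"; then
            kill -USR1 "$pid"
            return 0
        fi
        sleep 1
    done
    return 1
}

# Demo with fake accounting files (assumption: stand-ins for a real cgroup).
demo_dir=$(mktemp -d)
echo 1000 > "$demo_dir/memory.limit_in_bytes"
echo 900 > "$demo_dir/memory.usage_in_bytes"
if usage_exceeds "$demo_dir" 95; then before=above; else before=below; fi
echo 960 > "$demo_dir/memory.usage_in_bytes"
if usage_exceeds "$demo_dir" 95; then after=above; else after=below; fi
rm -r "$demo_dir"
echo "at 900/1000: $before, at 960/1000: $after"
```

In real use you would start something like `watch_cgroup <your cgroup directory> 95 "$$" &` from the process you want warned, with a `trap` for USR1 installed; where the cgroup directory lives depends on how the memory controller is mounted on your system.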

Solution 2

The OOM killer sends SIGKILL; giving the offending program the choice of continuing would be counter-productive.

SIGKILL cannot be caught, blocked, or ignored, so a process has no way of knowing that it is about to be killed by the OOM killer.
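
The difference between a catchable signal and SIGKILL is easy to check by hand. The following is a small sketch, not anything specific to the OOM killer: it starts the same child twice, sends SIGTERM the first time and SIGKILL the second, and compares the exit statuses. The child script and the variable names are invented for the demonstration.

```shell
#!/usr/bin/env bash

# Child: installs a SIGTERM handler, then idles.
child='trap "echo child: caught SIGTERM, cleaning up; exit 0" TERM
while true; do sleep 0.1; done'

# Round 1: SIGTERM reaches the handler, so the child exits on its own terms.
bash -c "$child" &
pid=$!
sleep 1                      # give the child time to install its trap
kill -TERM "$pid"
wait "$pid"
term_status=$?

# Round 2: SIGKILL never reaches user code; the handler does not run.
bash -c "$child" &
pid=$!
sleep 1
kill -KILL "$pid"
wait "$pid" 2>/dev/null
kill_status=$?

echo "exit status after SIGTERM: $term_status"   # 0: handler ran and exited
echo "exit status after SIGKILL: $kill_status"   # 137 = 128 + 9 (SIGKILL)
```

Only the first round prints the "caught SIGTERM" line; in the second the child simply disappears, which is what a process hit by the OOM killer experiences.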

Managing such issues usually means fixing the programs or their configuration. Sometimes, depending on the system's configuration, simply increasing swap space gives the OS enough memory-management flexibility to avoid such drastic measures.

Solution 3

It looks like you already use cgroups, which helps.

If your process is the only process in the cgroup (i.e. the only process that can be killed) and you own the program you execute, then you can modify your program to spawn a child and set the child's oom_score_adj to a high value. This child thus becomes a decoy: when the cgroup reaches its memory limit, the OOM killer kills the decoy instead of the main process. The main process can wait on its decoy child to learn the exact moment the OOM killer is triggered.

IMO, this is easier than a monitoring script with some threshold. Here's an example in Bash:

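#!/usr/bin/bash

self_pid=$$

# Watcher subshell: owns the decoy and signals the main process when it dies.
(
    # The decoy does nothing; it only exists to be the OOM killer's first pick.
    /usr/bin/sleep infinity &
    oom_decoy_pid=$!
    # 1000 is the maximum oom_score_adj, making the decoy the preferred victim.
    echo 1000 > "/proc/${oom_decoy_pid}/oom_score_adj"
    echo "Launched oom decoy ${oom_decoy_pid} for parent process ${self_pid}"

    # Blocks until the decoy exits, e.g. because the OOM killer killed it.
    wait "$oom_decoy_pid"

    echo "OOM decoy is killed. Likely OOM is coming!"
    echo "Signalling parent..."
    kill -TERM "$self_pid"
) &

# Main workload. Without a trap for TERM, the signal simply terminates it.
while true; do
    sleep 1
    echo "Doing something important and memory heavy"
done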
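
For the signal to be useful, the main process needs a handler for it. Below is a hedged sketch of that side, runnable without a real OOM event: a helper subshell kills the decoy with SIGKILL after one second, standing in for the OOM killer. The handler name `on_oom_warning`, the `got_warning` flag, and the simulated kill are all assumptions made for the demonstration. The watcher also checks that `wait` returned 137, i.e. that the decoy really died from SIGKILL and did not just exit.

```shell
#!/usr/bin/env bash

self_pid=$$
got_warning=0

# Handler in the main process: dump debug info or free memory here.
on_oom_warning() {
    got_warning=1
    echo "Warning received: decoy was killed, memory limit reached"
}
trap on_oom_warning TERM

# Watcher subshell, same structure as the decoy example.
(
    sleep 30 &
    decoy_pid=$!

    # Simulation only: this stands in for the OOM killer.
    ( sleep 1; kill -KILL "$decoy_pid" ) &

    wait "$decoy_pid"
    status=$?
    # 137 = 128 + 9: the decoy was terminated by SIGKILL.
    if (( status == 137 )); then
        kill -TERM "$self_pid"
    fi
) 2>/dev/null &
watcher_pid=$!

# Main workload: keeps running until the handler has fired.
while (( got_warning == 0 )); do
    sleep 0.2
done

wait "$watcher_pid"
echo "Main process continues after handling the warning"
```

Bash runs the trap only after the current foreground command finishes, so keeping the individual steps of the main loop short keeps the reaction time short as well.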

Author: Albert

I am postgraduate of RWTH Aachen, Germany and received a M.S. Math and a M.S. CompSci. My main interests are Machine Learning, Neural Networks, Artificial Intelligence, Logic, Automata Theory and Programming Languages. And I'm an enthusiastic hobby programmer with a wide range of side projects, mostly in C++ and Python.

Updated on September 18, 2022

Comments

  • Albert
    Albert almost 2 years

    In our cluster, we are restricting our processes' resources, e.g. memory (memory.limit_in_bytes).

    I think, in the end, this is also handled via the OOM killer in the Linux kernel (looks like it by reading the source code).

    Is there any way to get a signal before my process is being killed? (Just like the -notify option for SGE's qsub, which will send SIGUSR1 before the process is killed.)

    I read about /dev/mem_notify here but I don't have it - is there something else nowadays? I also read this which seems somewhat relevant.

    I want to be able to at least dump a small stack trace and maybe some other useful debug info - but maybe I can even recover by freeing some memory.

    One workaround I'm currently using is this small script which frequently checks if I'm close (95%) to the limit and if so, it sends the process a SIGUSR1. In Bash, I'm starting this script in background (cgroup-mem-limit-watcher.py &) so that it watches for other procs in the same cgroup and it quits automatically when the parent Bash process dies.

    • Hi-Angel
      Hi-Angel over 8 years
      I couldn't find any authoritative sources, nor could I find a way to invoke the OOM killer for a specific process manually (to test the idea), but from what I found it seems that the OOM killer simply sends SIGTERM, so you have to set a handler for this signal.
    • Albert
      Albert over 8 years
      @Hi-Angel: From the Linux source code, it seems that it sends SIGKILL.
    • andy
      andy over 8 years
      @Albert After reading the source code, I also think that the OOM killer sends SIGKILL directly.