Is it possible to make the OOM killer intervene earlier?


Solution 1

I also struggled with that issue. I just want my system to stay responsive, no matter what, and I prefer losing processes to waiting a few minutes. There seems to be no way to achieve this using the kernel OOM killer.

However, in user space we can do whatever we want. So I wrote the Early OOM Daemon ( https://github.com/rfjakob/earlyoom ), which kills the largest process (by RSS) once available RAM goes below 10%.
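As an illustration of the threshold earlyoom works with (a sketch only; the daemon's actual flags and heuristics may differ by version), available RAM as a percentage of total can be read from /proc/meminfo:

```shell
# Compute available RAM as a percentage of total -- the same signal
# earlyoom watches (MemAvailable requires kernel >= 3.14).
total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
echo "available: $((100 * avail / total))%"
```

In current earlyoom versions the threshold is configurable (see `earlyoom --help`); 10% is the default behavior described above.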

Without earlyoom, it has been easy to lock up my machine (8GB RAM) by starting http://www.unrealengine.com/html5/ a few times. Now, the guilty browser tabs get killed before things get out of hand.

Solution 2

The kernel's default policy is to let applications keep allocating virtual memory as long as there is free physical memory. Physical memory isn't actually used until an application touches the virtual memory it allocated, so an application can allocate much more memory than the system has and start touching it later, causing the kernel to run out of memory and trigger the out-of-memory (OOM) killer. Before the hogging process is killed, though, it has caused the disk cache to be emptied, which makes the system slow to respond for a while until the cache refills.
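This gap between promised and actually-used memory is visible in /proc/meminfo: Committed_AS (what the kernel has promised to applications) routinely exceeds what is physically in use:

```shell
# Committed_AS: total memory the kernel has promised to applications.
# CommitLimit: the ceiling, enforced only in strict overcommit (mode 2).
grep -E '^(MemTotal|CommitLimit|Committed_AS):' /proc/meminfo
```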

You can change the default policy to disallow memory overcommit by writing a value of 2 to /proc/sys/vm/overcommit_memory. The default value of /proc/sys/vm/overcommit_ratio is 50, so the kernel will not allow applications to allocate more than 50% of RAM + swap. If you have no swap, the kernel will not allow applications to allocate more than 50% of your RAM, leaving the other 50% free for the cache. That may be a bit excessive, so you may want to increase this value to, say, 85%, so applications can allocate up to 85% of your RAM, leaving 15% for the cache.
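As a sketch, both settings can be persisted as a sysctl fragment (the file name below is just a convention; apply it with `sysctl --system` as root):

```
# /etc/sysctl.d/90-overcommit.conf
# 2 = strict accounting: refuse allocations beyond the commit limit
vm.overcommit_memory = 2
# commit limit = swap + 85% of RAM (the default ratio is 50)
vm.overcommit_ratio = 85
```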

Solution 3

For me, setting vm.admin_reserve_kbytes=262144 does exactly this: the OOM killer intervenes before the system becomes completely unresponsive.
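The current reserve can be inspected without root; applying the value above is a one-liner with sysctl (a sketch, needs root):

```shell
# Print the current admin reserve in KiB; setting it needs root:
#   sysctl vm.admin_reserve_kbytes=262144
cat /proc/sys/vm/admin_reserve_kbytes
```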

Solution 4

The other answers have good automatic solutions, but I find it can also be helpful to enable the SysRq key for when things get out of hand. With SysRq, you message the kernel by hand, and you can do things like a safe reboot (with SysRq + REISUB) even if userspace has completely frozen.

To allow the kernel to listen to requests, set kernel.sysrq = 1, or enable just the functions you're likely to use with a bitmask (documented here). For example, kernel.sysrq = 244 will enable all the combos needed for the safe reboot above, as well as manual invocation of the OOM killer with SysRq + F.
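The current bitmask can be checked without root (a sketch; changing it requires root):

```shell
# 1 = all SysRq functions enabled, 0 = disabled, other values are a
# bitmask of allowed functions; set with e.g. `sysctl kernel.sysrq=244`.
cat /proc/sys/kernel/sysrq
```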


Author: dronus

Updated on September 18, 2022

Comments

  • dronus
    dronus over 1 year

    I am trying to tune my development system for maximal reliability. I disabled swap because, for GUI usage, it mostly renders the machine so unresponsive that it is no longer usable. Nevertheless, if aggressive applications eat up the memory, some mechanisms seem to kick in that try to make the most of it at the cost of speed. There are no hard-drive swap operations, but the system becomes unresponsive all the same. So I want the OOM killer to kick in before the system makes any special efforts to reclaim memory. Is it possible, for example, to configure the OOM killer to act if there is less than 100 MB of free physical memory?

    • Thalys
      Thalys about 12 years
      I think the real issue here is that there's not enough RAM to start with. You won't use swap unless there's no RAM. By turning off swap... you run out of RAM and have nowhere to page it to, which causes ugly things to happen. Your system seems to be set up badly, and no amount of tweaking will fix that.
    • dronus
      dronus about 12 years
      I don't agree. Development and 'power use' often involve experimental usage. For example, when using a command-line image processing tool, there are no specs for how much memory its operation takes in relation to the image size. So I just give it a run. And I don't expect it to render my whole machine useless. For a single experiment, I could use ulimit to keep it contained, but for whole-system operation with sometimes plenty of processes running, containing one process is not so useful, whereas a 'life insurance' for the whole machine definitely is.
    • dronus
      dronus about 12 years
      Simply said: in my wide field of everyday usage, there are plenty of tasks my machine can handle, and some it cannot.
    • Thalys
      Thalys about 12 years
      The fact that your system grinds to a halt when using swap is suspect. Your computer is using swap because it's out of memory. Swap is slow because disk access is slow. Disk access is slow due to ???. It's problems all the way down. It's not just that you're low on RAM; it's that you can't use the one way to mitigate that, due to something else.
    • psusi
      psusi about 12 years
      @JourneymanGeek, you are off in left field. Disks are slow compared to ram, period, hence heavy swapping always grinds the system to a halt. Of course he is out of memory because he tried running a program that uses a lot of memory. The question is what to do when out of memory? Kill the hog, or slow down due to having no memory left for the disk cache.
    • Tamara Wijsman
      Tamara Wijsman about 12 years
      @psusi: You don't understand how swap space works; the slowdown is an assumption. We are all running page files and swap spaces worldwide, and then you come along with the suggestion to disable it; you have to explain a little more than that...
    • psusi
      psusi about 12 years
      @TomWijsman, Disk IO is many orders of magnitude slower than memory IO, so using disk swap has always meant a huge slowdown. Sometimes (especially in the old days, when RAM was expensive and most people didn't have much) that's preferable to not being able to do what you were trying at all. These days the disk is SO much slower than RAM, and RAM is cheap enough that most people have plenty, so on the rare occasion when they accidentally run something that uses more RAM than they have, it is often better to give up than to take 1000 times as long to do it.
    • Tamara Wijsman
      Tamara Wijsman about 12 years
      @psusi: You don't understand how disk swap works. If you think that "your memory is waiting on your disk" you either have a badly implemented kernel at your disposal or don't know what you are talking about. Your entire comment makes no sense as a result, please come with a theoretically backed up explanation instead of guessing...
    • joeytwiddle
      joeytwiddle over 11 years
      I would also like the OOM killer to trigger a little earlier. Sometimes it is obvious that I have overloaded my system, but it takes minutes of swapping and mouse/keyboard jitter before the killer acts.
  • psusi
    psusi about 12 years
    Of course disabling swap improves the behavior because instead of thrashing the disk, the OOM kicks in and kills the memory hog. Running out of ram isn't the problem ( and adding more just means you have to try harder to run out ). The problem is what to do when you DO run out. You want the OOM to kill the hog, and thus relieve the low memory condition.
  • Tamara Wijsman
    Tamara Wijsman about 12 years
    @psusi: How is killing an application I might need an improvement in behavior? I'd rather just have some more memory and/or swap space so that my application keeps running. You want to leave some room for your other applications (either by moving them to swap or with extra memory), not have your current application killed. That's a bad UX...
  • psusi
    psusi about 12 years
    Because killing an application that is trying to use more memory than you have is preferable to bringing the entire system to its knees. In a perfect world you would have unlimited memory and never run out, but in reality, sometimes you run out by accident and would rather be told "not enough memory" than have the system grind to a halt.
  • psusi
    psusi about 12 years
    I'm assuming no such thing; it was explicitly stated in the question.
  • Tamara Wijsman
    Tamara Wijsman about 12 years
    Changing these values from their defaults without theoretical background is not going to result in a more reliable system; you can only justify such a change with proper statistics. Just because you can change something doesn't mean you should. If you are constantly in low-memory conditions, that means you are using more memory than you have and should buy more memory; it doesn't mean you should fiddle with your settings and kill random applications. Interrupting your daily work or introducing corruption is really not the way to go...
  • psusi
    psusi about 12 years
    @TomWijsman, the question makes it clear that he isn't constantly in low-memory conditions; he just sometimes runs a command that takes an unexpectedly large amount of memory. Buying more memory is not the only solution when you run out. Other potential solutions include finding better ways to make use of the memory you have, or just not doing whatever needs that much memory. The question makes it clear that the latter is more acceptable than going out and buying more RAM.
  • Tamara Wijsman
    Tamara Wijsman about 12 years
    Which line in the question makes this clear? I see the opposite given in "I disabled swap, because for GUI usage it mostly renders the machine unresponsive in such a way not useable anymore." He mentioned a GUI, while you are assuming he runs a command. Buying more memory is the first solution, using less memory yourself is the second, and making your system unstable by fiddling with the stable defaults is the last. The question doesn't have to be answered literally, so I don't see what your problem is that you have to bother both of us in the comments. Ranting doesn't help...
  • Tamara Wijsman
    Tamara Wijsman about 12 years
    @psusi: You are just assuming it brings the system to its knees. You are also assuming that people hit their memory limit very often, I'm a quite heavy developer / gamer / ... and am yet to hit my memory limit. My page file gets many thanks from me! I'm talking about people, not about the low-memory OP. Hence it's just an assumption...
  • dronus
    dronus about 12 years
    Buying some extra memory might solve some problems, depending on the amount bought. But it doesn't change the fact that there may be unexpected usage larger by orders of magnitude. So I want the application to fail, but NOT the system, under those conditions. Some examples: processing a folder full of compressed images, most of them "normal" size but some of them really large. A small mistake could create an infinite loop with runaway memory allocation eating 1 GB/s. Accidentally opening a video file in a text editor. Usually this ends with symptoms like a jerky mouse and an almost-dead UI until the OOM killer kicks in.
  • Tamara Wijsman
    Tamara Wijsman about 12 years
    @dronus: You can just send a kill signal. Dead loops eating memory mean you aren't doing defensive programming and not using proper patterns, and aren't preferring for and foreach over while. ;)
  • dronus
    dronus about 12 years
    Hey, this answer sounded quite cool. Unfortunately, the 'commit' refers to virtual memory demand, it seems, which application programmers estimate quite badly. For example, with my (no-swap) desktop running, about 400 of 2000 MB of physical memory are used, but 1600 MB are 'commit'ted, as /proc/meminfo's Committed_AS states. With some applications running, this value easily exceeds physical memory, so it's hard to set a feasible limit this way.
  • dronus
    dronus about 12 years
    I have no problem with killing applications that would run dead anyway. Consider a system with 2 GB physical + 2 GB swap. An application that quickly runs out of physical memory can easily eat the swap too. It would just die later, after rendering the system unresponsive for minutes to hours. So why not kill it quickly, before GUI operation gets flaky? Many processes do all their work in 10 MB, some take 1 GB, and some rare ones need 10 GB; that's life.
  • Tamara Wijsman
    Tamara Wijsman about 12 years
    @dronus: This article might also be interesting if you want to look further into OOM: Taming the OOM killer. Although the simpler solution would be something like ulimit -v.
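The `ulimit -v` approach mentioned above can at least be scoped to a single command by running it in a subshell, so the rest of the session is unaffected (a sketch; replace the inner `ulimit -v` report with the actual memory-hungry command):

```shell
# Cap virtual memory at ~512 MiB (ulimit -v takes KiB) for a single
# command; the limit dies with the subshell. Here the inner `ulimit -v`
# just reports the effective limit.
(ulimit -v 524288; ulimit -v)   # prints: 524288
```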
  • psusi
    psusi about 12 years
    @dronus, yes, it is tricky trying to get it "just right".
  • dronus
    dronus about 12 years
    @TomWijsman: I don't like building anything into the application, because this is a general problem that can be caused by several applications. That's why I still see the OOM killer as the appropriate tool. ulimit -v is unreliable because it can only limit one application and doesn't take the total available resources into account.
  • dronus
    dronus over 10 years
    I think this answer is a bit like 'don't do anything really bad to solve a problem', but in my opinion a system that can be made unresponsive by a bunch of seemingly harmless user actions is worse than a system that kills them early.
  • Tamara Wijsman
    Tamara Wijsman over 10 years
    @dronus: Compared to back in April 2012, these days we have cgroups to manage this; so, you might be able to set up cgroups in a way that does what you want. I still feel that dealing more actively with the program will yield better results than dealing with it passively; the goal would be to get the programs to run within the memory conditions you can provide, and programs that do not fit could be configured to be less needy, or could have a bug (or feature request) reported so the developers could look into the high memory usage. It's just that the OOM killer doesn't feel reliable to me...
  • jozxyqk
    jozxyqk about 9 years
    Save your work before trying this! :P I had immediate failures from everything (bash, window manager etc).
  • Thomas Ferris Nicolaisen
    Thomas Ferris Nicolaisen over 8 years
    Thanks for scratching this itch! Loving earlyoom so far.
  • dronus
    dronus about 8 years
    Just figured out that Android has done the same for a long time. I am not sure whether it uses custom code like yours for that.
  • dronus
    dronus about 8 years
    I am testing earlyoom now; it does well in a first trigger test. I just wonder why this can't be implemented by kernel configuration or system tools.
  • Jérôme Pouiller
    Jérôme Pouiller almost 6 years
    I like the idea, but does it mean you have 256 MiB of physical memory that is never used?
  • Michael Vigovsky
    Michael Vigovsky almost 6 years
    The 256 MiB will be used for caches. Caches are really important; it's not just about running faster: the system wouldn't work at all if there weren't enough memory for caches. The code of every running program can be unloaded from memory because it's mmapped and can be read back from disk. Without caches, every task switch would require a disk read, and the system would become completely unresponsive.
  • PF4Public
    PF4Public over 4 years
    @MichaelVigovsky Do you have any proof of the caching claim? It isn't mentioned at all in the kernel docs: "The amount of free memory in the system that should be reserved for users with the capability cap_sys_admin."
  • peterh
    peterh about 4 years
    @psusi Bad solution: if you disable swap, you will have disabled the block cache in near-OOM cases. You will also have degraded performance and disk overuse, exactly the opposite of what the no-swap fans expect. This is because with no swap, you cannot swap out unused memory to use the space as block cache. But reducing swap might be a solution for making the OOM killer kick in earlier.