Computer freezing on almost full RAM, possibly disk cache problem

77,319

Solution 1

I found that the fix is to set vm.min_free_kbytes to roughly 5%-6% of your total physical RAM, divided by the number of cores in the computer:

sysctl -w vm.min_free_kbytes=65536

Keep in mind that I calculated this per core: with 2 GB of RAM and two cores, I took 6% of only 1 GB and added a little extra just to be safe.
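The arithmetic in this answer can be written down as a short Python sketch. It only encodes the rule of thumb described here (a percentage of total RAM divided by the core count); the function name and the default of 6 percent are illustrative assumptions, not part of any tool.

```python
def suggested_min_free_kbytes(total_ram_kb, cores, percent=6):
    """Return `percent` of total RAM divided by the core count, in kB.

    Integer arithmetic is used throughout, so the result is rounded down.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return total_ram_kb * percent // 100 // cores


# 2 GB of RAM (2 * 1024 * 1024 kB) shared by two cores.
value = suggested_min_free_kbytes(2 * 1024 * 1024, 2)
print(value)
```

For the 2 GB, two-core machine this comes out a little below the 65536 used in the sysctl command, which matches the "little extra" added on top.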

This forces the computer to try to keep this amount of RAM free, and in doing so limits the ability to cache disk files. Of course it still tries to cache them and immediately swap them out, so you should probably limit your swapping as well:

sysctl -w vm.swappiness=5

(100 = swap as often as possible, 0 = swap only when absolutely necessary)

The result is that Linux no longer randomly decides to load a whole movie file of roughly 1 GB into RAM while I am watching it, killing the machine in the process.

Now there is enough reserved space to avoid memory starvation, which apparently was the problem (seeing as there are no more freezes like before).

After testing for a day, the lockups are gone. There are sometimes minor slowdowns because things get cached more often, but I can live with that if I don't have to restart the computer every few hours.

The lesson here is that the default memory management is tuned for just one use case and is not always the best, even though some people try to suggest otherwise. A home entertainment Ubuntu machine should be configured differently from a server.


You probably want to make these settings permanent by adding them to your /etc/sysctl.conf like this:

vm.swappiness=5
vm.min_free_kbytes=65536
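If you prefer to script the change rather than edit the file by hand, the following is a minimal sketch of how the two lines could be merged into existing sysctl.conf content without leaving duplicate keys. The helper name upsert_sysctl is hypothetical, and the sketch works on text only; reading and writing /etc/sysctl.conf (as root) is left to the caller.

```python
def upsert_sysctl(conf_text, settings):
    """Return conf_text with every key in `settings` set exactly once.

    The first existing assignment of a key is rewritten in place, later
    duplicates of that key are dropped, and missing keys are appended.
    Comment lines (starting with '#' or ';') are left untouched.
    """
    seen = set()
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_setting = (bool(stripped)
                      and not stripped.startswith(("#", ";"))
                      and "=" in stripped)
        key = stripped.split("=", 1)[0].strip()
        if is_setting and key in settings:
            if key not in seen:
                out.append("{}={}".format(key, settings[key]))
                seen.add(key)
            # A later duplicate of an already handled key is dropped.
        else:
            out.append(line)
    for key, value in settings.items():
        if key not in seen:
            out.append("{}={}".format(key, value))
    return "\n".join(out) + "\n"


existing = "# local tweaks\nvm.swappiness = 60\n"
updated = upsert_sysctl(existing, {"vm.swappiness": 5,
                                   "vm.min_free_kbytes": 65536})
print(updated, end="")
```

Settings written to the file are picked up at the next boot, or immediately with sudo sysctl -p.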

Solution 2

This happened for me in a new install of Ubuntu 14.04.

In my case, it had nothing to do with the sysctl settings mentioned in other answers.

Instead, the problem was that the swap partition's UUID was different during installation than it was after installation. So my swap was never enabled, and my machine would lock up after a few hours use.

The solution was to check the current UUID of the swap partition with

sudo blkid

and then sudo nano /etc/fstab to replace the incorrect swap's UUID value with the one reported by blkid.

A simple reboot to apply the changes, and voila.
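The comparison between /etc/fstab and blkid can also be done mechanically. Below is a minimal sketch, assuming the usual one-line-per-device blkid output (UUID="..." TYPE="swap") and UUID= entries in fstab; the function name and the sample UUIDs are made up for illustration.

```python
import re


def stale_swap_uuids(fstab_text, blkid_text):
    """Return swap UUIDs listed in fstab that blkid does not report as swap."""
    live = set()
    for line in blkid_text.splitlines():
        if 'TYPE="swap"' in line:
            # \b keeps PARTUUID="..." from matching here.
            match = re.search(r'\bUUID="([^"]+)"', line)
            if match:
                live.add(match.group(1))

    stale = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue
        # fstab columns: device, mount point, filesystem type, ...
        if fields[2] == "swap" and fields[0].startswith("UUID="):
            uuid = fields[0][len("UUID="):].strip('"')
            if uuid not in live:
                stale.append(uuid)
    return stale


# Hypothetical sample data: fstab still points at the installer-time UUID.
fstab = "UUID=1111-aaaa none swap sw 0 0\nUUID=3333-cccc / ext4 defaults 0 1\n"
blkid = '/dev/sda5: UUID="2222-bbbb" TYPE="swap" PARTUUID="0001-05"\n'
print(stale_swap_uuids(fstab, blkid))
```

An empty list means every swap entry in fstab matches a swap partition that blkid can see. In practice the two inputs would be the contents of /etc/fstab and the output of sudo blkid.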

Solution 3

Nothing worked for me!!

So I wrote a script to monitor memory usage. It first tries to clear the RAM cache if memory consumption exceeds a threshold. If memory consumption still doesn't drop below the threshold, it starts killing processes one by one, in decreasing order of memory consumption, until usage is below the threshold. I have set the threshold to 96% by default. You can configure it by changing the value of the variable RAM_USAGE_THRESHOLD in the script.

I agree that killing processes which consume a lot of memory is not the perfect solution, but it's better to kill ONE application than to lose ALL the work!! The script sends you a desktop notification if RAM usage exceeds the threshold. It also notifies you if it kills any process.

#!/usr/bin/env python3
"""Watch RAM usage; when it gets critical, drop the page cache and,
if that is not enough, kill the largest processes one by one."""
import os
import time
import tkinter
from subprocess import Popen, PIPE
from tkinter import messagebox

import psutil

# Hidden Tk root window, needed only so that messagebox can be shown.
root = tkinter.Tk()
root.withdraw()

RAM_USAGE_THRESHOLD = 96    # percent of RAM in use that triggers action
MAX_NUM_PROCESS_KILL = 100  # upper bound on processes killed per pass


def notify(message):
    """Send a desktop notification (an argument list avoids shell quoting)."""
    Popen(["notify-send", message])


def main():
    if psutil.virtual_memory().percent >= RAM_USAGE_THRESHOLD:
        # Step 1: clear the RAM cache (needs root).
        mem_warn = "Memory usage critical: {}%\nClearing RAM Cache".format(
            psutil.virtual_memory().percent)
        print(mem_warn)
        notify(mem_warn)
        print(Popen('echo 1 > /proc/sys/vm/drop_caches',
                    stdout=PIPE, stderr=PIPE,
                    shell=True).communicate())
        post_cache_mssg = "Memory usage after clearing RAM cache: {}%".format(
            psutil.virtual_memory().percent)
        notify(post_cache_mssg)
        print(post_cache_mssg)

        if psutil.virtual_memory().percent < RAM_USAGE_THRESHOLD:
            print("Clearing RAM cache saved the day")
            return

        # Step 2: kill up to MAX_NUM_PROCESS_KILL processes, largest first.
        ps_killed_notify = ""
        for i, ps in enumerate(sorted(psutil.process_iter(),
                                      key=lambda x: x.memory_percent(),
                                      reverse=True)):
            # Never kill init (PID 1) or this script itself.
            if ps.pid in (1, os.getpid()):
                continue
            elif (i > MAX_NUM_PROCESS_KILL) or \
                    (psutil.virtual_memory().percent < RAM_USAGE_THRESHOLD):
                # Done: report everything that was killed in this pass.
                messagebox.showwarning('Killed process - save_hang',
                                       ps_killed_notify)
                notify(ps_killed_notify)
                return
            else:
                try:
                    ps_killed_mssg = (
                        "Killed {} {} ({}) which was consuming {}% memory "
                        "(memory usage={})".format(
                            i, ps.name(), ps.pid, ps.memory_percent(),
                            psutil.virtual_memory().percent))
                    ps.kill()
                    time.sleep(1)
                    ps_killed_mssg += " Current memory usage={}".format(
                        psutil.virtual_memory().percent)
                    print(ps_killed_mssg)
                    ps_killed_notify += ps_killed_mssg + "\n"
                except Exception as err:
                    print("Error while killing {}: {}".format(ps.pid, err))
    else:
        print("Memory usage = " + str(psutil.virtual_memory().percent))
    root.update()


if __name__ == "__main__":
    # Poll once per second; keep running even if one pass fails.
    while True:
        try:
            main()
        except Exception as err:
            print(err)
        time.sleep(1)

Save the code in a file, say save_hang.py, and run the script as:

sudo python3 save_hang.py

Please note that this script is compatible with Python 3 only and requires the tkinter and psutil packages. You can install them with:

sudo apt-get install python3-tk python3-psutil

Hope this helps...

Solution 4

I know this question is old, but I had this problem in Ubuntu (Chrubuntu) 14.04 on an Acer C720 Chromebook. I tried Krišjānis Nesenbergs's solution, and it worked somewhat, but the system still crashed sometimes.

I finally found a solution that worked by installing zram instead of using physical swap on the SSD. To install it I just followed the instructions here, like this:

sudo apt-get install zram-config

Afterwards I was able to configure the size of the zram swap by modifying /etc/init/zram-config.conf on line 21.

20: # Calculate the memory to user for zram (1/2 of ram)
21: mem=$(((totalmem / 2 / ${NRDEVICES}) * 1024))

I replaced the 2 with a 1 in order to make the zram size the same as the amount of RAM I have. Since doing so, I have had no more freezes or system unresponsiveness.
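To see what changing the divisor does, here is a minimal Python sketch that mirrors the shell expression on line 21. It assumes that totalmem is the total RAM in kB and that NRDEVICES is the number of zram devices; the function name and the 4 GB, two-device example are illustrative only.

```python
def zram_bytes_per_device(totalmem_kb, divisor, nr_devices):
    """Mirror mem=$(((totalmem / divisor / NRDEVICES) * 1024)).

    Shell arithmetic is integer-only, so floor division is used here too.
    """
    return (totalmem_kb // divisor // nr_devices) * 1024


total_kb = 4 * 1024 * 1024  # hypothetical machine with 4 GB of RAM
default_size = zram_bytes_per_device(total_kb, 2, 2)  # stock divisor of 2
edited_size = zram_bytes_per_device(total_kb, 1, 2)   # divisor changed to 1
print(default_size, edited_size)
```

With the divisor set to 1, the devices together add up to the full RAM size instead of half of it.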

Solution 5

My guess is that you've set your vm.swappiness to a very low value, which causes the kernel to swap too late, leaving too little RAM for the system to work with.

You can show your current swappiness setting by executing:

sysctl vm.swappiness

By default, this is set to 60. The Ubuntu Wiki recommends setting it to 10, but feel free to set it to a higher value. You can change it by running:

sudo sysctl vm.swappiness=10

This changes it for the current session only. To make it persistent, add vm.swappiness = 10 to the /etc/sysctl.conf file.

If your disk is slow, consider buying a new one.


Author: Krišjānis Nesenbergs

Updated on September 18, 2022

Comments

  • Krišjānis Nesenbergs
    Krišjānis Nesenbergs over 1 year

    The problem I think is somewhat similar to this thread.

    It doesn't matter if I have swap enabled or disabled, whenever the real used RAM amount starts going close to maximum and there is almost no space left for disk cache, system becomes totally unresponsive.

    The disk spins wildly, and sometimes after a long wait (10-30 minutes) the system unfreezes, and sometimes not (or I run out of patience). Sometimes, if I act quickly, I can manage to slowly open a console and kill some of the RAM-eating applications like the browser, and the system unfreezes almost instantly.

    Because of this problem I almost never see anything in the swap; only sometimes there are a few MB there, and soon afterwards this problem appears. My not so educated guess would be that it is connected somehow to the disk cache being too greedy, or memory management being too lenient, so when the memory is needed it is not freed quickly enough and the system is starved.

    The problem can be reproduced very quickly when working with large files (500MB+), which are loaded into the disk cache; apparently the system is then unable to unload them fast enough.

    Any help or ideas will be greatly appreciated.

    For now I have to live in constant fear that the computer can just freeze while I am doing something, and I usually have to restart it. If it really is running out of RAM, I would much rather it just killed some userspace applications, like the browser (preferably with some way for me to mark which to kill first).

    The mystery is why swap doesn't save me in this situation.

    UPDATE: It didn't hang for some time, but now I have had several occurrences again. I now keep a RAM monitor on my screen at all times, and when the hang happened it still showed ~30% free (probably used by disk cache). Additional symptoms: if I am watching a video (VLC player) at the time, the sound stops first and the image stops a few seconds later. While only the sound has stopped I still have some control over the PC, but once the image stops I cannot even move the mouse, so I restarted after some waiting. By the way, this didn't happen when I started watching the video but some time in (20 min), and I wasn't actively doing anything else at the time, even though the browser and oowrite were open on the second screen the whole time. Basically something just decides to happen at one point and hangs the system.

    As requested in the comments, I ran dmesg right after the hang. I didn't notice anything weird, but I didn't know what to look for, so here it is: https://docs.google.com/document/d/1iQih0Ee2DwsGd3VuQZu0bPbg0JGjSOCRZhu0B05CMYs/edit?hl=en_US&authkey=CPzF7bcC

    • n3rd
      n3rd over 8 years
      This needs to get more attention. I know that there are bugs filed for many many years.
    • Dan Dascalescu
      Dan Dascalescu almost 8 years
      @n3rd: This is the bug.
    • Rick2047
      Rick2047 over 6 years
      @Krišjānis Nesenbergs: Please correct me if I am wrong, but copy-pasting a large file also makes it hang.
    • Beto Aveiga
      Beto Aveiga almost 6 years
      Thanks for asking this question and finding a solution. Please add a date to the update, otherwise it is not clear what worked and what did not work. I'm having the same problem, I'm always checking memory levels, and I have 16GB, planning to have 32GB, to see if I can fix it that way...
  • Krišjānis Nesenbergs
    Krišjānis Nesenbergs almost 13 years
    Actually, reducing swappiness reduced the problem (it happened more rarely). I am keeping it at 5 now. Although maybe it was a different problem with higher swappiness, because when it was 60 and I decided to watch a movie or edit a big file, the whole file of almost a GB was loaded into memory and then the system instantly started swapping out programs I was actively using, and even the user interface itself. The thing is, I think I understand the swapping part; what I want is for greedy user applications to be killed instead of the machine freezing when it runs out of RAM. (And preferably a limit on file size in the cache.)
  • Lekensteyn
    Lekensteyn almost 13 years
    @Krisa: when the system runs out of memory (RAM and swap), the kernel calls oom_kill, which kills processes to free memory. Unfortunately, you cannot control the target processes. To trigger it manually, press Alt + SysRq + F. When running the dmesg command, you should see some information (including the process name + id) about the killed process. I think you'd be better off buying a new, faster disk. Or upgrade your RAM.
  • Krišjānis Nesenbergs
    Krišjānis Nesenbergs almost 13 years
    The problem is that oom_kill just doesn't get called before the computer has been locked up for some 30 minutes. Also, is there at least a way to know which process will be killed first?
  • Krišjānis Nesenbergs
    Krišjānis Nesenbergs almost 13 years
    I have 2GB of RAM and the HDD is 5400rpm. I really don't think it is such an old system that it justifies half-hour freezes while watching a video on one monitor and browsing some 20-30 tabs on the other. Actually I would be quite happy if I could just always access a console and kill some processes. Is there a way to give user input and the terminal super high priority so they keep working while the system freezes?
  • Lekensteyn
    Lekensteyn almost 13 years
    @Krisa: Hmm, I've always been running 7200rpm+ even on notebooks (and now running 0rpm). oom_kill is only called if you run out of memory (both swap and RAM). If the system is in the process of swapping, it may become unresponsive with such a slow disk. You might be better off disabling the swap (or setting a low swappiness) and upgrading your RAM. 20-30 open tabs is a pretty high number; I usually have about 10 tabs open (Firefox). 2GB on old P4 desktop, 8GB on notebook.
  • Lekensteyn
    Lekensteyn almost 13 years
    Can you run dmesg after such a hang? If oom_kill is called, a call trace is shown, CPU and memory usage information and the process name + PID of the killed program. Regarding priority, I'm afraid that I've not enough experience with that. You might be interested in What does 'Nice' mean on the processes tab.
  • Oxwivi
    Oxwivi almost 13 years
    Good find. Try to report bugs about it so there's more awareness of the issue and hopefully someone will come up with a solution to not randomly load the whole movie.
  • odedbd
    odedbd over 10 years
    thanks, great detail and explains my issue. Much appreciated!
  • crazy2be
    crazy2be over 8 years
    Thank you so much! I have been struggling with this incredibly infuriating bug for something close to a year now, and had tried everything to fix it. Why does Linux have this behaviour? It seems like it ought to act like there is no swap, and just invoke the OOM-killer. Instead, it seems to pretend like there is swap, but then fail to actually swap things out (because there isn't actually, since it's improperly configured).
  • Mikko Rantalainen
    Mikko Rantalainen over 6 years
    Running without swap should not require increasing min_free_kbytes because that only affects when the kernel OOM Killer comes. The swappiness does nothing if you have zero swap space. You can think of swappiness as a slider between sacrificing applications vs file cache if the RAM is needed - zero means sacrifice file cache always, 100 means try to keep files in cache and push running applications to swap if needed. Note that sacrificing file cache is practically the same as swapping if the same files are needed later. The only difference is when the read from the block device is needed.
  • Mikko Rantalainen
    Mikko Rantalainen over 6 years
    zram is a viable option only if you cannot install more RAM. If the system is too slow when swapping to SSD and runs out of RAM without swap, then zram may help a little bit until you try to do a little bit more and the result is the same as being out of RAM with no swap.
  • Mikko Rantalainen
    Mikko Rantalainen over 6 years
    You will probably experience better results with vfs_cache_pressure closer to 10 (that is, much less than 100) and setting min_free_kbytes higher. Be warned that if you set min_free_kbytes too high, kernel OOM killer will kill everyone!
  • Hitechcomputergeek
    Hitechcomputergeek over 6 years
    @MikkoRantalainen I've already raised min_free_kbytes to 262144, and I've observed that lowering vfs_cache_pressure has the opposite effect - lowering it below 100 makes the system become unresponsive much faster. I'm not sure why exactly.
  • Mikko Rantalainen
    Mikko Rantalainen over 6 years
    In general increasing vfs_cache_pressure will cause direntries to be thrown before cached file contents and as a result, overall performance is usually going to suffer with values over 100. If you can figure out steps to reproduce to crash/hang the system starting with e.g. Ubuntu Live CD then kernel developers can figure out the root cause. For me, the hang occurs without any warning. My best guess is that the kernel hangs due to OOM before OOM Killer has freed enough RAM. I'm now running min_free_kbytes=100000, admin_reserve_kbytes=250000 and user_reserve_kbytes=500000.
  • Mikko Rantalainen
    Mikko Rantalainen over 6 years
    (cont) I haven't yet crashed with above config even though I have swappiness=5 and vfs_cache_pressure=20. The system has 16 GB of RAM and 8 GB of swap on SSD. Another system has 32 GB of RAM and zero swap and it randomly seems to suffer from same issue - there pressing Alt+SysRq+f after system feels slow seems to help so I guess if OOM Killer were acting fast enough the system would not hang.
  • Migwell
    Migwell over 5 years
    To speed up this calculation: your total memory can be obtained from free --kilo and your number of processors from nproc, and you can combine them using bash integer arithmetic (bash has no floating point), e.g. echo $((8242745 * 6 / 100 / 4))
  • Chris Down
    Chris Down over 5 years
    "[vm.min_free_kbytes] forces the computer to try to keep this amount of RAM free, and in doing so limits the ability to cache disk files." -- sorry to bother, but this isn't related to what vm.min_free_kbytes does at all. It acts as a block of pages reserved to ease atomic (ie. fill or kill/non-__GFP_WAIT) allocations when under high system memory contention. It could indeed make sense to raise it here (as probably these stalls are related to system memory contention), but it would certainly not be for the reason described in this answer.
  • Martin Thornton
    Martin Thornton over 4 years
    @crazy2be It's not failing, it's succeeding endlessly. Even without any swap, Linux can still page out programs and unmodified files in memory and re-read them from disk.
  • Dietrich Epp
    Dietrich Epp over 4 years
    What about CPUs with hyperthreading? I have 4 physical cores and 8 logical cores.
  • Jeff Ward
    Jeff Ward over 4 years
    Awesome, thanks! FYI for others, I changed the top-line to #!/usr/bin/env python3 as I have both python2 and 3 installed, and default was 2.
  • Jeff Ward
    Jeff Ward over 4 years
    Just back to note that this script works fantastically! My desktop system (no swap... don't ask) hasn't frozen in the past month, and I get a handy notification every time something gets killed. 👍
  • Saim Raza
    Saim Raza over 4 years
    @JeffWard Glad to help :)
  • Michal Skop
    Michal Skop about 4 years
    I changed the if condition from memory only to swap and memory (to prevent killing processes when there is still swap available): if psutil.swap_memory().percent >= SWAP_USAGE_THRESHOLD and psutil.virtual_memory().percent >= RAM_USAGE_THRESHOLD and set experimentally SWAP_USAGE_THRESHOLD = 98
  • TheMSG
    TheMSG about 4 years
    I just stopped by to say that this solution fixed my problem on Ubuntu Budgie 19.10. Thanks a lot! @noone: My CPU has 6 cores and HT. I made the calculations for 12 "cores", and it's been working great so far.
  • amir beygi
    amir beygi over 3 years
    HAHA, I bought a 2500$ laptop with shit ton of RAM to fix that problem and it did not fix it, and now testing this solution on a 700$ laptop and it is working better than the 2500$ one (I am running 2 ubuntu server + couple of videos streaming and many more tabs open)