vm/min_free_kbytes - Why Keep Minimum Reserved Memory?


Solution 1

(link is dead, looks like it's now here)

That text is referring to atomic allocations, which are requests for memory that must be satisfied without giving up control (i.e. the current thread cannot be suspended). This happens most often in interrupt routines, but it applies to every case where memory is needed while holding an essential lock. These allocations must be satisfied immediately, because the caller cannot afford to wait for the swapper to free up memory.

See Linux-MM for a more thorough explanation, but here is the memory allocation process in short:

  • _alloc_pages first iterates over each memory zone looking for the first one that contains eligible free pages
  • _alloc_pages then wakes up the kswapd task [..to..] tap into the reserve memory pools maintained for each zone.
  • If the memory allocation still does not succeed, _alloc_pages will either give up [..] In this process _alloc_pages executes a cond_resched() which may cause a sleep, which is why this branch is forbidden to allocations with GFP_ATOMIC.
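To make the difference between the two kinds of request concrete, here is a toy model of the steps above. It is only a sketch: `Zone`, `try_alloc` and the page counts are invented for illustration, and the real kernel lets an atomic request dig only partway into the reserve rather than all of it.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    free_pages: int
    min_watermark: int  # pages held back; the kernel derives this from min_free_kbytes


def try_alloc(zones, pages, atomic):
    """Return the name of the zone that satisfied the request, or None."""
    # Pass 1: only use a zone that stays at or above its watermark afterwards.
    for zone in zones:
        if zone.free_pages - pages >= zone.min_watermark:
            zone.free_pages -= pages
            return zone.name
    # Pass 2: an atomic caller cannot sleep, so it is allowed into the reserve.
    # (Simplification: the real kernel grants only part of the reserve.)
    if atomic:
        for zone in zones:
            if zone.free_pages >= pages:
                zone.free_pages -= pages
                return zone.name
    # A normal caller would sleep here until reclaim has freed some pages.
    return None
```

With a single zone holding 120 free pages and a watermark of 100, a 50-page normal request returns `None` (it would have to wait for reclaim), while the same request with `atomic=True` succeeds out of the reserve.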

min_free_kbytes is unlikely to help much with the described "ls -l takes 10-15 seconds to execute"; that is more likely caused by general memory pressure and swapping than by zone exhaustion. The min_free_kbytes setting only needs to keep enough pages free to handle immediate requests. As soon as normal operation resumes, the swapper process can run to rebalance the memory zones. The only time I've had to increase min_free_kbytes was after enabling jumbo frames on a network card that didn't support DMA scatter-gather.
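The jumbo frame case is easier to see with a little arithmetic. The sketch below assumes 4 KiB pages and a driver that needs one physically contiguous buffer per frame (which is what the lack of scatter-gather implies); `alloc_order` is a made-up helper, and real receive buffers are somewhat larger than the MTU because of headers.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86-64


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that 2**order contiguous pages hold nbytes."""
    pages = -(-nbytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


for mtu in (1500, 9000):
    order = alloc_order(mtu)
    block_kib = (1 << order) * PAGE_SIZE // 1024
    print(f"MTU {mtu}: order-{order} allocation, {block_kib} KiB contiguous")
```

A standard 1500-byte frame fits in a single page (order 0), whereas a 9000-byte frame needs three pages, which the page allocator rounds up to an order-2 block of four contiguous pages. Finding such blocks from atomic context is much harder on a fragmented system, which is where a larger reserve helps.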

To expand on your second question a bit, you will get better results from tuning vm.swappiness and the dirty ratios mentioned in the linked article. However, be aware that optimizing for "ls -l" performance may make other processes slower. Never optimize for a non-primary use case.
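Before changing anything, it helps to record the current values. This is a small sketch that reads them from /proc/sys/vm; I am assuming that "the dirty ratios" means vm.dirty_ratio and vm.dirty_background_ratio, and `read_vm_tunables` is just an illustrative helper name.

```python
from pathlib import Path

# Assumption: these are the tunables of interest; adjust the tuple as needed.
TUNABLES = ("swappiness", "dirty_ratio", "dirty_background_ratio", "min_free_kbytes")


def read_vm_tunables(root="/proc/sys/vm", names=TUNABLES):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"vm.{name} = {value}")
```

The `root` parameter exists only so the function can be pointed at a directory of test files; on a real system the default is what you want.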

Solution 2

All Linux systems will attempt to make use of all physical memory available to the system, often through the creation of a filesystem buffer cache, which, put simply, is an I/O buffer that helps improve system performance. Technically this memory is not in use, even though it is allocated for caching.

"wait for reclaim", in your question, refers to the process of reclaiming that cache memory that is "not in use" so that it can be allocated to a process. This is supposed to be transparent but in the real world there are many processes that do not wait for this memory to become available. Java is a good example, especially where a large minimum heap size has been set. The process tries to allocate the memory and if it is not instantly available in one large contiguous (atomic?) chunk, the process dies.

Reserving a certain amount of memory with min_free_kbytes allows this memory to be instantly available and reduces the memory pressure when new processes need to start, run and finish while there is a high memory load and a full buffer cache.

4MB does seem rather low because if the buffer cache is full, any process that wants an immediate allocation of more than 4MB will likely fail. The setting is very tunable and system-specific, but if you have a few GB of memory available it can't hurt to bump up the reserve memory to 128MB. I'm not sure what effect it will have on shell interactivity, but likely positive.
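To put those two numbers in proportion, here is a rough sketch that expresses a reserve as a share of total RAM. The `sample` text is a made-up /proc/meminfo excerpt for an 8 GiB machine, and the helper names are mine.

```python
def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def reserve_share(min_free_kb, meminfo_text):
    """Reserve size as a percentage of total RAM."""
    return 100.0 * min_free_kb / mem_total_kb(meminfo_text)


# Hypothetical excerpt; on a real system use open("/proc/meminfo").read().
sample = "MemTotal:        8388608 kB\nMemFree:          123456 kB\n"

for reserve_kb in (4096, 131072):  # 4 MB and 128 MB
    print(f"{reserve_kb} kB is {reserve_share(reserve_kb, sample):.2f}% of RAM")
```

On that hypothetical 8 GiB machine, a 128 MB reserve works out to roughly 1.6% of RAM, which normal allocations can no longer use.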

Solution 3

This memory is kept free from use by normal processes. As @Arno mentioned, the special cases that can use it include interrupt routines, which must run now (because it's an interrupt) and must finish before any other process can run (atomic). This can include things like swapping memory out to disk when memory is full.

If memory fills up, an interrupt (memory management) process runs to swap some memory out to disk, freeing memory for use by normal processes. But if vm.min_free_kbytes is too small for that process to run, the system locks up. The process must run first to free memory so that others can run, but it is stuck because the reserve set by vm.min_free_kbytes is not enough for it to do its task, resulting in a deadlock.


Author: user3063877

Updated on July 09, 2022

Comments

  • user3063877
    user3063877 almost 2 years

    According to this article:

    /proc/sys/vm/min_free_kbytes: This controls the amount of memory that is kept free for use by special reserves including “atomic” allocations (those which cannot wait for reclaim)

    My question is: what is meant by "those which cannot wait for reclaim"? In other words, I would like to understand why the system needs to be told to always keep a certain minimum amount of memory free, and under what circumstances this memory will be used. [It must be used by something; I don't see the need otherwise]

    My second question: does setting this to something higher than 4MB (the value on my system) lead to better performance? We have a server which occasionally exhibits very poor shell performance (e.g. ls -l takes 10-15 seconds to execute) when certain processes get going. Would setting this number to something higher lead to better shell performance?

  • Kalle Richter
    Kalle Richter over 7 years
    Link is dead again (redirecting to homepage).
  • grin
    grin over 7 years
    Since I cannot comment, here's the updated link of updated link of updated link of the mentioned page in the question and the top reply: - doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/…
  • Hitechcomputergeek
    Hitechcomputergeek almost 7 years
    When almost all of the memory is taken and no swap is available, Linux performance tends to fall off a cliff for me (such that just getting to TTY1 to kill Firefox takes 15 minutes); raising /proc/sys/vm/vfs_cache_pressure to 6000 (default is 100) seems to help prevent this. I would not do that on a production server, however; the kernel documentation warns that "Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact."
  • Admin
    Admin over 5 years
    @Hitechcomputergeek you could experimentally try this kernel patch (as long as you're sure your swap is disabled) just to see if the 15 minutes thing is reduced to 1-2 seconds, or rather to see if Firefox gets OOM-killed automatically within 2 seconds when you run out of memory.