Linux not freeing large disk cache when memory demand goes up


Solution 1

I have discovered the answer to my own question - thanks to womble's help (submit an answer if you like).

lsof -s shows open file handles along with their sizes, and it turns out several gigabytes of mmap'd log files were taking up the cache.
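
As a concrete sketch (this is the same sort trick a commenter mentions further down; with -s, lsof's seventh column is the file size):

    # List open files with sizes, largest first; big REG entries held open
    # by a long-running process are candidates for pinned page cache
    lsof -s | sort -rnk 7 | head -n 20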

Implementing a logrotate policy should resolve the issue completely and let me take advantage of that memory again.
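
A minimal sketch of what that could look like; the log path, schedule, and retention below are assumptions, not from my actual setup:

    # Hypothetical logrotate policy for the offending logs
    cat > /etc/logrotate.d/myapp <<'EOF'
    /var/log/myapp/*.log {
        daily
        rotate 7
        compress
        copytruncate
    }
    EOF

copytruncate rotates in place, so a process holding the file open keeps the same inode; whether that is safe for a file the application has mmap'd depends on how it maps the file, so test before relying on it.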

I will also re-enable swap so we have no problems with the OOM killer in the future. Thanks.
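
For the record, adding a swap file is a one-off; the size and path here are assumptions, and on older kernels or filesystems without fallocate the dd fallback works too:

    # Create and enable a 4 GB swap file
    fallocate -l 4G /swapfile || dd if=/dev/zero of=/swapfile bs=1M count=4096
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo '/swapfile none swap sw 0 0' >> /etc/fstab   # persist across reboots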

Solution 2

Apparently, postgres' shared_buffers can show up in "cached" while not really being easily discardable... See OOM despite available memory (cache)
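
If you suspect this, compare what postgres is configured to hold against the "cached" figure; a quick sketch, assuming a local instance you can query as the postgres user:

    # How much shared memory is postgres configured to use?
    psql -U postgres -c 'SHOW shared_buffers;'
    # Older postgres allocates SysV shared memory; segments are visible here
    ipcs -m

(Newer postgres versions allocate most shared memory via mmap, so ipcs -m may show very little even when shared_buffers is large.)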




Comments

  • trisweb
    trisweb over 1 year

    Running Ubuntu on a 2.6.31-302 x86-64 kernel. The overall problem is that I have memory in the 'cached' category that keeps on going up and will not be freed or used even when our application needs it.

    So here's what I get out of the 'free' command. None of this looks out of the ordinary at first glance.

    # free
                 total       used       free     shared    buffers     cached
    Mem:       7358492    5750320    1608172          0       7848    1443820
    -/+ buffers/cache:    4298652    3059840
    Swap:            0          0          0
    

    The first thing someone's going to say is "Don't worry, Linux manages that memory automatically." Yes, I know how the memory manager is supposed to work; the problem is that it's not doing the right thing. The "cached" 1.4 GB here appears to be reserved and unusable.

    My knowledge of Linux tells me that about 3 GB is "free," but the system's behavior says otherwise. During peak usage, once the 1.6 GB of truly free memory is used up and more memory is demanded (the 'free' in the first column approaching 0), the OOM killer is invoked and processes are killed, even though the 'free' in the -/+ buffers/cache row still shows about 1.4 GB.

    I've tuned the oom_adj values on key processes so it doesn't bring the system to its knees, but even then important processes get killed, and we never want to reach that point, especially when, in theory, 1.4 GB would still be "free" if the kernel would only evict the disk cache.
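
    For reference, this is the kind of adjustment I mean (the process name is illustrative; on 2.6 kernels the knob is /proc/<pid>/oom_adj, range -17 to 15, where -17 exempts the process entirely):

    # Make a critical process far less attractive to the OOM killer
    echo -15 > /proc/$(pidof important-daemon)/oom_adj
    cat /proc/$(pidof important-daemon)/oom_adj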

    Does anyone have any idea what's going on here? The internet is flooded with dumb questions about the Linux 'free' command and "why don't I have any free memory," and I can't find anything about this particular issue because of that.

    The first thing that pops into my head is that swap is off. We have a sysadmin who is adamant about keeping it that way; I'm open to explanations if they're backed up. Could this cause problems?

    Here's free after running echo 3 > /proc/sys/vm/drop_caches:

    # free
                 total       used       free     shared    buffers     cached
    Mem:       7358492    5731688    1626804          0        524    1406000
    -/+ buffers/cache:    4325164    3033328
    Swap:            0          0          0
    

    As you can see, a minuscule amount of cache is actually freed, but around 1.4 GB appears to be "stuck." The other problem is that this value seems to rise over time; on another server, 2.0 GB is stuck.

    I'd really like this memory back... any help would be most appreciated.

    Here's cat /proc/meminfo if it's worth anything:

    # cat /proc/meminfo 
    MemTotal:        7358492 kB
    MemFree:         1472180 kB
    Buffers:            5328 kB
    Cached:          1435456 kB
    SwapCached:            0 kB
    Active:          5524644 kB
    Inactive:          41380 kB
    Active(anon):    5492108 kB
    Inactive(anon):        0 kB
    Active(file):      32536 kB
    Inactive(file):    41380 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Dirty:               320 kB
    Writeback:             0 kB
    AnonPages:       4125252 kB
    Mapped:            42536 kB
    Slab:              29432 kB
    SReclaimable:      13872 kB
    SUnreclaim:        15560 kB
    PageTables:            0 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:     3679244 kB
    Committed_AS:    7223012 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:        7696 kB
    VmallocChunk:   34359729675 kB
    DirectMap4k:     7340032 kB
    DirectMap2M:           0 kB
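
    One thing worth noting in that output (a back-of-the-envelope check, on the assumption that shmem/tmpfs pages are counted in Cached but sit on the anonymous LRU): Cached minus the file LRU pages approximates shared memory that drop_caches cannot touch, and here that comes to about 1.3 GB:

    # Cached pages not on the file LRUs are roughly shmem/tmpfs -- not droppable
    awk '/^(Cached|Active\(file\)|Inactive\(file\)):/ {v[$1]=$2}
         END {print v["Cached:"] - v["Active(file):"] - v["Inactive(file):"], "kB"}' /proc/meminfo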
    
    • trisweb
      trisweb almost 13 years
      My thoughts exactly. Thanks for the advice. Do you know any other good articles or arguments on why swap is necessary?
    • womble
      womble almost 13 years
      Because if you don't have swap, things like this happen. But don't bother trying to argue with your swap denier; either break out the quicklime or say "if you don't want swap on here, you fix this mess you've insisted on creating". They'll either eventually change their mind themselves or they'll die trying. Problem solved either way.
    • trisweb
      trisweb almost 13 years
      Excellent, thanks for the tips. You were right about mmap'd files by the way - a quick lsof showed gigs of log files taking up the memory. Clearing them out solved the issue.
    • David Schwartz
      David Schwartz over 9 years
      The problem is that without swap, overcommitting results in the OOM killer running and not overcommitting results in a system that can't launch processes. You need swap to make effective use of RAM.
  • psusi
    psusi almost 13 years
    mmap'd pages are discardable, so that should not cause the cache to be pinned. Are you using a ramfs?
  • Ram
    Ram over 7 years
    Hi, sorry to dig up an old thread, but I'm facing the same issue currently and lsof -s doesn't show any unusual usage. However, I am using a ramfs like you said [and the 2.6.10 kernel, which doesn't have the drop_caches feature]. What do you think is the likely suspect?
  • Nickolay
    Nickolay almost 7 years
    Thanks for the tip! I'm adding lsof -s | sort -rnk 7 | less to my toolbox now. A note for other readers: this may show large entries like /proc/net/rpc/nfs4.nametoid/channel, but they didn't turn out to be the culprit in my case.
  • Michael Martinez
    Michael Martinez over 6 years
    Make sure your large files or programs aren't using mlock; in /proc/meminfo, look at the "Unevictable" pages (a quick check is sketched after this thread).
  • Naveed Abbas
    Naveed Abbas over 6 years
    Obviously the correct answer. See also here.
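
To check Michael Martinez's mlock suggestion above, the quick version:

    # Pages pinned with mlock()/mlockall() are never reclaimable
    grep -E 'Unevictable|Mlocked' /proc/meminfo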