Docker "cannot allocate memory" - virtual memory tuning
TL;DR
sudo su
sysctl -w vm.swappiness=10
Explanation
I created a test scenario that reproduces this error 10 out of 10 times: building a larger image directly from the command line rather than through CI.
As mentioned, the workaround I knew was
echo 1 > /proc/sys/vm/drop_caches
So I tried to correlate it with the DirectMap values. Since I learned that those values reflect TLB load and cannot be tuned directly, I looked up the setting that governs how they are used, and that is swappiness.
RHEL 7 docs explain swappiness:
swappiness
The swappiness value, ranging from 0 to 100, controls the degree to which the system favors anonymous memory or the page cache. A high value improves file-system performance while aggressively swapping less active processes out of RAM. A low value avoids swapping processes out of memory, which usually decreases latency at the cost of I/O performance. The default value is 60.
WARNING
Setting swappiness==0 will very aggressively avoid swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.
So reducing it lowers the reliance on memory cache pages. By default, the EC2 CentOS 7 images that we use set it to 30, so reducing it to 10 made the large image build successfully 10/10 times.
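For anyone who wants the TL;DR setting to survive a reboot, here is a minimal sketch. The function names and the file name 99-swappiness.conf are my own placeholders, not anything from the setup described above. The range check uses the 0 to 100 range quoted from the RHEL 7 docs.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, chosen for illustration.

# Print the current swappiness. The optional argument lets you point the
# function at another file (useful for testing); it defaults to procfs.
current_swappiness() {
    swappiness_file="${1:-/proc/sys/vm/swappiness}"
    cat "$swappiness_file"
}

# Emit a sysctl.d-style config line for the requested value,
# rejecting anything outside the 0-100 range quoted above.
swappiness_conf_line() {
    value="$1"
    case "$value" in
        ''|*[!0-9]*) echo "swappiness must be an integer" >&2; return 1 ;;
    esac
    if [ "$value" -gt 100 ]; then
        echo "swappiness must be between 0 and 100" >&2
        return 1
    fi
    printf 'vm.swappiness = %s\n' "$value"
}

# Possible usage (as root; the file name is an assumption):
#   swappiness_conf_line 10 > /etc/sysctl.d/99-swappiness.conf
#   sysctl --system
```

The sysctl -w form in the TL;DR only changes the running kernel, so a drop-in file like this is one way to keep the value after a restart.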
JackLeo
Updated on September 18, 2022
Comments
-
JackLeo almost 2 years
We are building or running Docker containers in our Jenkins instances built on top of CentOS 7 within AWS EC2. We have 2 t2.medium instances with 2 CPUs and 3.5 GB of available memory.
In one case we are building the containers; in the other we are just pulling and running them (a different container). We started to get errors
open /var/lib/docker/overlay/<sha>-init/merged/dev/console: cannot allocate memory
and in journalctl we get
page allocation failure: order:4
Dropping the page cache resolves the issue for a while
echo 1 > /proc/sys/vm/drop_caches
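A note on reading that journal message: an order:N allocation asks for 2^N physically contiguous pages, so order:4 is 16 pages, or 64 kB with 4 kB pages. This is my reading of the message, not something established above, but it points at fragmentation rather than a plain shortage of memory. The sketch below assumes the standard /proc/buddyinfo layout, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, chosen for illustration.

# Size in kB of one allocation of the given order.
# Second argument is the page size in kB (assumed 4 on x86_64).
order_to_kib() {
    echo $(( (1 << $1) * ${2:-4} ))
}

# For each memory zone, count free blocks of at least the given order.
# In /proc/buddyinfo, field 4 is the zone name and field 5 onward are
# the free block counts for order 0, 1, 2, ...
free_blocks_at_or_above() {
    min_order="$1"
    buddy_file="${2:-/proc/buddyinfo}"
    awk -v min="$min_order" '{
        total = 0
        for (i = 5 + min; i <= NF; i++) total += $i
        printf "%s %s\n", $4, total
    }' "$buddy_file"
}

# Possible usage:
#   order_to_kib 4              # size of an order:4 request in kB
#   free_blocks_at_or_above 4   # zones with no blocks left print 0
```

If a zone shows 0 here while free memory still looks healthy, that would be consistent with the failure being about contiguous blocks, which may also be why dropping caches helps for a while.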
What I noticed is that while running the Docker task, Dirty memory spikes (as it should) and Mapped jumps after it. However, DirectMap4k is relatively close to that jump. For example:
Idle machine
cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
Dirty: 104 kB
Mapped: 45696 kB
DirectMap4k: 100352 kB
Active machine
cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
Dirty: 72428 kB
Mapped: 70192 kB
DirectMap4k: 100352 kB
So this machine takes some time to start failing, whereas an identical machine reports
DirectMap4k: 77824 kB
and thus fails regularly (it also has to handle building a more complex container), but sysctl vm is identical.
The underlying problem is that the build/boot of the Docker container throws an out-of-memory error, and the question is what needs to be tuned in the kernel to make it stable.
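To compare the idle and active readings over time instead of with one-off greps, something like the following could be used. It is a sketch: the function name is hypothetical, and it matches field names exactly rather than by substring as the grep above does.

```shell
#!/bin/sh
# Sketch only: helper name is hypothetical, chosen for illustration.

# Print just the Dirty, Mapped and DirectMap4k lines from a meminfo file.
# The optional argument defaults to /proc/meminfo.
meminfo_fields() {
    meminfo_file="${1:-/proc/meminfo}"
    awk '$1 == "Dirty:" || $1 == "Mapped:" || $1 == "DirectMap4k:" {
        print $1, $2, $3
    }' "$meminfo_file"
}

# Possible usage, sampling every 5 seconds during a build:
#   while sleep 5; do date +%T; meminfo_fields; done
```

Running the loop in a second terminal while the Docker build runs should show when Dirty and Mapped move relative to DirectMap4k.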
Docker version
Client:
 Version: 17.06.0-ce
 API version: 1.30
 Go version: go1.8.3
 Git commit: 02c1d87
 Built: Fri Jun 23 21:20:36 2017
 OS/Arch: linux/amd64
Server:
 Version: 17.06.0-ce
 API version: 1.30 (minimum version 1.12)
 Go version: go1.8.3
 Git commit: 02c1d87
 Built: Fri Jun 23 21:21:56 2017
 OS/Arch: linux/amd64
 Experimental: false
Kernel
3.10.0-327.10.1.el7.x86_64
sysctl vm
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 30
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 1
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 30
vm.user_reserve_kbytes = 108990
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0
-
rohitsakala over 6 years
Hi, my swappiness is set to 10 and I still get the same error. Should I decrease it even more? If so, how low can I go? Thanks :)
-
rohitsakala over 6 years
Also, I am using CentOS and don't have swapping enabled.
-
JackLeo over 6 years
Long story short - not sure. That's why I removed this as the accepted answer. It might solve it, it might not. I've noticed that on our older systems it eventually did not.