Docker "cannot allocate memory" - virtual memory tuning


TL;DR

sudo su
sysctl -w vm.swappiness=10

Explanation

I've created a testing scenario where I can reproduce this error 10/10 times. This is just building a larger image directly via command line rather than through CI.

As mentioned, the workaround I knew was

echo 1 > /proc/sys/vm/drop_caches
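
A slightly more careful form of that workaround, as a sketch only: the kernel documentation notes that dirty pages are not freeable, so running sync first lets more of the cache be dropped. The function name drop_page_cache and its optional path argument are made up here for illustration, and writing to the real file needs root.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original answer.
# Flushes dirty pages to disk, then asks the kernel to drop the clean page cache.
drop_page_cache() {
    # $1: file to write to; defaults to the real sysctl file (root required).
    target="${1:-/proc/sys/vm/drop_caches}"
    sync                  # write dirty pages out so they become droppable
    echo 1 > "$target"    # 1 = page cache only, 2 = dentries and inodes, 3 = both
}
```

Called with no argument as root, it behaves like the one-liner above with a sync in front.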

So I tried to correlate it with the DirectMap values. Since I learned that those values represent TLB load and cannot be tuned directly, I looked up the setting that controls the preference for using them, and that is swappiness.

RHEL 7 docs explain swappiness:

swappiness

The swappiness value, ranging from 0 to 100, controls the degree to which the system favors anonymous memory or the page cache. A high value improves file-system performance while aggressively swapping less active processes out of RAM. A low value avoids swapping processes out of memory, which usually decreases latency at the cost of I/O performance. The default value is 60.

WARNING
Setting swappiness==0 will very aggressively avoid swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.

So reducing it lowers the reliance on page cache memory. The EC2 CentOS 7 images that we use set it to 30 by default, so reducing it to 10 made the large image build successfully 10/10 times.
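
Note that sysctl -w only lasts until the next reboot. A minimal sketch of how the value could be checked and made persistent, assuming a distribution that reads /etc/sysctl.d (CentOS 7 does). The helper names and the 99-swappiness.conf file name are hypothetical, and the 0 to 100 range check follows the RHEL 7 text quoted above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original answer.

# Print the current swappiness. $1 lets you point at another file for testing.
current_swappiness() {
    cat "${1:-/proc/sys/vm/swappiness}"
}

# Print a sysctl.d line for the requested value, rejecting anything
# outside the 0..100 range described in the RHEL 7 docs.
swappiness_conf_line() {
    case "$1" in
        ''|*[!0-9]*)
            echo "swappiness must be an integer, got: '$1'" >&2
            return 1
            ;;
    esac
    if [ "$1" -gt 100 ]; then
        echo "swappiness must be between 0 and 100, got: $1" >&2
        return 1
    fi
    printf 'vm.swappiness = %s\n' "$1"
}
```

To apply it, something like `swappiness_conf_line 10 | sudo tee /etc/sysctl.d/99-swappiness.conf` followed by `sudo sysctl --system` should reload the setting, and `current_swappiness` confirms the result.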

Author: JackLeo

Updated on September 18, 2022

Comments

  • JackLeo
    JackLeo almost 2 years

    We are building or running Docker containers on our Jenkins instances, which run on CentOS 7 in AWS EC2. We have 2 t2.medium instances with 2 CPUs and 3.5 GB of available memory.
    In one case we are building the containers; in the other we are just pulling and running them (a different container).

    We started to get errors

    open /var/lib/docker/overlay/<sha>-init/merged/dev/console: cannot allocate memory
    

    and in journalctl we get

    page allocation failure: order:4
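
For context, order:4 means the kernel wanted 2^4 = 16 physically contiguous pages, which is 64 kB with 4 kB pages. A rough sketch for checking how many free blocks of that size or larger exist, assuming the usual /proc/buddyinfo layout (one count per order, starting with order 0 in the fifth field). Both function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original question.

# Size in kB of one allocation of the given order.
# $1: order, $2: page size in bytes (defaults to the system page size).
order_to_kb() {
    page_size="${2:-$(getconf PAGESIZE)}"
    echo $(( (1 << $1) * page_size / 1024 ))
}

# Total number of free blocks of at least the given order, across all zones.
# $1: order, $2: buddyinfo file (defaults to /proc/buddyinfo).
free_blocks_at_or_above() {
    awk -v order="$1" '
        # Fields 1-4 are "Node", "N,", "zone", and the zone name;
        # field 5 holds the order-0 count, field 6 the order-1 count, and so on.
        { for (i = 5 + order; i <= NF; i++) total += $i }
        END { print total + 0 }
    ' "${2:-/proc/buddyinfo}"
}
```

If `free_blocks_at_or_above 4` is at or near zero while the build is running, memory is fragmented enough that an order-4 request can fail even though plenty of single pages are free.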
    

    Dropping the page cache resolves the issue for a while

    echo 1 > /proc/sys/vm/drop_caches
    

    What I noticed is that while the Docker task is running, the Dirty value spikes (as it should) and Mapped jumps after it. However, DirectMap4k is relatively close to that jump.

    For example:
    Idle machine

    cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
    Dirty:               104 kB
    Mapped:            45696 kB
    DirectMap4k:      100352 kB
    

    Active Machine

    cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
    Dirty:             72428 kB
    Mapped:            70192 kB
    DirectMap4k:      100352 kB
    

    So this machine takes some time to start failing, whereas an identical machine reports DirectMap4k: 77824 kB and thus fails regularly (it also has to build a more complex container), but its sysctl vm output is identical.
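
To compare the two machines over time rather than from single snapshots, the same three fields can be sampled in a loop. A small sketch, where the function name meminfo_fields is made up and the field names are the ones from the grep above:

```shell
#!/bin/sh
# Hypothetical helper, not part of the original question.
# Prints Dirty, Mapped and DirectMap4k (in kB) as NAME=VALUE lines.
# $1: meminfo file to read (defaults to /proc/meminfo).
meminfo_fields() {
    awk '/^(Dirty|Mapped|DirectMap4k):/ {
        sub(":", "", $1)             # "Dirty:" -> "Dirty"
        printf "%s=%s\n", $1, $2     # the second field is the value in kB
    }' "${1:-/proc/meminfo}"
}
```

Running `while sleep 5; do date +%T; meminfo_fields; done` during a build gives a timeline that can be lined up against the failures in journalctl.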

    The underlying problem is that building or booting the Docker container throws an out-of-memory error, and the question is what needs to be tuned in the kernel to make it stable.


    Docker version

    Client:
     Version:      17.06.0-ce
     API version:  1.30
     Go version:   go1.8.3
     Git commit:   02c1d87
     Built:        Fri Jun 23 21:20:36 2017
     OS/Arch:      linux/amd64
    
    Server:
     Version:      17.06.0-ce
     API version:  1.30 (minimum version 1.12)
     Go version:   go1.8.3
     Git commit:   02c1d87
     Built:        Fri Jun 23 21:21:56 2017
     OS/Arch:      linux/amd64
     Experimental: false
    

    Kernel 3.10.0-327.10.1.el7.x86_64

    sysctl vm

    vm.admin_reserve_kbytes = 8192
    vm.block_dump = 0
    vm.dirty_background_bytes = 0
    vm.dirty_background_ratio = 10
    vm.dirty_bytes = 0
    vm.dirty_expire_centisecs = 3000
    vm.dirty_ratio = 30
    vm.dirty_writeback_centisecs = 500
    vm.drop_caches = 1
    vm.extfrag_threshold = 500
    vm.hugepages_treat_as_movable = 0
    vm.hugetlb_shm_group = 0
    vm.laptop_mode = 0
    vm.legacy_va_layout = 0
    vm.lowmem_reserve_ratio = 256   256     32
    vm.max_map_count = 65530
    vm.memory_failure_early_kill = 0
    vm.memory_failure_recovery = 1
    vm.min_free_kbytes = 67584
    vm.min_slab_ratio = 5
    vm.min_unmapped_ratio = 1
    vm.mmap_min_addr = 4096
    vm.nr_hugepages = 0
    vm.nr_hugepages_mempolicy = 0
    vm.nr_overcommit_hugepages = 0
    vm.nr_pdflush_threads = 0
    vm.numa_zonelist_order = default
    vm.oom_dump_tasks = 1
    vm.oom_kill_allocating_task = 0
    vm.overcommit_kbytes = 0
    vm.overcommit_memory = 0
    vm.overcommit_ratio = 50
    vm.page-cluster = 3
    vm.panic_on_oom = 0
    vm.percpu_pagelist_fraction = 0
    vm.stat_interval = 1
    vm.swappiness = 30
    vm.user_reserve_kbytes = 108990
    vm.vfs_cache_pressure = 100
    vm.zone_reclaim_mode = 0
    
  • rohitsakala
    rohitsakala over 6 years
    Hi, my swappiness is set to 10 and I still get the same error. Should I decrease it even more? If so, how low can I set it? Thanks :)
  • rohitsakala
    rohitsakala over 6 years
    Also, I am using CentOS and don't have swap enabled.
  • JackLeo
    JackLeo over 6 years
    Long story short: not sure. That's why I removed this as the accepted answer. It might solve it, it might not. I've noticed that on our older systems it eventually stopped helping.