An oom killer that I cannot explain


OK, so let's go through each bit.

Active memory is memory that keeps getting moved to the top of the LRU (least recently used) lists, basically because it is accessed a lot.

Inactive memory is memory that isn't being used much, and it is the first candidate for swapping out should memory need to be reclaimed.
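To make the active/inactive distinction concrete, here is a toy two-list LRU in Python. It is a simplified sketch, not the kernel's reclaim code: the class name TwoListLRU and the "promote on second access" rule are illustrative assumptions that only loosely mirror how Linux moves pages between its active and inactive lists.

```python
from collections import OrderedDict


class TwoListLRU:
    """Hypothetical toy model of active/inactive LRU lists (not kernel code).

    New pages start on the inactive list; a second access promotes a page
    to the active list. Reclaim (eviction or swap-out) takes victims from
    the cold end of the inactive list, topping it up from the active list.
    """

    def __init__(self):
        # OrderedDicts keep access order: coldest first, hottest last.
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)   # already hot: mark most recent
        elif page in self.inactive:
            del self.inactive[page]         # touched again: promote
            self.active[page] = True
        else:
            self.inactive[page] = True      # first touch: starts cold

    def reclaim_one(self):
        """Return the page that would be reclaimed next, or None."""
        if not self.inactive and self.active:
            # Demote the coldest active page so there is a candidate.
            page, _ = self.active.popitem(last=False)
            self.inactive[page] = True
        if not self.inactive:
            return None
        page, _ = self.inactive.popitem(last=False)
        return page


lru = TwoListLRU()
for p in ["a", "b", "c", "a"]:   # "a" is accessed twice
    lru.access(p)
print(list(lru.active))     # ['a']
print(list(lru.inactive))   # ['b', 'c']
print(lru.reclaim_one())    # b
```

The frequently used page "a" stays out of reach, while "b", touched once and then ignored, is the first thing reclaim would push out (to swap, if there were any).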

Free is genuinely free memory: about 40 MB (17128 kB in DMA plus 22864 kB in Normal). So what gives?

The clue is in these lines:

 DMA: 2358*4kB 912*8kB 25*16kB 0*32kB 0*64kB 0*128kB 
      0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 
      0*16384kB = 17128kB 
 Normal: 4266*4kB 657*8kB 32*16kB 1*32kB 0*64kB 0*128kB 
      0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB
      0*16384kB = 22864kB 

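These lines come from the buddy allocator and show memory fragmentation: for each zone, how many free blocks of each size (4 kB, 8 kB, 16 kB and so on) there are, which is to say how much memory is available contiguously. And here is your problem.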
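To read those lines mechanically, here is a small Python sketch that parses one zone line in the format shown above, checks that the block counts add up to the reported total, and reports the largest free block. The function names are hypothetical; the parser only assumes the "count*sizekB ... = totalkB" layout printed in this log.

```python
import re


def parse_buddy_line(line):
    """Parse one zone line of the buddy free-list dump, e.g.
    'Normal: 4266*4kB 657*8kB ... = 22864kB'.
    Returns (zone, {block_size_kB: free_block_count}, reported_total_kB)."""
    zone, rest = line.split(":", 1)
    counts = {int(size): int(n)
              for n, size in re.findall(r"(\d+)\*(\d+)kB", rest)}
    total = int(re.search(r"=\s*(\d+)kB", rest).group(1))
    return zone.strip(), counts, total


def largest_free_block_kb(counts):
    """Size of the largest block size that has at least one free block."""
    return max((size for size, n in counts.items() if n > 0), default=0)


normal = ("Normal: 4266*4kB 657*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 22864kB")
zone, counts, total = parse_buddy_line(normal)
print(zone, sum(size * n for size, n in counts.items()), total)  # Normal 22864 22864
print(largest_free_block_kb(counts))                              # 32
```

Run on the DMA line instead, the same code reports a total of 17128 kB and a largest free block of 16 kB.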

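Of your free Normal memory, the largest contiguous block is a single 32 kB chunk; nearly all of it is in 4 kB and 8 kB pieces, and DMA tops out at 16 kB. Your memory is horribly fragmented. That matters because some kernel allocations need physically contiguous pages. The one that failed here is order=2, i.e. 2^2 = 4 contiguous pages (16 kB), requested by get_pgd_slow to allocate the page directory for the new address space that execve is setting up. The allocator also refuses such a request when too little of the free memory sits in blocks of that size or larger, so the 32 free 16 kB blocks and one 32 kB block in Normal are not enough. When the request can't be met, the OOM killer arrives to kick something out and free memory for it.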
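Why does an order=2 request fail when 16 kB blocks are still on the free list? As far as I can tell from the 2.6-era source, zone_watermark_ok() in mm/page_alloc.c does not just look for any suitable block: after discounting blocks smaller than the request, enough pages must remain above a watermark that is halved for each order. Below is a simplified Python model of that check. It assumes the min: watermark applies (as in the allocator's slow path for an ordinary GFP_KERNEL request) and ignores the ALLOC_HIGH/ALLOC_HARDER adjustments; the function name is made up.

```python
def watermark_ok(nr_free, order, min_pages, lowmem_reserve=0):
    """Simplified model of the 2.6-era zone watermark check
    (zone_watermark_ok() in mm/page_alloc.c), as I read it.

    nr_free[o] is the number of free blocks of 2**o pages.
    Assumes a plain GFP_KERNEL caller: no ALLOC_HIGH/ALLOC_HARDER discount.
    """
    free = sum(n << o for o, n in enumerate(nr_free))
    free -= (1 << order) - 1            # pages the request itself consumes
    if free <= min_pages + lowmem_reserve:
        return False
    need = min_pages
    for o in range(order):
        # Blocks of order o are too small for this request: stop counting them.
        free -= (nr_free[o] if o < len(nr_free) else 0) << o
        need >>= 1                      # require fewer pages at higher orders
        if free <= need:
            return False
    return True


# Figures from the log, in 4 kB pages (min: 2600kB -> 650, 1560kB -> 390).
normal = [4266, 657, 32, 1]             # free blocks of 4, 8, 16, 32 kB
dma = [2358, 912, 25]                   # free blocks of 4, 8, 16 kB
print(watermark_ok(normal, 0, 650))                     # True
print(watermark_ok(normal, 2, 650))                     # False
print(watermark_ok(dma, 2, 390, lowmem_reserve=158))    # False
```

With these numbers both zones fail at order 2: in Normal the check ends with 133 pages against a threshold of 162, and DMA lands exactly on its threshold of 97, even though single-page (order 0) requests would still succeed.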

So, what can you do?

The clue to that is this:

Free swap = 0kB 
Total swap = 0kB 

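Oh dear! No swap! So anonymous memory that's committed just stays where it is, because without swap the kernel has nowhere to put it. Newer kernels (2.6.35 and later) can actually 'defragment' memory through memory compaction, moving pages around to make regions of memory contiguous; older ones like your 2.6.31 don't do it.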
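To illustrate what compaction buys you, here is a toy Python model of the two-scanner idea behind Linux memory compaction: one scanner looks for movable pages from the bottom of the zone, the other looks for free pages from the top, and pages are migrated upward. It is a sketch under simplifying assumptions (one flat list of page slots, every used page movable unless marked otherwise, no migration cost), not the kernel's code; the function names are made up.

```python
def largest_free_run(pages):
    """Longest run of consecutive free slots ('.') in a page map."""
    best = run = 0
    for p in pages:
        run = run + 1 if p == "." else 0
        best = max(best, run)
    return best


def compact(pages):
    """Hypothetical toy of two-scanner compaction on a list of page slots.

    'U' = in-use movable page, '.' = free page; anything else (say 'K' for
    a pinned kernel page) is left where it is. A migrate scanner walks up
    from the start looking for movable pages, a free scanner walks down from
    the end looking for free slots, and each movable page found low is
    moved into a free slot found high, until the scanners meet.
    """
    pages = list(pages)
    lo, hi = 0, len(pages) - 1
    while True:
        while lo < hi and pages[lo] != "U":   # migrate scanner
            lo += 1
        while lo < hi and pages[hi] != ".":   # free scanner
            hi -= 1
        if lo >= hi:
            return pages
        pages[lo], pages[hi] = ".", "U"       # "migrate" the page upward


before = "U.U.U.U."
after = compact(before)
print(largest_free_run(before))   # 1
print("".join(after))             # ....UUUU
print(largest_free_run(after))    # 4
```

Four scattered free pages become one contiguous run of four, which is the shape an order-2 request needs.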

You had about 70 MB of inactive anonymous memory that could have been swapped out! Plus this wouldn't have happened all in one go but gradually, so it would not have been a big hit for you. But no swap, so no luck. You also have little memory left for pagecache, which is also bad and slow for your system. Swapping could potentially have freed a lot more contiguous space too, which would have been nice for you.
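The 70 MB figure comes straight from the Mem-info output: inactive_anon is reported once in pages on the summary line and once per zone in kB. A quick Python check that the two agree, assuming 4 kB pages as the buddy lists indicate:

```python
PAGE_KB = 4  # page size on this system, as the buddy lists show

# From the Mem-info output: the summary line is in pages, zone lines in kB.
inactive_anon_pages = 18037
zone_inactive_anon_kb = {"DMA": 10320, "Normal": 61828}

inactive_anon_kb = inactive_anon_pages * PAGE_KB
print(inactive_anon_kb)                       # 72148
print(sum(zone_inactive_anon_kb.values()))    # 72148
print(inactive_anon_kb // 1024)               # 70 (MB, rounded down)
```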

My advice to you: get yourself 768 MB of swap. Honestly, you really do your kernel a disservice by not enabling it.

Swap is really important for releasing unused memory (about a quarter of it in your case), and it would also have avoided the nasty fragmentation problems you've experienced, since memory could have been swapped out, releasing more contiguous space. And even if it did get swapped back in, it could have been put back into a region of memory that leaves you larger contiguous gaps.


Author: Ankur Agarwal

Updated on September 18, 2022

Comments

  • Ankur Agarwal
    Ankur Agarwal over 1 year

    I am not able to understand why the kernel would invoke the OOM killer when I can see that enough memory is available.

    I say enough memory is available after looking at the DMA free and Normal free lines.

    This is an embedded NAND flash-based device with 256 MB RAM.

    Kernel: 2.6.31

     myshellscript invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0 
     Backtrace: 
     [<c0106494>] (dump_backtrace+0x0/0x110) from [<c03641a0>] (dump_stack+0x18/0x1c) 
     r6:000000d0 r5:c9040c60 r4:00000002 r3:c0448690 
     [<c0364188>] (dump_stack+0x0/0x1c) from [<c015a314>] (oom_kill_process.clone.11+0x60/0x1b4) 
     [<c015a2b4>] (oom_kill_process.clone.11+0x0/0x1b4) from [<c015a738>] (__out_of_memory+0x154/0x178) 
     r8:c21e86e0 r7:001fb000 r6:00000002 r5:000000d0 r4:c9b6e000 
     [<c015a5e4>] (__out_of_memory+0x0/0x178) from [<c015a980>] (out_of_memory+0x68/0xa0) 
     [<c015a918>] (out_of_memory+0x0/0xa0) from [<c015d230>] (__alloc_pages_nodemask+0x42c/0x520) 
     r5:00000002 r4:000000d0 
     [<c015ce04>] (__alloc_pages_nodemask+0x0/0x520) from [<c015d388>] (__get_free_pages+0x18/0x44) 
     [<c015d370>] (__get_free_pages+0x0/0x44) from [<c0109418>] (get_pgd_slow+0x1c/0xe0) 
     [<c01093fc>] (get_pgd_slow+0x0/0xe0) from [<c0129ab0>] (mm_init.clone.43+0xb0/0xf0) 
     r7:c90858c0 r6:00000000 r5:c90858c0 r4:ce1a6680 
     [<c0129a00>] (mm_init.clone.43+0x0/0xf0) from [<c0129c40>] (mm_alloc+0x34/0x44) 
     r6:0009230c r5:c90858c0 r4:ce1a6680 r3:00000000 
     [<c0129c0c>] (mm_alloc+0x0/0x44) from [<c0180f70>] (bprm_mm_init+0x14/0x148) 
     r4:c5154000 r3:cd472564 
     [<c0180f5c>] (bprm_mm_init+0x0/0x148) from [<c01812d0>] (do_execve+0xa8/0x254) 
     [<c0181228>] (do_execve+0x0/0x254) from [<c0106000>] (sys_execve+0x3c/0x5c) 
     [<c0105fc4>] (sys_execve+0x0/0x5c) from [<c0102e80>] (ret_fast_syscall+0x0/0x2c) 
     r7:0000000b r6:0009230c r5:0009237c r4:000922fc 
     Mem-info: 
     DMA per-cpu: 
     CPU 0: hi: 18, btch: 3 usd: 0 
     Normal per-cpu: 
     CPU 0: hi: 42, btch: 7 usd: 0 
     Active_anon:28162 active_file:16 inactive_anon:18037 
     inactive_file:13 unevictable:0 dirty:0 writeback:0 unstable:0 
     free:9998 slab:2447 mapped:164 pagetables:701 bounce:0 
     DMA free:17128kB min:1560kB low:1948kB high:2340kB active_anon:51068kB inactive_anon:10320kB active_file:24kB inactive_file:0kB unevictable:0kB present:97536kB pages_scanned:0 all_unreclaimable? no 
     lowmem_reserve[]: 0 158 158 
     Normal free:22864kB min:2600kB low:3248kB high:3900kB active_anon:61580kB inactive_anon:61828kB active_file:40kB inactive_file:52kB unevictable:0kB present:162560kB pages_scanned:0 all_unreclaimable? no 
     lowmem_reserve[]: 0 0 0 
     DMA: 2358*4kB 912*8kB 25*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 17128kB 
     Normal: 4266*4kB 657*8kB 32*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 22864kB 
     26591 total pagecache pages 
     0 pages in swap cache 
     Swap cache stats: add 0, delete 0, find 0/0 
     Free swap = 0kB 
     Total swap = 0kB 
     65536 pages of RAM 
     10471 free pages 
     3967 reserved pages 
     2447 slab pages 
     892 shared page count 
     389 shared pages
     620 mapped shared page count
     177 mapped shared pages
     0 pages swap cached
     2481 dma reserved pages
     19892 total user pages
     20512 RSS sum by tasks
     20512 RSS sum by page stats
     164 user cache pages
     26427 kernel cache pages
    
  • DerfK
    DerfK almost 12 years
    What is myshellscript? That was the active process when it blew up. Something it did requested more than the 22MB memory you have free.
  • the-wabbit
    the-wabbit almost 12 years
    What I am wondering about: oom_killer reports how much memory has been requested and needs to be present in one piece: order=2 denotes a request for a frame of 2^2 contiguous pages which arguably are available on the system. So while fragmentation is clearly an issue here, I still don't see the actual reason. The gfp_mask of 0xd0 is defined as GFP_KERNEL, so maybe some specific thresholds are hit?
  • Ankur Agarwal
    Ankur Agarwal almost 12 years
    @Mlfe Thanks for the detailed answer. But the thing is, this is a NAND flash-based embedded device. Swap cannot be enabled, as it would reduce the NAND's lifetime.
  • Matthew Ife
    Matthew Ife almost 12 years
    @abc Then I strongly suggest you upgrade to at least kernel 2.6.35 which introduces memory compaction to defragment your memory. lwn.net/Articles/368869
  • Matthew Ife
    Matthew Ife almost 12 years
    @syneticon-dj Normal thresholds are in the low: and min: fields. Perhaps there is a threshold based on zone order, but nothing I know about.
  • Rob Fisher
    Rob Fisher almost 9 years
    The answer says, "if any application needs to allocate more than 32k of space, there is no memory", but we see from stackoverflow.com/a/4403582/991411 that applications don't need physically contiguous memory. So what does need physically contiguous memory that would be affected by fragmentation? Is it only kernel space code?
  • Matthew Ife
    Matthew Ife almost 9 years
    The only other thing I know that requires contiguous pages like this outside of kernel space is hugepage support.