Log from oom killer

I'm not sure which log you're actually referring to, but there is a logging system called sar that's included with most distros; it's typically in a package called sysstat.

I also wrote up this U&L Q&A that covers a variety of methods for logging performance information, titled "Commands for determining level of usage of server".
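
As a rough sketch of how sar can be put to use here (assuming a Debian/Ubuntu-style sysstat package; paths and service names may differ on other distros):

    # install the sysstat package, which provides sar and the sadc collector
    sudo apt-get install sysstat

    # on Debian/Ubuntu, enable the periodic collector by setting
    # ENABLED="true" in /etc/default/sysstat, then restart the service
    sudo service sysstat restart

    # sample memory and swap utilization live: 1-second interval, 60 samples
    sar -r 1 60
    sar -S 1 60

    # or read back the history recorded around the time of a kill
    # (daily data files live under /var/log/sysstat/ on Debian/Ubuntu)
    sar -r -f /var/log/sysstat/sa01

sar -B (paging statistics) can also help correlate an OOM kill with the memory pressure leading up to it.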

Additional sar references

Author: ygram, developer of the sc2 ranking site www.rankedftw.com.

Updated on September 18, 2022

Comments

  • ygram
    ygram over 1 year

    I'm running a virtual machine at a provider where you pay for everything you use (CPU, memory, disk IO, bandwidth, etc). Currently my machine has 512MB ram and 1GB swap area.

    The kernel version is 3.8.0-21 and the distribution is Ubuntu 13.04 but I suspect the kernel may be custom made, because kernel updates are held back.

    I run a few cron jobs where Python processes are doing things with a PostgreSQL database. Lately the Python processes have been killed by the OOM killer.

    I don't need help in solving the actual problem (I can increase the memory, reschedule the cron jobs, turn off overcommitting of memory, tune the PostgreSQL settings, or change how the Python programs work), but first I would like to know exactly why this happens (to make the correct fix). The thing is that I have plenty of swap free at the time of the kill, but no physical memory.

    I have the following in the kernel log:

    [4003981.650341] select 1 (init), adj 0, size 249, to kill
    [4003981.650349] select 609 (rsyslogd), adj 0, size 609, to kill
    [4003981.650359] select 17139 (postgres), adj 0, size 635, to kill
    [4003981.650361] select 10381 (postgres), adj 0, size 6719, to kill
    [4003981.650365] select 14153 (postgres), adj 0, size 7296, to kill
    [4003981.650367] select 14159 (postgres), adj 0, size 7300, to kill
    [4003981.650370] select 26802 (python3), adj 0, size 70767, to kill
    [4003981.650372] send sigkill to 26802 (python3), adj 0, size 70767
    

    I ran vmstat every second around the same time (the Python process was killed right before 12:13:48):

    2014-02-01 12:13:43 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    2014-02-01 12:13:43  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa
    2014-02-01 12:13:43  1  0     55      5    216    217    0    0   964   396  386  514 71 10  0 18
    2014-02-01 12:13:44  1  1     55     87    185    166    0    0 22304 14536  907 1241 53  9  0 38
    2014-02-01 12:13:45  1  0     55     60    190    189    0    0 21768 17344 1216 4581 21 26  0 53
    2014-02-01 12:13:46  1  1     57      6    218    214    0    0 22264  4836 1031 3696 22 43  0 35
    2014-02-01 12:13:47  2  1     59      4    217    218    0    0 28228 29892 1045 6234 22 34  0 44
    2014-02-01 12:13:48  1  2     73    272     97     74    0    0 39436 14372  975 3708 10 38  0 52
    2014-02-01 12:13:49  1  0     73    185    173     85    0    0 78400   356 1154 1943 23 33  0 44
    2014-02-01 12:13:51  1  1     73    247    132     65    0    0  1936     0  165  188  7 13 69 11
    

    Now to the actual question: I have seen other questions here (like this) where there is a log with more details about the memory (DMA, Normal memory, etc.). I can't seem to find it anywhere on my system. Is it something that needs to be turned on, or where can I find it?

    The information I'm looking for looks something like this:

    Free pages:        6524kB (0kB HighMem)
    Active:20 inactive:23 dirty:0 writeback:0 unstable:0 free:1631 slab:874 mapped:20 pagetables:90
    DMA free:6524kB min:1200kB low:1500kB high:1800kB active:80kB inactive:92kB present:16384kB pages_scanned:41 all_unreclaimable? no
    lowmem_reserve[]: 0 0 0
    Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    lowmem_reserve[]: 0 0 0
    HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    lowmem_reserve[]: 0 0 0
    DMA: 411*4kB 252*8kB 113*16kB 27*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6524kB
    Normal: empty
    HighMem: empty
    Swap cache: add 24282, delete 24269, find 7705/11790, race 0+0
    Free swap  = 124768kB
    Total swap = 128448kB
    Out of Memory: Killed process 453 (smbd).
    
    • Mark Plotnick
      Mark Plotnick over 10 years
      In every x86 linux system I've used, the oom-killer logs DMA and Normal memory counts etc. to the kernel log. In the ubuntu 13.10 linux kernel source I looked at, the strings in your sample output above, "select.*size.*to kill", occur only in the android driver, in lowmemorykiller.c. They don't occur at all in mm/oom_kill.c (see the sketch after the comments for one way to check this). Could you tell us a bit more about the architecture and hypervisor of the virtual machine you're running in?
    • ygram
      ygram over 10 years
      @MarkPlotnick I do not know what architecture and hypervisor they use. I don't know how to see that from within the guest instance, maybe I could email their support and ask for it.
    • kw1c-MuurmansLorentz
      kw1c-MuurmansLorentz over 4 years
      dstat now provides a feature to find out which process in your running system is a candidate for getting killed by the OOM mechanism: dstat --top-oom adds an "out-of-memory kill score" column showing the top-scoring process (e.g. java 77). See stackoverflow.com/a/25622401/4268594 and the sketch after the comments.
  • ygram
    ygram over 10 years
    I added an example of the log I am interested in. I will also look into sar and see if I can get the information there.
  • slm
    slm over 10 years
    @ygram - look around the OOM project's site: linux-mm.org/OOM
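
Following up on Mark Plotnick's comment: the "select ... to kill" / "send sigkill" messages come from the Android low-memory killer, while the per-zone dump shown in the question (DMA, Normal, HighMem, free swap) is what the mainline kernel OOM killer prints. A minimal sketch for checking which one is in play, assuming a stock config file location and kernel source layout:

    # is the Android low-memory killer enabled in this kernel build?
    grep ANDROID_LOW_MEMORY_KILLER "/boot/config-$(uname -r)"

    # its tunables show up here when the driver is present
    ls /sys/module/lowmemorykiller/parameters/ 2>/dev/null

    # with a matching kernel source tree, the message format in the question
    # matches drivers/staging/android/lowmemorykiller.c, not mm/oom_kill.c
    grep -rn "send sigkill" drivers/staging/android/ mm/oom_kill.c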
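
Regarding the dstat --top-oom suggestion in the comments above, the flattened example there corresponds roughly to the following (a sketch; the --top-oom plugin ranks processes by their kernel OOM score, and the exact column layout depends on the dstat version):

    # show memory, swap and the process currently most likely to be OOM-killed,
    # refreshing every 5 seconds
    dstat --mem --swap --top-oom 5

    # the score for a single process can also be read directly from procfs,
    # e.g. for the current shell
    cat /proc/self/oom_score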