Debug out-of-memory with /var/log/messages

Solution 1

The kernel will have logged a bunch of stuff before this happened, but most of it will probably not be in /var/log/messages, depending on how your (r)syslogd is configured. Try:

grep oom /var/log/*
grep total_vm /var/log/*

The former should show up a bunch of times and the latter in only one or two places. That is the file you want to look at.
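
If nothing turns up in the files under /var/log, the kernel ring buffer may still hold the report, assuming the machine hasn't rebooted since (see the comments below about dmesg). A minimal sketch, assuming a reasonably recent util-linux and, for the second command, a systemd machine:

# Kernel ring buffer; -T prints human-readable timestamps (drop it if your dmesg lacks the flag)
dmesg -T | grep -iE 'oom|out of memory'

# The systemd journal keeps kernel messages too, across boots if persistent logging is enabled
journalctl -k | grep -iE 'oom|out of memory'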

Find the original "Out of memory" line in one of the files that also contains total_vm. Thirty seconds to a minute (could be more, could be less) before that line you'll find something like:

kernel: foobar invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
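
Once you know which file it is, it can help to pull the whole report out with some context around that "invoked oom-killer" line. A rough sketch (the -B/-A line counts are guesses, and the filename is whichever log the earlier greps pointed at; widen the counts until you see the whole table and the final "Out of memory" line):

grep -B 5 -A 60 'invoked oom-killer' /var/log/messages | less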

You should also find a table somewhere between the "invoked oom-killer" line and the "Out of memory" line, with headers like this:

[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name

This may not tell you much more than you already know, but the fields are:

  • pid: the process ID.
  • uid: the user ID.
  • tgid: the thread group ID.
  • total_vm: virtual memory use, in 4 kB pages (see the worked conversion after this list).
  • rss: resident memory use, in 4 kB pages.
  • nr_ptes: page table entries.
  • swapents: swap entries.
  • oom_score_adj: usually 0; a lower number indicates the process will be less likely to die when the OOM killer is invoked.
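
As a worked conversion of the page counts (derived from the summary line in the question rather than from an actual table entry, so the table-side values are inferred):

total-vm 2457368 kB / 4 kB per page = 614342 pages  (the value you would expect in total_vm)
anon-rss  816780 kB / 4 kB per page = 204195 pages  (roughly what rss would show)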

You can mostly ignore nr_ptes and swapents, although I believe they are factors in deciding who gets killed. The process that gets killed is not necessarily the one using the most memory, but it very likely is. For more about the selection process, see here. Basically, the process that ends up with the highest oom score is killed; that is the "score" reported on the "Out of memory" line. Unfortunately the other processes' scores are not reported, but the table gives some clues about the contributing factors.
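
If you want to see how the scores look on the system right now (this only shows live values; it cannot recover the scores from a past OOM event), every process exposes them under /proc. A minimal sketch using standard /proc files:

# Ten processes with the highest current OOM score; reads /proc/<pid>/oom_score and comm.
# Errors are silenced because processes may exit while the loop runs.
for d in /proc/[0-9]*; do
    printf '%s\t%s\t%s\n' "$(cat "$d/oom_score" 2>/dev/null)" "${d#/proc/}" "$(cat "$d/comm" 2>/dev/null)"
done | sort -rn | head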

Again, this probably won't do much more than illuminate the obvious: the system ran out of memory and mysqld was chosen to die because killing it would release the most resources. This does not necessarily mean mysqld is doing anything wrong. You can look at the table to see if anything else went way out of line at the time, but there may not be any clear culprit: the system can run out of memory simply because you misjudged or misconfigured the running processes.

Solution 2

The key to this is in the message itself - Out of memory. When the Linux kernel is starved of virtual memory (physical RAM plus swap) it will start killing processes and that's exactly what's happened here. It looks like mysqld was using over 2GB of virtual memory.

How much RAM and swap does the system have? I'd consider adding extra RAM or, if that's not possible, adding extra swap. As a quick fix to at least prevent processes from being terminated, you could add a swap file.
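
A sketch of the swap-file route (the 1 GiB size and /swapfile path are just examples; run as root, and dd is used rather than fallocate because a swap file must not be sparse):

# Create and enable a 1 GiB swap file
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# To keep it across reboots, add a line like the following to /etc/fstab:
# /swapfile none swap defaults 0 0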

Update: Looking at the amount of RAM you have, the problem is immediately apparent. You have ~1.6 GB of RAM and ~100 MB of swap, yet mysqld's virtual memory footprint alone is larger than that. That explains why you are seeing processes being terminated.
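
A quick back-of-the-envelope check with the figures from the question:

RAM + swap available:  1655 MB + 109 MB ≈ 1.7 GB in total
mysqld alone:          total-vm 2457368 kB ≈ 2.4 GB of virtual memory, of which anon-rss 816780 kB ≈ 0.8 GB was resident when it was killed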

Comments

  • ibedelovski almost 2 years

    The following report is thrown in my messages log:

    kernel: Out of memory: Kill process 9163 (mysqld) score 511 or sacrifice child
    kernel: Killed process 9163, UID 27, (mysqld) total-vm:2457368kB, anon-rss:816780kB, file-rss:4kB
    

    It doesn't matter whether this problem involves httpd, mysqld or postfix, but I am curious how I can continue debugging it.

    How can I get more info about why PID 9163 was killed? I am not sure whether Linux keeps a history of terminated PIDs somewhere.

    If this occurred in your messages log file, how would you troubleshoot the issue step by step?

    # free -m
    
                 total       used       free     shared    buffers     cached
    Mem:          1655        934        721          0         10         52
    -/+ buffers/cache:        871        784
    Swap:          109          6        103
    
    • Stark07 about 10 years
      What messages about the problem show up in dmesg?
    • slm over 5 years
      Useful details about OOM - linux-mm.org/OOM_Killer.
  • ibedelovski about 10 years
    total used free shared buffers cached Mem: 1655 934 721 0 10 52 -/+ buffers/cache: 871 784 Swap: 109 6 103; this is the memory output from the same time the process was killed.
  • mjturner about 10 years
    Can you perhaps paste that into the original message, with formatting retained? It would make it easier to read.
  • ibedelovski about 10 years
    I am not really good at formatting... but I have already pasted it into the original message
  • phemmer about 10 years
    dmesg is where this is guaranteed to be. It'll only be in /var/log if the syslog daemon reads from /dev/kmsg (which it usually does though).
  • goldilocks about 10 years
    @Patrick That depends on when you go to look. If it's recorded in one of the normal file logs (it should be, or you have done something stupid with your logger), it will be there for a long time. By this point, though, if the OP wants to diagnose a problem that occurred yesterday or the day before, the record may no longer be in dmesg even if the system was left running.