How to tell whether a server ran out of RAM before it crashed


Solution 1

You should check /var/log/messages. The dmesg command will not be useful in this case, because it only shows kernel messages since the last boot.

"Running out of memory" is not usually enough to crash Linux completely. When memory runs out, the kernel's OOM (Out Of Memory) killer starts killing processes to free some. So for a full crash you would more likely be looking for a kernel panic. If you are reading the logs with less, you can search by pressing the / key.

But the bottom line is: read /var/log/messages first. It is ordered by time, so it is easy to find the moment when the server last booted. Check what was logged just before that point to see what caused the crash.
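As a rough sketch of that search, the helper below greps syslog files for lines that usually accompany an OOM kill or a kernel panic. The function name scan_crash_logs and the exact pattern are assumptions made for illustration; kernel wording varies between versions, so treat an empty result as a hint rather than proof.

```shell
# Print OOM-killer and kernel-panic lines from the given syslog files.
# scan_crash_logs is a hypothetical helper name; the pattern is an assumption
# covering common kernel wordings, not an exhaustive list.
scan_crash_logs() {
    # -i: case-insensitive, -E: extended regex, -H: always print the file name
    grep -iEH 'out of memory|oom-killer|kernel panic' "$@"
}

# Typical use (rotated files included); needs read access to the logs:
#   scan_crash_logs /var/log/messages*
```

Any matching lines come back prefixed with the file they were found in, so the timestamps can be compared with the boot messages that follow the crash.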

Solution 2

If Linux runs out of memory, it usually invokes the OOM (Out Of Memory) killer. That is a kernel mechanism that kills processes to free memory. If this happened, you should see corresponding messages in the output of dmesg.

Try this: dmesg | grep -i oom. If there is no output, the OOM killer probably did not kill your process.
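A slightly more explicit variant of the same check might look like the sketch below. It reads kernel messages on standard input and either prints the matching lines or says that none were found. The name check_oom and the pattern are assumptions for illustration, and the result only covers whatever input it is given (for dmesg, the messages since the last boot).

```shell
# Read kernel messages on stdin and report whether any look like OOM-killer activity.
# check_oom is a hypothetical helper; the pattern is an assumption, not exhaustive.
check_oom() {
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # aborting scripts that run with "set -e".
    matches=$(grep -iE 'out of memory|oom-killer' || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
    else
        echo "no OOM-killer messages found in this input"
    fi
}

# Usage:
#   dmesg | check_oom
```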

Author by

Pelang

Updated on September 18, 2022

Comments

  • Pelang
    Pelang over 1 year

    I have a server that keeps crashing. I know a server can crash for several reasons, but if the cause is that the system ran out of RAM before it crashed, how can I confirm that? Which log files should I look at, and what lines or error messages should I look for? I am running CentOS, with heavy use of PHP parsing XML files of up to a bit over 2 gigabytes. The server has 16GB RAM.

    EDIT 1

    [root@61540 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         16035       1526      14509          0         40       1002
    -/+ buffers/cache:        483      15552
    Swap:         8197          0       8197
    

    EDIT 2 /var/log/messages

    Feb 17 20:38:26 61540 syslogd 1.4.1: restart.
    Feb 17 20:38:26 61540 proftpd[3896]: 66.90.101.85 - received SIGHUP -- master server reparsing configuration file
    Feb 17 22:23:06 61540 avahi-daemon[3984]: recvmsg(): Resource temporarily unavailable
    Feb 17 23:07:37 61540 proftpd[10620] - (Several lines of ftp session)
    Feb 18 23:03:48 61540 syslogd 1.4.1: restart.
    Feb 18 23:03:48 61540 kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Feb 18 23:03:48 61540 kernel: Linux version 2.6.18-308.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Feb 21 20:06:06 EST 2012
    Feb 18 23:03:48 61540 kernel: Command line: ro root=LABEL=/
    Feb 18 23:03:48 61540 kernel: BIOS-provided physical RAM map:
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 0000000000010000 - 000000000009a000 (usable)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 0000000000100000 - 00000000cfda0000 (usable)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 00000000cfda0000 - 00000000cfdd1000 (ACPI NVS)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 00000000cfdd1000 - 00000000cfe00000 (ACPI data)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 00000000cfe00000 - 00000000cff00000 (reserved)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
    Feb 18 23:03:48 61540 kernel:  BIOS-e820: 0000000100000000 - 000000042f000000 (usable)
    Feb 18 23:03:48 61540 kernel: DMI 2.4 present.
    Feb 18 23:03:48 61540 kernel: No NUMA configuration found
    Feb 18 23:03:48 61540 kernel: Faking a node at 0000000000000000-000000042f000000
    Feb 18 23:03:48 61540 kernel: Bootmem setup node 0 0000000000000000-000000042f000000
    Feb 18 23:03:48 61540 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
    Feb 18 23:03:48 61540 kernel: disabling kdump
    Feb 18 23:03:48 61540 kernel: ACPI: PM-Timer IO Port: 0x808
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #0 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #1 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #2 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #3 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #4 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #5 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #6 5:1 APIC version 16
    Feb 18 23:03:48 61540 kernel: ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
    Feb 18 23:03:48 61540 kernel: Processor #7 5:1 APIC version 16
    
    • MadHatter
      MadHatter about 11 years
      How much swap has it got?
    • Pelang
      Pelang about 11 years
      @MadHatter I am not quite sure how to check the swap, but I just ran the free -m command and posted the result in the question. Thanks.
    • Ency
      Ency about 11 years
      Couldn't the RAM be corrupted somewhere at higher addresses, and that is causing the crash?
    • Pelang
      Pelang about 11 years
      @Ency How could I determine whether there is a problem with the RAM? Can I see it in a log file, or are there tests or commands available to check the RAM?
    • Ency
      Ency about 11 years
      @Mark memtest (you should see it in grub/lilo menu after install) can do that, but it will require server down time.
    • Greg Petersen
      Greg Petersen about 11 years
      grep -i 'out of memory' /var/log/messages*?
  • Pelang
    Pelang about 11 years
    I tried "dmesg | grep -i oom" and there is no output. So running out of RAM could not be the reason the server crashed? Can I say that with confidence?
  • nickgrim
    nickgrim about 11 years
    This assumes that you haven't restarted it since the crashes, since dmesg shows the contents of the running kernel's message buffer. You'll have to look in logfiles for stuff that happened before the last reboot.
  • Pelang
    Pelang about 11 years
    @nickgrim I had already restarted the server after the crashes, since I could not access it. Are there other ways to check for errors related to running out of RAM?
  • nickgrim
    nickgrim about 11 years
    Like I said - logfiles. There's some detail in @pablitom's answer.
  • Sean Brill
    Sean Brill about 11 years
    What exactly do you mean when you say crash? Are you at the terminal or connecting via SSH? Is it completely dead, just slow, or are your server processes dying while the system stays up?
  • Sean Brill
    Sean Brill about 11 years
    So you don't see the terminal output saying something like kernel panic? or you can't see the terminal?
  • Pelang
    Pelang about 11 years
    I checked /var/log/messages and did not see anything wrong. I have posted the logs above in the question. Thanks.
  • Pelang
    Pelang about 11 years
    @mauro.stettler Yes, I did not see anything like a kernel panic. I just know that I cannot reach DirectAdmin, and there is no ping and no SSH. I manage the server through SSH and have no physical access, so I need to ask my hosting provider every time to restart the server. I am going to ask them whether they can still access the server from their end when I request a restart. Thanks for giving me this idea.