How do you investigate a system crash when there are no records/logs?


After talking to guiverc in the comments, I realized that I needed a package called linux-crashdump. Because the server was installed from a minimal Ubuntu template, the package was not preinstalled, so nothing was logged when the crash happened. That's why I couldn't find anything.

For anyone investigating the cause of a crash and wondering why there are no files in /var/crash: make sure you install linux-crashdump, so that next time you hopefully have something to look at ;)
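Installing the package (`sudo apt install linux-crashdump`, followed by a reboot) is only half of it: a dump can only be captured if memory was reserved for the crash kernel at boot and that kernel is actually loaded. The following is a minimal sketch for checking this, not an official tool. The function names `crashkernel_param` and `kdump_report` are made up for illustration; the paths it reads (`/proc/cmdline`, `/sys/kernel/kexec_crash_loaded`, `/var/crash`) are the standard Linux and Ubuntu locations. On a 512MB machine it is worth checking whether the reservation succeeds at all.

```python
from pathlib import Path


def crashkernel_param(cmdline: str):
    """Return the crashkernel= value from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def kdump_report(proc_cmdline="/proc/cmdline",
                 loaded_flag="/sys/kernel/kexec_crash_loaded",
                 crash_dir="/var/crash"):
    """Collect the three facts that decide whether a crash can be captured."""
    cmdline_path = Path(proc_cmdline)
    cmdline = cmdline_path.read_text() if cmdline_path.exists() else ""

    # The kernel exposes "1" here once a crash kernel has been loaded via kexec.
    flag = Path(loaded_flag)
    loaded = flag.exists() and flag.read_text().strip() == "1"

    # Anything already sitting in the crash directory (empty if it is missing).
    crash = Path(crash_dir)
    dumps = sorted(p.name for p in crash.iterdir()) if crash.is_dir() else []

    return {
        "crashkernel": crashkernel_param(cmdline),
        "crash_kernel_loaded": loaded,
        "dumps": dumps,
    }


if __name__ == "__main__":
    for key, value in kdump_report().items():
        print(f"{key}: {value}")
```

If `crashkernel` comes back as `None` or `crash_kernel_loaded` is `False`, a panic would most likely leave /var/crash empty again. `kdump-config show` (from kdump-tools, which linux-crashdump pulls in) reports similar information.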


Author: xperator

Updated on September 18, 2022

Comments

  • xperator, over 1 year ago

    TL;DR

    1. How does one actually investigate a system crash when the logs don't show anything?
    2. How do I prepare for future crashes? Is it possible to have more aggressive or more accurate logging, in case the system panics or freezes so abruptly that it doesn't even have time to log?

    A few weeks ago I got 3 VPS machines (KVM) from a provider, and 2 of them crashed after a week (at random, different times). They all had 512 MB of RAM (with 512 MB of swap space).

    One of them had actually shut down and carried an "offline" label in the provider's admin panel. The other was more or less frozen: the panel showed "Online", but I couldn't SSH into it or reach it through the web console.

    Neither of them was running any CPU- or memory-intensive tasks. One was just an OpenVPN server (with 2-3 users) and the other just nginx + PHP serving a static site. Both had about 200-300 MB of memory available at all times, and CPU usage was below 10%.

    I had Netdata monitoring installed, so I had a history of almost everything. I went through every chart and graph for the period right before the crashes. There was no spike or sudden increase in CPU, memory, disk, network, process, or firewall usage.

    I looked at every log file under /var/log/ and read them line by line up to the point of the crash. I also used journalctl. There were no errors, no warnings, no out-of-memory messages, no processes being killed, just normal events.

    Both the servers that crashed had a syslog that looked like this:

    [image: syslog excerpt] As you can see, ufw is just blocking random spammers right before the crash, and then the log simply stops. The boot you see at 20:41:02 is the hard (forced) reboot we did after the crash, just to get the system back online.
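The pattern described here (ordinary lines, then silence, then a boot) can be located mechanically instead of by reading line by line. The sketch below is only an illustration under assumptions: it expects the classic syslog timestamp prefix (`Sep 18 20:41:02 ...`, no year), treats a line containing `Linux version` as the start of a boot, and the names `parse_ts` and `find_silent_gaps` are hypothetical. Newer rsyslog setups may use ISO timestamps, in which case the parsing would need adjusting.

```python
from datetime import datetime, timedelta

# Assumption: the kernel banner containing this string is logged once per boot.
BOOT_MARKER = "Linux version"


def parse_ts(line, year):
    """Parse the classic 15-character syslog prefix, e.g. 'Sep 18 20:41:02'."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_silent_gaps(lines, year, min_gap=timedelta(minutes=5)):
    """Return (last_seen, boot_time, last_line) for each boot preceded by silence."""
    gaps = []
    prev = None  # (timestamp, raw line) of the previous parsable line
    for line in lines:
        try:
            ts = parse_ts(line, year)
        except ValueError:
            continue  # continuation lines or other formats: skip them
        if BOOT_MARKER in line and prev is not None and ts - prev[0] >= min_gap:
            gaps.append((prev[0], ts, prev[1].rstrip()))
        prev = (ts, line)
    return gaps


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "Sep 18 19:02:11 vps kernel: [UFW BLOCK] IN=eth0 OUT= SRC=203.0.113.7",
        "Sep 18 20:41:02 vps kernel: [    0.000000] Linux version 5.4.0-generic",
    ]
    for last_seen, boot, last_line in find_silent_gaps(sample, 2022):
        print(f"silent from {last_seen} until boot at {boot}")
        print(f"last line: {last_line}")
```

The last line before each gap is the closest the text logs get to the moment of the freeze. With a persistent journal (a `/var/log/journal` directory), `journalctl --list-boots` and `journalctl -b -1 -e` show the end of the previous boot in a similar way.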

    When I asked the provider, they said everything looked OK on their side, and that my servers crashed because 512 MB of RAM was too low and I had to upgrade.

    Also, there are 2 things I randomly read on the internet, and I thought I'd ask here whether they're an actual thing.

    • "Micro RAM spikes, for example rotating ram tables to disk, etc"
    • A parameter called journal_data_writeback: if it's enabled, the system might fail to write logs to disk during a crash.
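On the second point: `journal_data_writeback` is the name tune2fs uses for the ext4 default mount option that corresponds to `data=writeback`, a journaling mode in which file data is not ordered relative to metadata commits (the usual default is `data=ordered`). Whether it played any role here is not known. The sketch below only shows one way to see which mode each mounted ext4 filesystem is using, assuming the kernel exposes `/proc/fs/ext4/<device>/options`; the function names are hypothetical.

```python
import re
from pathlib import Path


def journal_data_mode(options_text: str):
    """Return the value of the data= option (e.g. 'ordered'), or None."""
    # Options may be separated by newlines or commas depending on the source.
    for opt in re.split(r"[,\s]+", options_text.strip()):
        if opt.startswith("data="):
            return opt.split("=", 1)[1]
    return None


def ext4_data_modes(root="/proc/fs/ext4"):
    """Map each mounted ext4 device name to its journaling data mode."""
    base = Path(root)
    if not base.is_dir():
        return {}
    modes = {}
    for opts in base.glob("*/options"):
        try:
            modes[opts.parent.name] = journal_data_mode(opts.read_text())
        except OSError:
            continue  # unreadable entry: leave it out
    return modes


if __name__ == "__main__":
    for device, mode in ext4_data_modes().items():
        print(f"{device}: data={mode}")
```

Whatever the mode, log lines that were still only in memory at the moment of a hard freeze can be lost, so an empty tail in syslog does not by itself point at this setting.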