How do you investigate a system crash when there are no records/logs?

server crash log

5,733

After talking to guiverc in the comments, I realized that I needed a package called linux-crashdump. Because the server was installed from a minimal Ubuntu template, it did not come with this package preinstalled, so nothing was logged when the crash happened. That's why I couldn't find anything.

For anyone who's investigating the reason for a crash and wondering why there are no log files in /var/crash, make sure you install linux-crashdump so that next time you'll hopefully have something to look at ;)
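As a rough way to confirm that crash capture is actually armed after installing the package, the sketch below looks at three things: what is already in /var/crash, whether the kernel reports a loaded crash kernel in /sys/kernel/kexec_crash_loaded, and whether a crashkernel= parameter is present on the boot command line. The function name and the returned keys are made up for illustration, and it assumes a standard Linux layout.

```python
from pathlib import Path


def crash_dump_readiness(
    crash_dir="/var/crash",
    kexec_flag="/sys/kernel/kexec_crash_loaded",
    cmdline="/proc/cmdline",
):
    """Report whether the system looks ready to capture a kernel crash dump."""
    crash_dir, kexec_flag, cmdline = Path(crash_dir), Path(kexec_flag), Path(cmdline)

    # Anything already sitting in the crash directory.
    dumps = sorted(p.name for p in crash_dir.iterdir()) if crash_dir.is_dir() else []

    # The kernel writes "1" here once a crash (kdump) kernel has been loaded.
    loaded = kexec_flag.is_file() and kexec_flag.read_text().strip() == "1"

    # Memory for the crash kernel is reserved with a crashkernel= boot parameter.
    reserved = cmdline.is_file() and any(
        arg.startswith("crashkernel=") for arg in cmdline.read_text().split()
    )

    return {
        "existing_dumps": dumps,
        "crash_kernel_loaded": loaded,
        "memory_reserved": reserved,
    }
```

Calling crash_dump_readiness() with no arguments checks the live system. If crash_kernel_loaded or memory_reserved comes back False, a panic would most likely leave /var/crash empty again.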


xperator

Updated on September 18, 2022

Comments

xperator over 1 year
TL;DR
1. How does one actually investigate a system crash when the logs don't show anything?
2. How do I prepare for future crashes? Is it possible to have more aggressive or accurate logging, in case the system panics or freezes so abruptly that it doesn't even have time to log?

A few weeks ago I got 3 VPS machines (KVM) from a provider, and 2 of them crashed after a week (at random, different times). They all had 512 MB of RAM (with 512 MB of swap space).

One of them was actually shut down and had an "offline" label in the provider's admin panel. The other was more or less frozen: the panel showed "Online", but I couldn't SSH into it or access it through the web console.

Neither of them was running any CPU- or memory-intensive tasks. One was just an OpenVPN server (with 2-3 users) and the other just nginx+php serving a static site. Both of them had around 200-300 MB of available memory at all times, and CPU usage was below 10%.

I had Netdata monitoring installed. So I had a history of almost everything. I looked up every single chart and graph right before the crashes. There was no spike or sudden increase in CPU/Memory/Disk/Network/Process/Firewall usage.

I looked through every single log file under /var/log/. I read them line by line up to the point where the crash happened. I also used journalctl. There were no errors, no warnings, no out-of-memory messages, no killed processes, just normal events.
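Reading logs line by line is easy to get wrong, so a quick scripted pass can serve as a cross-check. The sketch below flags lines containing strings the kernel commonly logs around OOM kills, panics, and lockups. The pattern list is illustrative rather than exhaustive, and the names CRASH_PATTERNS and find_suspicious_lines are made up for this example.

```python
import re

# Substrings the kernel typically logs around OOM kills, panics, and hangs.
# Illustrative only: a clean result does not prove nothing went wrong.
CRASH_PATTERNS = re.compile(
    r"out of memory|oom-killer|killed process|kernel panic|call trace|"
    r"hung_task|soft lockup|segfault",
    re.IGNORECASE,
)


def find_suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching a crash pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CRASH_PATTERNS.search(line)
    ]
```

It accepts any iterable of strings, so an open file handle for a syslog file works as well as a list. An empty result is consistent with what I saw: the system went down before anything could be written.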

Both the servers that crashed had a syslog that looked like this:

As you can see, ufw is just blocking random spammers right before the crash, and then the log simply stops. The boot you see at 20:41:02 is the hard reboot we forced after the crash, just to get the system back online.
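When the log just stops like this, the silence itself at least bounds the time of the crash. A minimal sketch for locating the longest silent stretch, assuming the traditional syslog timestamp prefix (for example "Sep 18 20:41:02") and an English locale: the function name largest_gap and the year parameter are assumptions for illustration, needed because that timestamp format carries no year.

```python
from datetime import datetime


def largest_gap(lines, year=2022):
    """Return (gap, line_before, line_after) for the longest silence, or None."""
    previous = None  # (timestamp, line) of the last parsable entry
    best = None
    for line in lines:
        try:
            # Traditional syslog lines start with a 15-character timestamp.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation line or a different timestamp format
        if previous is not None:
            gap = stamp - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1].rstrip("\n"), line.rstrip("\n"))
        previous = (stamp, line)
    return best
```

The line before the gap is the last thing the system managed to write, and the line after it is normally the first message of the forced reboot.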

When I asked the provider, they said everything looked OK on their side, and that my servers crashed because 512 MB of RAM was too low and I had to upgrade.

Also, there are 2 things I randomly read on the internet that I thought I'd ask about here, in case they're an actual thing.
- "Micro RAM spikes, for example rotating ram tables to disk, etc"
- a parameter called journal_data_writeback which, if enabled, might cause the system to miss writing logs to the disk during a crash.
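
On the second point: as far as I can tell, journal_data_writeback is the name tune2fs uses for a default mount option on ext3/ext4 filesystems, and it corresponds to the data=writeback mount option. The sketch below only parses text in /proc/mounts format and lists mount points that carry data=writeback explicitly. The function name writeback_mounts is made up for illustration. A default stored in the filesystem superblock may not show up in /proc/mounts at all, so the "Default mount options" line printed by tune2fs -l is worth checking too.

```python
def writeback_mounts(mounts_text):
    """Return mount points whose options include data=writeback."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts entry
        # Fields: device, mount point, filesystem type, options, dump, pass.
        mount_point, options = fields[1], fields[3].split(",")
        if "data=writeback" in options:
            found.append(mount_point)
    return found
```

On a live system the input would be the contents of /proc/mounts, read as a single string.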