How do you investigate a system crash when there are no records/logs?
After talking to guiverc in the comments, I realized that I needed a package called linux-crashdump. Because the server was installed from a minimal Ubuntu template, it did not come with this package preinstalled, so when the crash happened nothing was logged. That's why I couldn't find anything.
For anyone who is investigating the reason for a crash and wondering why there are no log files in /var/crash, make sure you install linux-crashdump so that hopefully next time you have something to look at ;)
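Installing the package is only the first step, so it may help to confirm afterwards that the machine could actually capture a dump. The following is a minimal sketch, not an official tool. It assumes that linux-crashdump relies on the kernel's kdump mechanism, which needs memory reserved through a crashkernel= boot parameter (usually effective only after a reboot) and which reports a loaded crash kernel in /sys/kernel/kexec_crash_loaded. The function names are made up for illustration.

```python
from pathlib import Path


def crashkernel_setting(cmdline: str):
    """Return the value of the crashkernel= boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def read_text(path: str):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def crash_dump_status() -> dict:
    """Collect a few hints about whether a kernel crash dump could be captured."""
    cmdline = read_text("/proc/cmdline") or ""
    try:
        dumps = sorted(p.name for p in Path("/var/crash").iterdir())
    except OSError:
        # Directory missing or not readable: treat as "nothing to look at".
        dumps = []
    return {
        # Assumption: kdump needs memory reserved via this boot parameter.
        "crashkernel": crashkernel_setting(cmdline),
        # "1" in this sysfs file means a crash kernel is currently loaded.
        "crash_kernel_loaded": read_text("/sys/kernel/kexec_crash_loaded") == "1",
        "files_in_var_crash": dumps,
    }


if __name__ == "__main__":
    for key, value in crash_dump_status().items():
        print(f"{key}: {value}")
```

If crashkernel comes back as None or crash_kernel_loaded is False after a reboot, the next crash would probably again leave /var/crash empty. On a small-memory VPS it seems worth checking this explicitly rather than assuming the reservation succeeded.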
xperator
Updated on September 18, 2022
TL;DR
- How does one actually investigate a system crash when the logs don't show anything?
- Secondly, how do I prepare for future crashes? Is it possible to have more aggressive/accurate logging, in case the system panics or freezes in a way that it didn't even have time to log?
A few weeks ago I got 3 VPS machines (KVM) from a provider, and 2 of them crashed after a week (at random, different times). They all had 512 MB of RAM (with 512 MB of swap space).
One of them was actually shut down and had an "offline" label in the provider's admin panel. The other was more or less frozen: the panel showed "Online", but I couldn't SSH into it or access it through the web console.
Neither of them was running any CPU- or memory-intensive tasks. One was just an OpenVPN server (with 2-3 users) and the other was just nginx+PHP serving a static site. Both of them had around 200-300 MB of available memory at all times, and CPU usage was below 10%.
I had Netdata monitoring installed, so I had a history of almost everything. I looked at every single chart and graph right before the crashes. There was no spike or sudden increase in CPU/Memory/Disk/Network/Process/Firewall usage.
I looked through every single log file under /var/log/. I read them line by line (up to the point where the crash happened). I also used journalctl. There were no errors, no warnings, no out-of-memory messages, no processes being killed, just normal events.
Both of the servers that crashed had a syslog that looked like this:
As you can see, ufw is just blocking random spammers right before the crash, and then there is no log. Also, the boot you see at 20:41:02 is the hard/forced reboot we did after the crash happened, just to get the system back online.
When I asked the provider, they said everything looked OK on their side and that the reason my servers crashed was that 512 MB of RAM was too low and I had to upgrade.
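When the log simply stops before a crash, as in the syslog described above, it can help to pin down exactly where the silence begins and how long it lasts. The following is a rough sketch under stated assumptions: it expects the traditional "Mon DD HH:MM:SS" stamp at the start of each line (other timestamp formats would need a different parser), it takes the year as a parameter because that format omits it, and the function names and the LOG_PATH constant are hypothetical.

```python
from datetime import datetime

LOG_PATH = "/var/log/syslog"  # assumed location; adjust for your system


def parse_syslog_time(line: str, year: int):
    """Parse the leading 'Mon DD HH:MM:SS' stamp, or return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year: int):
    """Return (gap, last_line_before, first_line_after) for the longest silence.

    Lines without a parsable timestamp are skipped. Returns None when fewer
    than two timestamped lines are found.
    """
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue
        if prev_time is not None:
            gap = stamp - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best


if __name__ == "__main__":
    try:
        with open(LOG_PATH, errors="replace") as log:
            result = largest_gap(log, datetime.now().year)
    except OSError as err:
        result = None
        print(f"could not read {LOG_PATH}: {err}")
    if result:
        gap, before, after = result
        print(f"longest silence: {gap}")
        print(f"last line before: {before.rstrip()}")
        print(f"first line after: {after.rstrip()}")
```

The last line before the longest gap is the final thing the system managed to write, and the first line after it is typically the start of the forced reboot. This only narrows down the time window; it does not explain the crash by itself.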
Also, there are 2 things that I randomly read on the internet that I thought I'd ask about here, in case they're an actual thing:
- "Micro RAM spikes, for example rotating ram tables to disk, etc"
- a parameter called journal_data_writeback that, if enabled, might cause the system to miss writing logs to the disk during a crash.