How can you distinguish between a crash and a reboot on RHEL7?

systemd server-crashes rhel7 system-monitoring

8,087

Solution 1

There's more than one way to do this, but I'll cover the 4 best ones I can think of. (EDIT: I published a cleaned-up version of this as a public article on redhat.com. See: How to distinguish between a crash and a graceful reboot in RHEL 7.)

(1) auditd logs

auditd is amazing. You can see all the different events that it logs by checking ausearch -m. Apropos to the problem at hand, it logs system shutdown and system boot, so you can use the command ausearch -i -m system_boot,system_shutdown | tail -4. If this reports a SYSTEM_SHUTDOWN followed by a SYSTEM_BOOT, all is well; however, if it reports 2 SYSTEM_BOOT lines in a row, then clearly the system did not shutdown gracefully, as in the following example:

[root@a72 ~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:10:32.392:7) : pid=657 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:11:41.134:7) : pid=656 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'

(2) last -x

Same as above, but with the simple last -n2 -x shutdown reboot command. Example where system crashed:

[root@a72 ~]# last -n2 -x shutdown reboot
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:11 - 01:20  (00:08)    
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:10 - 01:20  (00:09)

Or where system had a graceful reboot:

[root@a72 ~]# last -n2 -x shutdown reboot
reboot   system boot  3.10.0-327.el7.x Tue Sep 20 01:21 - 01:21  (00:00)    
shutdown system down  3.10.0-327.el7.x Tue Sep 20 01:21 - 01:21  (00:00)

(3) create your own service unit

This is IMHO the best approach because you can tailor it to whatever you want. There are a million ways to do this. Here's one I just made up. This next service only runs at shutdown.

[root@a72 ~]# cat /etc/systemd/system/set_gracefulshutdown.service
[Unit]
Description=Set flag for graceful shutdown
DefaultDependencies=no
RefuseManualStart=true
Before=shutdown.target

[Service]
Type=oneshot
ExecStart=/bin/touch /root/graceful_shutdown

[Install]
WantedBy=shutdown.target
[root@a72 ~]# systemctl enable set_gracefulshutdown.service 
Created symlink from /etc/systemd/system/shutdown.target.wants/set_gracefulshutdown.service to /etc/systemd/system/set_gracefulshutdown.service.

Then when the system boots, this next service will only start if the file created by the above shutdown service exists.

[root@a72 ~]# cat /etc/systemd/system/check_graceful.service 
[Unit]
Description=Check if system booted after a graceful shutdown
ConditionPathExists=/root/graceful_shutdown
RefuseManualStart=true
RefuseManualStop=true

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/rm /root/graceful_shutdown

[Install]
WantedBy=multi-user.target
[root@a72 ~]# systemctl enable check_graceful
Created symlink from /etc/systemd/system/multi-user.target.wants/check_graceful.service to /etc/systemd/system/check_graceful.service.

So at any given time I can check if the previous boot was done after a graceful shutdown by doing systemctl is-active check_graceful, e.g.:

[root@a72 ~]# systemctl is-active check_graceful && echo YAY || echo OH NOES
active
YAY
[root@a72 ~]# systemctl status check_graceful
● check_graceful.service - Check if system booted after a graceful shutdown
   Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
   Active: active (exited) since Tue 2016-09-20 01:10:32 EDT; 20s ago
  Process: 669 ExecStart=/bin/rm /root/graceful_shutdown (code=exited, status=0/SUCCESS)
 Main PID: 669 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/check_graceful.service

Sep 20 01:10:32 a72.example.com systemd[1]: Starting Check if system booted after a graceful shutdown...
Sep 20 01:10:32 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.

Or here's after an ungraceful shutdown:

[root@a72 ~]# systemctl is-active check_graceful && echo YAY || echo OH NOES
inactive
OH NOES
[root@a72 ~]# systemctl status check_graceful
● check_graceful.service - Check if system booted after a graceful shutdown
   Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
   Active: inactive (dead)
Condition: start condition failed at Tue 2016-09-20 01:11:41 EDT; 16s ago
           ConditionPathExists=/root/graceful_shutdown was not met

Sep 20 01:11:41 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.

(4) journalctl

It is worth mentioning that if you configure systemd-journald to keep a persistent journal, you can then use journalctl -b -1 -n to look at the last few (10 by default) lines of the previous boot (-b -2 is the boot before that, etc). Example where the system rebooted gracefully:

[root@a72 ~]# mkdir /var/log/journal
[root@a72 ~]# systemctl -s SIGUSR1 kill systemd-journald
[root@a72 ~]# reboot
...
[root@a72 ~]# journalctl -b -1 -n
-- Logs begin at Tue 2016-09-20 01:01:15 EDT, end at Tue 2016-09-20 01:21:33 EDT. --
Sep 20 01:21:19 a72.example.com systemd[1]: Stopped Create Static Device Nodes in /dev.
Sep 20 01:21:19 a72.example.com systemd[1]: Stopping Create Static Device Nodes in /dev...
Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Shutdown.
Sep 20 01:21:19 a72.example.com systemd[1]: Starting Shutdown.
Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Final Step.
Sep 20 01:21:19 a72.example.com systemd[1]: Starting Final Step.
Sep 20 01:21:19 a72.example.com systemd[1]: Starting Reboot...
Sep 20 01:21:19 a72.example.com systemd[1]: Shutting down.
Sep 20 01:21:19 a72.example.com systemd-shutdown[1]: Sending SIGTERM to remaining processes...
Sep 20 01:21:19 a72.example.com systemd-journal[483]: Journal stopped

If you get good output like that, then clearly the system was shutdown gracefully. That said, it's not super-reliable in my experience when bad things happen (system crashes). Sometimes the indexing gets weird.

Solution 2

Funny, I just happened to reboot a CentOS 7 system last night, and so I have a nice log of just this to look at.

In the case of a crash, obviously nothing is logged between the time of the crash and the system restart.

In the case of a reboot, it is pretty obvious, as you get a log of (nearly) everything systemd is doing to shut down the system.

One such log entry you aren't likely to see in any circumstance other than shutting down or going to single-user mode is:

Jul 13 01:27:55 yaungol systemd: Stopped target Multi-User System.

You can reboot your own system to see what actually gets logged.

Solution 3

I don't particularly like the answer, but it's an answer we got from RH. I'm posting it here in case it helps someone else.

One possible way is to grep for rsyslogd in /var/log/messages. A graceful shutdown would have exiting on signal 15. A crash would not.

tac /var/log/messages | grep 'rsyslogd.*start\|rsyslogd.*exit'

Two consecutive start lines may indicate a crash. And a start followed by an exit may indicate a reboot.

Unfortunately it could also give bad results if rsyslogd goes down or is restarted outside a reboot/crash.

Solution 4

This seems to work consistently for "graceful shutdowns" (shutdown, reboot, systemctl) as well as "crashes" (power off, reset, echo c > /proc/sysrq-trigger):

last -x | grep 'reboot\|shutdown'

A reboot line followed by a shutdown line indicates a "graceful shutdown". Two reboot lines indicates a "crash".

View more solutions

8,087

kwb

Updated on September 18, 2022

Comments

kwb almost 2 years

Is there a way to determine whether a RHEL7 server was rebooted via systemctl (or reboot / shutdown aliases), or whether the server crashed? Pre-systemd this was fairly easy to determine with last -x runlevel, but with RHEL7 it's not so clear.
Stefan Lasiewski almost 8 years

Bad play Red Hat. There are other behaviors which will result in that same exiting on signal 15 besides a reboot. A normal service rsyslog restart also results in the message exiting on signal 15 message.
kwb almost 8 years

Would you believe CentOS 7 logs this and RHEL 7 does not? That was our initial approach based on what we saw in CentOS (and Fedora) logs. When we tested on RHEL7, no dice.
Michael Hampton almost 8 years

@kwb After taking a look at a RHEL 7.2 system, yes, I believe it. In fact, it appears that a lot of things that should be logged aren't being logged. All I can say to that is: WTF?
rsaw almost 8 years

Not sure what you guys are talking about. systemd in RHEL 7.0-7.2 generates the Stopping Multi-User System and Stopped target Multi-User System messages.
Michael Hampton almost 8 years

@rsaw We are well aware that the messages are generated. The problem is that they aren't appearing in the journal.
rsaw almost 8 years

@MichaelHampton the journal is not persistent by default. You can only see logs from your current boot unless you mkdir /var/log/journal or explicitly set Storage=persistent in /etc/systemd/journald.conf. I posted a separate answer.
rsaw almost 8 years

This is a valid answer, but as someone who works in Red Hat tech support, it's not what I would have gone with. See my answer.