Get notification when a systemd-monitored service enters the failed state
Solution 1
systemd units support OnFailure=, which activates one or more units when the unit enters the failed state. You can put something like
OnFailure=notify-failed@%n
And then create the notify-failed@.service template, where you can use the specifiers you need (you will probably want at least %i) to launch the script or command that sends the notification.
You can see a practical example in http://northernlightlabs.se/systemd.status.mail.on.unit.failure
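A minimal sketch of such a template, written as a small bash installer. It assumes a hypothetical notifier at /usr/local/bin/notify-failed that takes the failed unit's name as its only argument; substitute whatever command actually sends your notification. With OnFailure=notify-failed@%n in foo.service, systemd starts notify-failed@foo.service, so %i expands to foo.

```shell
#!/usr/bin/env bash
# Sketch: install a notify-failed@.service template.
# Assumption: /usr/local/bin/notify-failed is a hypothetical notifier that
# receives the failed unit's name (the template's %i) as its argument.
set -eu

UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}

write_template() {
  # The quoted 'EOF' stops the shell from touching %i;
  # systemd substitutes it with the instance name when the unit starts.
  cat > "$UNIT_DIR/notify-failed@.service" <<'EOF'
[Unit]
Description=Failure notification for %i

[Service]
Type=oneshot
ExecStart=/usr/local/bin/notify-failed %i
EOF
}

# Only touch the system when explicitly asked to.
if [[ ${1-} == install ]]; then
  write_template
  systemctl daemon-reload
fi
```

Run it as root with install as the argument; without that argument it only defines write_template.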
Solution 2
Here is how I do it:
/etc/systemd/system/notify-email@.service
[Unit]
Description=Send email
[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/bin/systemctl status %i | /usr/bin/mailx -Ssendwait -s "[SYSTEMD_%i] Fail" [email protected]'
[Install]
WantedBy=multi-user.target
Add it to systemd:
systemctl enable /etc/systemd/system/notify-email@.service
In the other services, add:
[Unit]
OnFailure=notify-email@%n
Reload the configuration:
systemctl daemon-reload
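To check the setup without waiting for a real failure, you could create a throwaway unit that always fails. This is a sketch: notify-selftest.service is a hypothetical name, and it assumes the notify-email@.service template above is already installed.

```shell
#!/usr/bin/env bash
# Sketch: a deliberately failing unit to exercise OnFailure=.
# "notify-selftest" is a hypothetical name chosen for this example.
set -eu

UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}

write_selftest_unit() {
  # %n expands to notify-selftest.service, so systemd starts
  # notify-email@notify-selftest.service (instance "notify-selftest").
  cat > "$UNIT_DIR/notify-selftest.service" <<'EOF'
[Unit]
Description=Always-failing unit for testing OnFailure= notifications
OnFailure=notify-email@%n

[Service]
Type=oneshot
ExecStart=/bin/false
EOF
}

if [[ ${1-} == run ]]; then
  write_selftest_unit
  systemctl daemon-reload
  # ExecStart exits 1, so the start job fails and the unit enters "failed".
  systemctl start notify-selftest.service || true
  systemctl --no-pager status notify-email@notify-selftest.service || true
fi
```

Afterwards, delete the unit file, run systemctl daemon-reload, and clear the failed state with systemctl reset-failed notify-selftest.service.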
Display Name
Updated on September 18, 2022
Comments
-
Display Name over 1 year
I need to have network messages sent when a systemd service of mine crashes or hangs (i.e., enters the failed state; I detect hangs using WatchdogSec=). I noticed that newer systemd versions have FailureAction=, but it doesn't allow arbitrary commands, only rebooting or shutting down.
Specifically, I need a way to have one network message sent when systemd detects the program has crashed, and another when it detects it has hung.
I'm hoping for a better answer than "parse the logs", and I need something that has a near-instant response time, so I don't think a polling approach is good; it should be something triggered by the event occurring.
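To tell a crash from a hang, one option is to have the OnFailure= handler read the failed unit's Result property: systemd sets it to watchdog when WatchdogSec= expires, and to values such as exit-code, signal or core-dump when the process dies. The sketch below assumes it runs as a hypothetical /usr/local/bin/notify-failed, called with %i from a notify-failed@.service template. NOTIFY_HOST, NOTIFY_PORT and the plain-text UDP message are placeholders, and sending relies on bash having /dev/udp redirections compiled in.

```shell
#!/usr/bin/env bash
# Sketch of a hypothetical "notify-failed <unit>" helper.
# NOTIFY_HOST / NOTIFY_PORT are assumed settings, not systemd features.
set -eu

# Map systemd's Result= property to the kind of event to report.
classify_result() {
  case $1 in
    watchdog) echo hung ;;                       # WatchdogSec= expired
    exit-code|signal|core-dump) echo crashed ;;  # the process died
    *) echo "failed:$1" ;;                       # e.g. timeout, start-limit-hit
  esac
}

notify() {
  local unit=$1 result kind
  result=$(systemctl show -p Result --value "$unit")
  kind=$(classify_result "$result")
  # bash's /dev/udp/HOST/PORT redirection sends the line over UDP.
  printf '%s %s %s\n' "$HOSTNAME" "$unit" "$kind" \
    > "/dev/udp/${NOTIFY_HOST:-127.0.0.1}/${NOTIFY_PORT:-5140}"
}

# Without arguments the script only defines the functions.
if (( $# >= 1 )); then
  notify "$1"
fi
```

Because the decision is made when systemd starts the handler, this reacts to the event itself rather than polling logs.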
-
orodbhen almost 8 years
There are a couple of corrections needed to the instructions on the linked site. First, notify%n.service is redundant, and will result in [email protected]. Second, %i should be used instead of %I, or all dashes in the name will be converted to forward slashes.
-
Vladimir Panteleev over 6 years
Is there a way to do this for multiple or all units, without modifying their unit files?
-
starbeamrainbowlabs over 4 years
Is there a way to avoid triggering it many times in a row? In some situations, receiving 1K emails about a service that failed overnight and kept trying to restart itself isn't helpful.
-
Nefritox about 4 years
As far as I know, no, systemd has no option for that. You could put some control into the bash command, for example touching a file and checking whether it is more than 10 minutes old. In simple command logic: find -mmin +10 && send email && touch file ;
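A bash sketch of that idea, assuming one timestamp file per unit under a hypothetical /run/notify-failed directory, a 10-minute window, and a MAILTO variable holding the recipient. One caveat about the one-liner: find exits 0 even when no file matches, and fails outright when the file does not exist yet, so the sketch checks whether find printed anything instead of relying on its exit status.

```shell
#!/usr/bin/env bash
# Sketch: rate-limited failure mail, one timestamp file per unit.
# STAMP_DIR, WINDOW_MIN and MAILTO are assumptions for this example.
set -eu

STAMP_DIR=${STAMP_DIR:-/run/notify-failed}
WINDOW_MIN=${WINDOW_MIN:-10}

# Succeeds if no notification for unit $1 was sent in the last WINDOW_MIN minutes.
should_notify() {
  local stamp="$STAMP_DIR/$1.stamp"
  # find prints the stamp only if it is newer than the window;
  # a missing stamp prints nothing (error hidden), which means "notify".
  [[ -z $(find "$stamp" -mmin "-$WINDOW_MIN" 2>/dev/null) ]]
}

mark_notified() {
  mkdir -p "$STAMP_DIR"
  touch "$STAMP_DIR/$1.stamp"
}

if (( $# >= 1 )); then
  if should_notify "$1"; then
    # No pipefail here: systemctl status exits non-zero for failed units.
    systemctl status "$1" | mailx -s "[SYSTEMD_$1] Fail" "${MAILTO:?set MAILTO}"
    mark_notified "$1"
  fi
fi
```

It could replace the bash -c command in the template's ExecStart=, called with %i as its argument.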
-
Guss over 2 years
@VladimirPanteleev - you don't need to modify the actual unit files; you can just add an override for that specific setting. For example, run systemctl edit my-service.service and, in the editor that opens, add a line [Unit] followed by OnFailure=notify-failed@%n, then save and exit. This creates an override file in /etc/systemd/system/my-service.service.d/override.conf with the added setting (of course, you can automate the creation of such files for multiple services; just don't forget to run systemctl daemon-reload if you modified the files other than through systemctl).
-
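Automating this for several services could look like the bash sketch below. It writes a drop-in equivalent to the one systemctl edit creates, but under the arbitrary name 10-notify-failed.conf, and assumes the notify-failed@.service template from the first answer exists. Pass full unit names, including the .service suffix, since they become the drop-in directory names.

```shell
#!/usr/bin/env bash
# Sketch: add an OnFailure= drop-in to each unit named on the command line.
# The file name 10-notify-failed.conf is an arbitrary choice.
set -eu

UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}

add_override() {
  local dir="$UNIT_DIR/$1.d"
  mkdir -p "$dir"
  # Quoted 'EOF' leaves %n for systemd to expand.
  cat > "$dir/10-notify-failed.conf" <<'EOF'
[Unit]
OnFailure=notify-failed@%n
EOF
}

if (( $# >= 1 )); then
  for unit in "$@"; do
    add_override "$unit"
  done
  # Files written directly, not via systemctl edit, need a reload.
  systemctl daemon-reload
fi
```

For example, run it as root with my-service.service other.service as arguments.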
Prajith about 2 years
Why are you enabling the notification service? It's supposed to be started by other units; there's no reason to start it on boot.
-
Admin about 2 years
For anybody looking to do this for all service files at once, check Example 3 at the very end of systemd.unit(5). You need to place a drop-in configuration under the service.d directory, and it will apply to all services.
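A bash sketch of that top-level approach, modeled on the example in systemd.unit(5). A service.d drop-in also applies to notify-failed@.service itself, so a second drop-in clears OnFailure= for the template to keep a failing notifier from triggering itself. Drop-ins with different names are applied in lexicographic order regardless of directory, which is why the reset uses the higher-sorting 50- prefix; both file names are otherwise arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: OnFailure= for every service via a top-level service.d drop-in,
# plus a reset for the notifier template to avoid recursive triggering.
set -eu

UNIT_DIR=${UNIT_DIR:-/etc/systemd/system}

install_global_dropins() {
  mkdir -p "$UNIT_DIR/service.d" "$UNIT_DIR/notify-failed@.service.d"
  cat > "$UNIT_DIR/service.d/10-notify-failed.conf" <<'EOF'
[Unit]
OnFailure=notify-failed@%n
EOF
  # An empty assignment resets the list set by the service.d drop-in.
  cat > "$UNIT_DIR/notify-failed@.service.d/50-no-recursion.conf" <<'EOF'
[Unit]
OnFailure=
EOF
}

if [[ ${1-} == install ]]; then
  install_global_dropins
  systemctl daemon-reload
fi
```

Run it as root with install as the argument.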