Get a notification when a systemd-monitored service enters the failed state

Solution 1

systemd units support OnFailure=, which activates one or more units when the unit enters the failed state. In the [Unit] section of the monitored service you can put something like

 OnFailure=notify-failed@%n

Then create the notify-failed@.service template unit, where you can use whichever specifiers you need (you will probably want at least %i) to launch the script or command that sends the notification.

You can see a practical example in http://northernlightlabs.se/systemd.status.mail.on.unit.failure
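As a rough sketch of the kind of script such a template unit could run, the functions below turn the unit's Result property into a "crashed" or "hung" message. The function names and message wording are hypothetical, and the actual network transport is left out. The sketch assumes a systemd version whose systemctl supports `show -p Result --value`.

```shell
#!/bin/sh
# Hypothetical helpers for a notify-failed@.service script (names are assumptions).

# Map a systemd service Result value to a short description.
describe_result() {
    case "$1" in
        watchdog) echo "hung (watchdog timeout)" ;;
        signal|core-dump|exit-code) echo "crashed" ;;
        *) echo "failed ($1)" ;;
    esac
}

# Build the one-line message that would be sent for a unit and a Result value.
build_message() {
    printf '%s: %s\n' "$1" "$(describe_result "$2")"
}

# Query systemd for the failed unit's Result and print the message.
# Replace the final printing step with whatever transport you use.
notify_for_unit() {
    notify_result=$(systemctl show -p Result --value "$1") || return 1
    build_message "$1" "$notify_result"
}
```

The template unit could then call the script with %i as its argument, for example through ExecStart=, and pass the printed message to whatever sender you use.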

Solution 2

Here is how I send notifications:

/etc/systemd/system/notify-email@.service

[Unit]
Description=Send email

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/bin/systemctl status %i | /usr/bin/mailx -Ssendwait -s "[SYSTEMD_%i] Fail" [email protected]'

[Install]
WantedBy=multi-user.target

Add it to systemd:

systemctl enable /etc/systemd/system/notify-email@.service

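In the other services, add the following (using %N, the unit name without its type suffix, because %i is empty in a non-template unit):

[Unit]
OnFailure=notify-email@%N.service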

Reload the configuration:

systemctl daemon-reload
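
A service that fails and restarts in a loop would trigger the notification unit every time. One possible way to limit this, sketched here under assumptions, is a timestamp file per unit that is checked before sending. The function name, the 600-second default, and the /run/notify-throttle directory are all hypothetical, and `stat -c %Y` assumes GNU coreutils.

```shell
#!/bin/sh
# should_notify UNIT [MIN_INTERVAL_SECONDS] [STATE_DIR]
# Returns 0 if a notification should be sent now, 1 if one was sent too recently.
should_notify() {
    throttle_unit=$1
    throttle_interval=${2:-600}
    throttle_dir=${3:-/run/notify-throttle}   # hypothetical location
    throttle_stamp="$throttle_dir/$throttle_unit.stamp"

    # If the state directory cannot be created, prefer sending over staying silent.
    mkdir -p "$throttle_dir" || return 0

    throttle_now=$(date +%s)
    if [ -e "$throttle_stamp" ]; then
        # Modification time of the stamp file = time of the last notification.
        throttle_last=$(stat -c %Y "$throttle_stamp")
        if [ $((throttle_now - throttle_last)) -lt "$throttle_interval" ]; then
            return 1
        fi
    fi

    # Record this notification and allow it.
    touch "$throttle_stamp"
    return 0
}
```

A wrapper script called from ExecStart= could run `should_notify "$1" && ...` in front of the mail command, so that repeated failures within the interval are silently skipped.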
Author: Display Name

Updated on September 18, 2022

Comments

  • Display Name
    Display Name over 1 year

    I need to have network messages sent when one of my systemd services crashes or hangs (i.e., enters the failed state; I detect hangs by using WatchdogSec=). I noticed that newer versions of systemd have FailureAction=, but then saw that it doesn't allow arbitrary commands, only rebooting or shutting down.

    Specifically, I need a way to have one network message sent when systemd detects the program has crashed, and another when it detects it has hung.

    I'm hoping for a better answer than "parse the logs", and I need something that has a near-instant response time, so I don't think a polling approach is good; it should be something triggered by the event occurring.

  • orodbhen
    orodbhen almost 8 years
    There are a couple corrections needed to the instructions on the linked site. First, notify%n.service is redundant, and will result in [email protected]. Second, %i should be used instead of %I, or all dashes in the name will be converted to forward slashes.
  • Vladimir Panteleev
    Vladimir Panteleev over 6 years
    Is there a way to do this for multiple or all units, without modifying their unit files?
  • starbeamrainbowlabs
    starbeamrainbowlabs over 4 years
    Is there a way to avoid triggering it lots of times in a row? In some situations receiving 1K emails about a service that failed at night and tried over and over again to restart itself isn't helpful.
  • Nefritox
    Nefritox about 4 years
    As far as I know, no, systemd has no option for this. You should put some control logic into the bash command, for example touching a file and checking whether it is more than 10 minutes old. In simple command logic: find -mmin +10 && send email && touch file ;
  • Guss
    Guss over 2 years
    @VladimirPanteleev - you don't need to modify the actual unit files - you can just add an override for that specific feature. For example, run systemctl edit my-service.service and in the editor that opens add a line [Unit] followed by OnFailure=notify-failed@%n, save and exit. This will create an override file in /etc/systemd/system/my-service.service.d/override.conf with the added functionality (of course you can automate the creation of such files for multiple services, just don't forget to do systemctl daemon-reload if you modified files not through systemctl).
  • Prajith
    Prajith about 2 years
    Why are you enabling the notification service? It's supposed to be started by other units, no reason to start it on boot.
  • Admin
    Admin about 2 years
    For anybody looking to do this for all service files at once, check Example 3 at the very end of the systemd.unit man page. You need to place a configuration file under a service.d directory, and it will apply to all services.
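
The override approach from the comments can be automated for several services. The following is a minimal sketch, assuming the notify-failed@ template from Solution 1 exists. The function name and the example unit names are hypothetical, and the base directory is a parameter so the function can be tried outside /etc/systemd/system.

```shell
#!/bin/sh
# write_onfailure_dropin UNIT [BASE_DIR]
# Creates BASE_DIR/UNIT.d/override.conf with an OnFailure= line,
# mirroring what "systemctl edit UNIT" would produce.
write_onfailure_dropin() {
    dropin_unit=$1
    dropin_base=${2:-/etc/systemd/system}
    dropin_dir="$dropin_base/$dropin_unit.d"

    mkdir -p "$dropin_dir" || return 1

    # The '%s\n' format keeps the literal %n specifier intact for systemd to expand.
    printf '%s\n' '[Unit]' 'OnFailure=notify-failed@%n' > "$dropin_dir/override.conf"
}

# Example with hypothetical unit names; run as root, then reload systemd:
#   for u in my-service.service other-service.service; do
#       write_onfailure_dropin "$u"
#   done
#   systemctl daemon-reload
```

Because these files are not written through systemctl, the daemon-reload step is still needed before the new OnFailure= setting takes effect.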