How to get an email alert if one of the RAID 1 disks fails?


Blazer, it looks like in the process of improving your question (which is now a good one, by the way), you've found your own answer. Well done, you! But there is a little more that could usefully be said.

As far as I know, that mdadm.conf will suffice for you to get automated notifications. Certainly, mine looks very little different to that, and I know from a recent failout test that I get notifications. (I'm a little curious about the second slash in /dev/md/0, but if that's what your system wrote, it's very likely right.)

But it's axiomatic in professional sysadmin that, unless you've tested something, you can't really know that it works.

At the very least, you will want to check that you can send mail from that system, as root, to the specified gmail.com address, and receive it.
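A minimal sketch of that mail check, assuming the configuration lives at /etc/mdadm.conf (some distributions use /etc/mdadm/mdadm.conf) and that a mail command is installed. The helper name mdadm_mailaddr is hypothetical; it only reads the address back from the config file so the test message goes to the same place mdadm would send it.

```shell
#!/bin/sh
# Print the address from the MAILADDR line of an mdadm.conf-style file.
# The default path is an assumption; pass another path as the first argument.
mdadm_mailaddr() {
    awk '$1 == "MAILADDR" { print $2; exit }' "${1:-/etc/mdadm.conf}"
}

# Usage (as root), not run automatically:
#   echo "test from $(hostname)" | mail -s "mail test" "$(mdadm_mailaddr)"
#   mdadm --monitor --scan --oneshot --test   # asks mdadm for a TestMessage alert per array
```

If the plain mail arrives but the mdadm test message does not, the problem is more likely in mdadm.conf than in the mail setup.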

If I were you, I'd at least perform a soft failure test. You can do that with mdadm /dev/md0 -f /dev/sdb1. That will fail the second member (/dev/sdb1) out of the array, and should generate a notification email to you (check your system's mail logs to see whether it was sent). Check the output of cat /proc/mdstat so you know what a degraded array looks like.

You can resync the array later by first removing the failed member with mdadm /dev/md0 -r /dev/sdb1 and then adding it back with mdadm /dev/md0 -a /dev/sdb1. Check that it has synced back with another cat /proc/mdstat.
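The soft failure test can be sketched as below. The helper degraded_arrays is hypothetical: it scans /proc/mdstat-style text and prints any array whose status field (for example [U_]) shows a missing member. The remove step in the usage comments is included on the assumption that a faulty member has to be detached before it can be added back.

```shell
#!/bin/sh
# Print the name of every md array whose member-status field contains "_",
# i.e. at least one member is missing or faulty.
# Reads /proc/mdstat by default; pass another file as the first argument.
degraded_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }                  # e.g. "md0 : active raid1 ..."
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# Usage (as root), not run automatically:
#   mdadm /dev/md0 -f /dev/sdb1    # mark the member faulty
#   degraded_arrays                # expected to list md0 while degraded
#   mdadm /dev/md0 -r /dev/sdb1    # detach the faulty member
#   mdadm /dev/md0 -a /dev/sdb1    # add it back; the resync starts
#   cat /proc/mdstat               # repeat until the status returns to [UU]
```

Saving a copy of /proc/mdstat in each state gives you a reference for what a real failure should look like.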

If you want to go the whole hog, schedule some downtime, try pulling one of the drives, and check that the system can still boot. Where the metadevice in question is the boot partition, people sometimes forget to put a GRUB boot block on both drives, so when the drive that holds the only boot block fails, their system becomes unbootable. Replace and resync the drive later.
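Before pulling a drive, a rough pre-check for the boot block is possible. This is only a heuristic sketch: it assumes GRUB's boot code embeds the string "GRUB" in the first 512-byte sector, and the helper name has_grub_bootblock is hypothetical. It does not prove the drive will boot, so the pull test is still the real check.

```shell
#!/bin/sh
# Succeed if the first 512-byte sector of a disk (or disk image file)
# contains the string "GRUB". A heuristic, not a guarantee of bootability.
has_grub_bootblock() {
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -aq GRUB
}

# Usage (as root), not run automatically; /dev/sda and /dev/sdb are the
# disks holding the two array members in the question:
#   for disk in /dev/sda /dev/sdb; do
#       if has_grub_bootblock "$disk"; then
#           echo "$disk: GRUB string found"
#       else
#           echo "$disk: no GRUB string in first sector"
#       fi
#   done
```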

Whatever tests you decide to do, document them, so that when there's a real failure, you know what to expect, and you can minimise the chance of pilot error trashing the second drive.


Author: Blazer

Updated on September 18, 2022

Comments

  • Blazer
    Blazer almost 2 years

    I need to know how I can get an email alert if one of the RAID 1 disks stops working or crashes. I have CentOS 6.4 64-bit with software RAID.

    I made a mistake following this tutorial, which had a note at the bottom:

    NOTE: It has been found that mdadm will not send an e-mail if the DEVICE partitions section does not exist in the /etc/mdadm.conf file. If those sections do not exist a new /etc/mdadm.conf file can be created by using the following command: mdadm –detail –scan > /etc/mdadm.conf

    I executed that line and my mdadm.conf file was left empty, with this response in my SSH session: "mdadm: An option must be given to set the mode before a second device (–scan) is listed"

    I also understand I have to start monitoring with this command: mdadm –monitor –scan –daemonize but I get the same response: "mdadm: An option must be given to set the mode before a second device (–scan) is listed"

    this is 'cat /proc/mdstat' :

      Personalities : [raid1]
      md0 : active raid1 sdb1[1] sda2[0]
            117153664 blocks super 1.1 [2/2] [UU]
            bitmap: 1/1 pages [4KB], 65536KB chunk

      unused devices: <none>
    

    and

      mdadm -D /dev/md0
      /dev/md0:
              Version : 1.1
        Creation Time : Sat Aug 17 09:19:15 2013
           Raid Level : raid1
           Array Size : 117153664 (111.73 GiB 119.97 GB)
        Used Dev Size : 117153664 (111.73 GiB 119.97 GB)
         Raid Devices : 2
        Total Devices : 2
          Persistence : Superblock is persistent

        Intent Bitmap : Internal

          Update Time : Mon Sep 16 18:55:19 2013
                State : active
       Active Devices : 2
      Working Devices : 2
       Failed Devices : 0
        Spare Devices : 0

                 Name : trader:0
                 UUID : 0944131a:0513ca86:cb8ad6c5:3baca49f
               Events : 1751

          Number   Major   Minor   RaidDevice State
             0       8        2        0      active sync   /dev/sda2
             1       8       17        1      active sync   /dev/sdb1
    

    mdadm.conf file generated 1 minute ago with mdadm --examine --scan > /etc/mdadm.conf :

      MAILADDR [email protected]
      ARRAY /dev/md/0 metadata=1.1 UUID=0944131a:0513ca86:cb8ad6c5:3baca49f name=trader:0
    

    Is this enough to get email notifications if one disk fails in my case?

    • MadHatter
      MadHatter almost 11 years
      man mdadm.conf, look for MAILADDR.
    • Blazer
      Blazer almost 11 years
      Thanks, but I made a mistake following this tutorial, novell.com/support/kb/doc.php?id=7001034, which had a note at the bottom: "NOTE: It has been found that mdadm will not send an e-mail if the DEVICE partitions section does not exist in the /etc/mdadm.conf file. If those sections do not exist a new /etc/mdadm.conf file can be created by using the following command: mdadm –detail –scan > /etc/mdadm.conf" I executed that line and my mdadm.conf file is empty now :( with this response in my SSH session: mdadm: An option must be given to set the mode before a second device (–scan) is listed
    • Blazer
      Blazer almost 11 years
      Now I need to restore my file and I don't know how.
    • MadHatter
      MadHatter almost 11 years
      Blazer, it sounds like you have done some research and work already (good for you!), and that you have a more specific question than the one you've asked. Your question at the moment is attracting justified downvotes. I recommend you edit it, and say what you've tried to do, why, what's happened, what you expected, and what the problem is; that will be a respectable question, and hopefully you'll get good answers to it. In short: do the work to write a good question in the first place; don't make us tease the story out of you one diagnostic at a time.
    • Blazer
      Blazer almost 11 years
      I just did it MadHatter.
    • tgharold
      tgharold almost 11 years
      Blazer - You need to look into doing version control of your file system, or some sort of backups. My preference is to version /etc, /usr/local, and /opt using FSVS (fsvs.tigris.org) which stores the data in an SVN repository. That way I can commit all changes prior to installing software, commit again after the install, commit again after configuration. Then I can always go back in time to see what I changed and when.
  • Dave M
    Dave M over 4 years
    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
  • MeSo2
    MeSo2 over 4 years
    @DaveM good point, I added the essential parts to the answer.