How do you check the health of individual hard drives in a RAID array?

ubuntu raid hard-drive

15,142

Solution 1

Typically, what you want is a package called smartmontools. It can query the SMART interface on your disks, which most modern disks support.

There is a daemon called smartd which can help you with continuous monitoring.

However, if your system is a home server, just checking manually is often better. Like so:

smartctl -a /dev/sda

A lot of data spews forth. The attributes that interest me most are the following:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       13946
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   066   000    Old_age   Always       -       25
194 Temperature_Celsius     0x0022   075   064   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

This gives you a way to judge the drive health for yourself. When the error counts start going up, it's time to look for a replacement. You can also check that the drives are not running hot.
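To avoid reading the whole table for every drive, the check above can be scripted. The following is a minimal sketch, not a definitive health test: the helper name flag_smart_errors, the device list, and the choice of attribute IDs (5, 196, 197, 198, 199, i.e. the reallocation, pending-sector, uncorrectable and CRC counters) are assumptions made for illustration. It deliberately ignores Raw_Read_Error_Rate and Seek_Error_Rate, because some vendors report large raw values there even on healthy drives.

```shell
#!/bin/sh
# Print NAME=RAW for failure-related SMART counters whose raw value is above zero.
# Input: the attribute table printed by `smartctl -A`, on stdin.
flag_smart_errors() {
    awk '
        # Attribute rows start with the numeric ID; RAW_VALUE is the tenth column.
        ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 || $1 == 199) && $10 + 0 > 0 {
            print $2 "=" $10
        }
    '
}

# Only query real drives when smartctl is installed (it usually needs root).
if command -v smartctl >/dev/null 2>&1; then
    for dev in /dev/sda /dev/sdb; do    # hypothetical member list, adjust to your array
        echo "== $dev"
        smartctl -H "$dev" | grep -i 'overall-health'
        smartctl -A "$dev" | flag_smart_errors
    done
fi
```

A drive that prints nothing after its overall-health line has all of the watched counters at zero. Any NAME=RAW line is worth a closer look with the full smartctl -a output.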

Solution 2

Something like "mdadm --query --detail /dev/md0" should work. When a drive actually fails, root will receive an e-mail (that is the standard config on CentOS, and I believe on other distros as well). To confirm that the notification works, fail a drive deliberately (like: mdadm --manage /dev/md0 --fail /dev/sda1), and you will be 100% sure. Note that this leaves the array degraded until the drive is re-added and has resynced.
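For a quick look at array state without mdadm, the kernel also exposes it in /proc/mdstat. The sketch below assumes the usual layout of that file, where each array has a status map such as [UU] and an underscore marks a missing member; the helper name degraded_arrays is made up for this example.

```shell
#!/bin/sh
# Print the name of every md array whose status map contains a missing member.
# Input: the contents of /proc/mdstat, on stdin.
degraded_arrays() {
    awk '
        # Remember the array named on lines such as "md0 : active raid1 ...".
        /^md[^ ]* :/ { name = $1 }
        # A status map like [U_] or [_U] has an underscore for each missing member.
        /\[[U_]*_[U_]*\]/ { print name }
    '
}

if [ -r /proc/mdstat ]; then
    degraded_arrays < /proc/mdstat
fi
```

No output means every array reports all members present. If you only want to test the e-mail path, mdadm --monitor --scan --oneshot --test should send a test alert for each array without failing a drive.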

Solution 3

You are going to want to install smartmontools, which provides the smartd daemon, and look at the configuration options for it.

I have it specifically monitoring my RAID disks, with these entries in smartd.conf:

/dev/sda -a
/dev/sdb -a
/dev/sdc -a

This gives me drive monitoring for what I need.

You can also set up smartd to run full drive tests at specified times.
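As a rough illustration of scheduled tests, the snippet below prints one smartd.conf entry per drive. The schedule and the mail target are assumptions, not my actual config: -s (S/../.././02|L/../../6/03) asks for a short self-test every day during the 02:00 hour and a long self-test on Saturdays during the 03:00 hour, and -m root mails warnings to root. The helper name smartd_entries is hypothetical.

```shell
#!/bin/sh
# Print one smartd.conf entry for each device given as an argument.
smartd_entries() {
    for dev in "$@"; do
        # -a: default monitoring set; -s: self-test schedule; -m: mail recipient
        printf '%s -a -s (S/../.././02|L/../../6/03) -m root\n' "$dev"
    done
}

# Hypothetical drive list matching the entries above.
smartd_entries /dev/sda /dev/sdb /dev/sdc
```

Review the printed lines, place them in smartd.conf in place of the plain -a entries, and restart smartd so it picks them up.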

JayD3e

Updated on September 17, 2022

Comments

JayD3e almost 2 years

I'm running a simple 1 TB RAID 1 array with mdadm on Ubuntu Server 10.10. I would like to check the status of each hard drive to make sure they're both functional before it is too late. How could I easily do this?

Mike over 13 years

You can also use mdadm to monitor the RAID itself, as suggested by pitr, but you would want to do both.