S.M.A.R.T. - Predictive Failure Count


Solution 1

There are errors on your disk. S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology.

The specific errors you mention correlate with mechanical degradation of the drive. You may be able to use this report to obtain a warranty replacement from IBM. The drive WILL eventually fail.

Solution 2

From a Seagate doc:

Predictive failures

S.M.A.R.T. signals predictive failures when the drive is performing unacceptably for a period of time. The firmware keeps a running count of the number of times the error rate for each attribute is unacceptable. To accomplish this, a counter is incremented each time the error rate is unacceptable and decremented (not to exceed zero) whenever the error rate is acceptable. If the counter continually increments such that it reaches the predictive threshold, a predictive failure is signaled. This counter is referred to as the Failure History Counter. There is a separate Failure History Counter for each attribute.
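The counter behaviour described in that excerpt can be sketched in a few lines of Python. This is only an illustration of the description, not actual drive firmware: the function name, the sample sequence, and the threshold value of 4 are made up for the example, and "not to exceed zero" is read here as "never drops below zero".

```python
def simulate_failure_history(samples, predictive_threshold):
    """Simulate one Failure History Counter.

    samples: iterable of booleans, True when the error rate for the
             attribute was unacceptable in that measurement interval.
    Returns the counter value after each sample and whether a
    predictive failure was signaled.
    """
    counter = 0
    history = []
    signaled = False
    for unacceptable in samples:
        if unacceptable:
            counter += 1
        else:
            # Assumption: the counter is floored at zero.
            counter = max(0, counter - 1)
        history.append(counter)
        if counter >= predictive_threshold:
            signaled = True
            break
    return history, signaled


# Hypothetical measurement intervals and threshold, for illustration only.
samples = [True, True, False, True, True, True]
history, signaled = simulate_failure_history(samples, predictive_threshold=4)
print(history, signaled)
```

An occasional bad interval is forgiven by the decrement, so only a sustained run of unacceptable error rates pushes the counter up to the threshold.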

Here's how to locate the faulty disk:

MegaCli -PdLocate -start -physdrv\[E:S] -aA
  • E : Enclosure
  • S : Slot
  • A : Adapter
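
If you have many drives, a small script can pick E and S out of the MegaCli output for you. The following is a rough sketch, not a tested tool: it assumes each drive block starts with an "Enclosure Device ID" line (as in the output posted in the question), that the text came from something like `MegaCli -PDList -aALL`, and that the adapter number is 0. The function names are made up for the example.

```python
# Trimmed sample of the per-drive output shown in the question.
SAMPLE = """\
Enclosure Device ID: 252
Slot Number: 6
Media Error Count: 32
Other Error Count: 0
Predictive Failure Count: 18
"""


def parse_drives(text):
    """Split MegaCli physical-drive output into one dict per drive."""
    drives = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Assumption: this field opens every drive block.
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives


def locate_command(drive, adapter=0):
    """Build the -PdLocate command for one drive (adapter 0 is assumed)."""
    target = "[{}:{}]".format(drive["Enclosure Device ID"], drive["Slot Number"])
    return ["MegaCli", "-PdLocate", "-start", "-physdrv" + target, "-a" + str(adapter)]


for drive in parse_drives(SAMPLE):
    if int(drive.get("Predictive Failure Count", "0")) > 0:
        print(" ".join(locate_command(drive)))
```

The command is built as an argument list so it can be handed to `subprocess.run` without worrying about the shell interpreting the square brackets.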

Solution 3

The drive is physically failing at this point. The most important thing to worry about right now is having a good backup of your data, and a plan to get that drive replaced ASAP.


Author: Bastien974

Updated on September 18, 2022

Comments

  • Bastien974
    Bastien974 almost 2 years

    I'm monitoring the RAID status of my IBM ServeRAID M5015 controller with MegaCLI, and I see this on one of the disks:

    Enclosure Device ID: 252
    Slot Number: 6
    Enclosure position: 0
    Device Id: 14
    Sequence Number: 2
    Media Error Count: 32
    Other Error Count: 0
    Predictive Failure Count: 18
    Last Predictive Failure Event Seq Number: 8119
    PD Type: SAS
    Raw Size: 279.396 GB [0x22ecb25c Sectors]
    Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
    Coerced Size: 278.464 GB [0x22cee000 Sectors]
    Firmware state: Online, Spun Up
    SAS Address(0): 0x5000c50042c319c9
    SAS Address(1): 0x0
    Connected Port Number: 5(path0)
    Inquiry Data: IBM-ESXSST9300653SS     B6336XN04HC10525B633
    IBM FRU/CRU: 81Y9671
    FDE Capable: Not Capable
    FDE Enable: Disable
    Secured: Unsecured
    Locked: Unlocked
    Needs EKM Attention: No
    Foreign State: None
    Device Speed: 6.0Gb/s
    Link Speed: 6.0Gb/s
    Media Type: Hard Disk Device
    Drive:  Not Certified
    Drive Temperature :33 Celsius
    

    What does this mean exactly? I can't find an exact description. Is there a way to get more details? The RAID array is in the Optimal state.

    Media Error Count: 32

    Predictive Failure Count: 18

    Is there a way through the CLI to turn on the front LED so I physically know which disk I need to replace?

    • Phil
      Phil over 10 years
      A related tip for future use: when inserting drives into any hot-swap system, label the caddy with the serial number (remember to keep it up to date!). It's never bad to have this information on the front of the system.
  • Bastien974
    Bastien974 over 12 years
    Obviously I'm gonna change it, but I still want to know what those numbers mean. Is it considered high? What if it's not increasing any more?
  • DanBig
    DanBig over 12 years
    It's probably a read or write error. It will most likely rise as the drive continues to be used.