S.M.A.R.T - Predictive Failure Count
Solution 1
There are errors on your disk. S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology.
The specific errors you mention correlate to mechanical degradation of the drive. You may be able to use this report to obtain a warranty replacement from IBM. The drive WILL eventually fail.
Solution 2
From a Seagate doc:
Predictive failures
S.M.A.R.T. signals predictive failures when the drive is performing unacceptably for a period of time. The firmware keeps a running count of the number of times the error rate for each attribute is unacceptable. To accomplish this, a counter is incremented each time the error rate is unacceptable and decremented (not to exceed zero) whenever the error rate is acceptable. If the counter continually increments such that it reaches the predictive threshold, a predictive failure is signaled. This counter is referred to as the Failure History Counter. There is a separate Failure History Counter for each attribute.
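The counter behaviour described in the quote can be sketched in a few lines of Python. This is only a toy model: the class name, the `threshold` value and the idea of one boolean sample per interval are assumptions made for illustration, not Seagate's firmware logic, and "not to exceed zero" is read here as "never drops below zero".

```python
from dataclasses import dataclass


@dataclass
class FailureHistoryCounter:
    """Toy model of one per-attribute counter (hypothetical, for illustration)."""
    threshold: int  # assumed predictive threshold; real values are vendor-specific
    count: int = 0

    def update(self, error_rate_acceptable: bool) -> bool:
        """Record one sample; return True if a predictive failure is signaled."""
        if error_rate_acceptable:
            # Decrement on a good sample, but never go below zero.
            self.count = max(0, self.count - 1)
        else:
            # Increment on a bad sample.
            self.count += 1
        return self.count >= self.threshold


def first_signal(samples, threshold):
    """Return the index of the first sample that trips the threshold, or None."""
    counter = FailureHistoryCounter(threshold=threshold)
    for index, acceptable in enumerate(samples):
        if counter.update(acceptable):
            return index
    return None


if __name__ == "__main__":
    # A sustained run of bad samples trips the threshold...
    print(first_signal([False, False, True, False, False, False], threshold=3))
    # ...while bad samples that alternate with good ones never do.
    print(first_signal([False, True] * 10, threshold=3))
```

The point of the decrement is that an occasional bad reading is forgiven, while a sustained run of bad readings accumulates until the threshold is reached.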
Here's how to locate the faulty disk:
MegaCli -PdLocate -start -physdrv\[E:S] -aA
- E : Enclosure
- S : Slot
- A : Adapter
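For scripting, the same command can be assembled as an argument list. This is a minimal Python sketch, assuming the binary is named `MegaCli` (on some systems it is installed as `MegaCli64` or `megacli`). `pd_locate_args` is a hypothetical helper name. The usage takes enclosure 252 and slot 6 from the question, while adapter 0 is an assumption, since the question does not state the adapter number.

```python
from typing import List


def pd_locate_args(enclosure: int, slot: int, adapter: int,
                   start: bool = True, binary: str = "MegaCli") -> List[str]:
    """Build the argument list for `MegaCli -PdLocate` (hypothetical helper)."""
    action = "-start" if start else "-stop"
    return [
        binary,
        "-PdLocate",
        action,
        f"-physdrv[{enclosure}:{slot}]",  # [E:S] = enclosure:slot
        f"-a{adapter}",                   # adapter number
    ]


if __name__ == "__main__":
    # Enclosure 252, slot 6 as in the question; adapter 0 is an assumption.
    print(pd_locate_args(252, 6, 0))
    # Turn the locate LED off again once the disk has been swapped.
    print(pd_locate_args(252, 6, 0, start=False))
```

Passing the list to `subprocess.run(args, check=True)` rather than through a shell avoids having to escape the square brackets.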
Solution 3
The drive is physically failing at this point. The most important thing to worry about right now is having a good backup of your data, and a plan to get that drive replaced ASAP.
Bastien974
Updated on September 18, 2022
Comments
-
Bastien974 almost 2 years
I'm monitoring my IBM ServeRAID M5015 controller for RAID status with MegaCLI, and I see this on one of the disks:
Enclosure Device ID: 252
Slot Number: 6
Enclosure position: 0
Device Id: 14
Sequence Number: 2
Media Error Count: 32
Other Error Count: 0
Predictive Failure Count: 18
Last Predictive Failure Event Seq Number: 8119
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.464 GB [0x22cee000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5000c50042c319c9
SAS Address(1): 0x0
Connected Port Number: 5(path0)
Inquiry Data: IBM-ESXSST9300653SS B6336XN04HC10525B633
IBM FRU/CRU: 81Y9671
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :33 Celsius
What does this mean exactly? I can't find an exact description. Is there a way to get more details? The RAID array is in the Optimal state.
Media Error Count: 32
Predictive Failure Count: 18
Is there a way through the CLI to turn on the front LED so I can physically identify which disk I need to replace?
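For monitoring, the error counters can be pulled out of the MegaCLI text with a small parser. This is a minimal Python sketch: `parse_pd_counters` is a hypothetical helper name, the field names are the ones shown in the listing above, and it only reads the first drive it finds, so output covering several drives would need to be split per drive first.

```python
import re
from typing import Dict

# Field names as they appear in the listing above.
COUNTER_FIELDS = (
    "Media Error Count",
    "Other Error Count",
    "Predictive Failure Count",
)


def parse_pd_counters(text: str) -> Dict[str, int]:
    """Extract the error counters for the first drive found in `text`."""
    counters = {}
    for name in COUNTER_FIELDS:
        # Tolerate optional spaces around the colon.
        match = re.search(rf"{re.escape(name)}\s*:\s*(\d+)", text)
        if match:
            counters[name] = int(match.group(1))
    return counters


if __name__ == "__main__":
    sample = (
        "Enclosure Device ID: 252\n"
        "Slot Number: 6\n"
        "Media Error Count: 32\n"
        "Other Error Count: 0\n"
        "Predictive Failure Count: 18\n"
    )
    print(parse_pd_counters(sample))
```

Recording these values on a schedule is one way to tell whether the counts are still rising or have levelled off.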
-
Phil over 10 years
A related tip for future use: when inserting drives into any hot-swap system, label the caddy with the serial number (and remember to keep it up to date!). It's never bad to have this information on the front of the system.
-
Bastien974 over 12 years
Obviously I'm going to change it, but I still want to know what those numbers mean. Is it considered high? What if it's not increasing any more?
-
DanBig over 12 years
It's probably a read or write error. It will most likely rise as the drive continues to be used.