How do I remove a failing disk from an LSI MegaRAID disk group?

According to https://www.45drives.com/wiki/index.php?title=How_do_I_replace_a_failed_drive_with_LSI_9280_cards%3F, the correct sequence is:

storcli /c0/e252/s4 set offline
storcli /c0/e252/s4 set missing
storcli /c0/eall/s4 spindown     # Note: /eall instead of /e252. No idea why.

After the first command, the output of storcli /c0/e252/s4 show should list the drive's state as Offln (offline). After the last command, it was UGood (Unconfigured Good) for me.

Note that the second command (set missing) failed for me.
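
The three steps can be wrapped in a small shell function. This is only a sketch under assumptions: the function name retire_drive and the STORCLI override variable are illustrative and not part of storcli. It treats a failing "set missing" as a warning, because that step can fail as noted above, and it keeps /eall for the spindown step as in the original sequence.

```shell
# Sketch only: retire_drive and STORCLI are illustrative names, not part of storcli.
STORCLI="${STORCLI:-storcli}"

retire_drive() {
    ctrl="$1"; encl="$2"; slot="$3"    # e.g. 0 252 4

    # Step 1: take the drive offline; stop if this fails.
    $STORCLI "/c${ctrl}/e${encl}/s${slot}" set offline || return 1

    # Step 2: mark it missing; this may fail, so only warn.
    $STORCLI "/c${ctrl}/e${encl}/s${slot}" set missing ||
        echo "warning: 'set missing' failed, continuing" >&2

    # Step 3: spin the drive down (the sequence above uses /eall here).
    $STORCLI "/c${ctrl}/eall/s${slot}" spindown
}
```

For a dry run, STORCLI=echo retire_drive 0 252 4 only prints the three commands instead of executing them.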

Afterwards, the rebuild should start automatically if you have a dedicated or global hot spare (DHS or GHS) and automatic rebuild is enabled. To verify this, run

storcli /c0/eall/sall show rebuild

That will print something like this:

Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.


----------------------------------------------------------
Drive-ID    Progress% Status          Estimated Time Left 
----------------------------------------------------------
/c0/e252/s0 -         Not in progress -                   
/c0/e252/s1 -         Not in progress -                   
/c0/e252/s2 -         Not in progress -                   
/c0/e252/s3 -         Not in progress -                   
/c0/e252/s4 -         Not in progress -                   
/c0/e252/s6 18        In progress     -                   
/c0/e252/s7 -         Not in progress -                   
----------------------------------------------------------

Note the value "In progress" for slot 6 (s6). The second column gives you the percentage of the rebuild (18%).
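
If you only want the drives that are actually rebuilding, the table can be filtered with awk. A minimal sketch, assuming the column layout shown above; the function name rebuilding_drives is illustrative:

```shell
# Sketch: read the output of "storcli /c0/eall/sall show rebuild" on stdin and
# print "<drive> <percent>" for each drive whose rebuild is in progress.
# Assumes the column layout of the table above.
rebuilding_drives() {
    awk '$1 ~ /^\/c[0-9]+\/e[0-9]+\/s[0-9]+$/ && $3 == "In" && $4 == "progress" { print $1, $2 }'
}
```

Usage: storcli /c0/eall/sall show rebuild | rebuilding_drives. For the table above this prints /c0/e252/s6 18. Rows with "Not in progress" are skipped because the comparison is case-sensitive.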

I'm using this small script to monitor progress:

while true ; do clear ; date ; storcli /c0/e252/s6 show rebuild ; sleep 5 ; done
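
The loop above runs until you interrupt it. A variant that stops by itself once the drive no longer reports "In progress" could look like this. It is a sketch, assuming the table row format shown earlier; wait_for_rebuild, the STORCLI override variable and the interval argument are illustrative names:

```shell
# Sketch: poll one drive until its rebuild is no longer "In progress".
# wait_for_rebuild and STORCLI are illustrative names, not part of storcli.
STORCLI="${STORCLI:-storcli}"

wait_for_rebuild() {
    drive="$1"              # e.g. /c0/e252/s6
    interval="${2:-5}"      # seconds between polls

    # Match the drive's table row; "Not in progress" does not match because
    # grep is case-sensitive here.
    while $STORCLI "$drive" show rebuild | grep -q "^$drive .*In progress"; do
        date
        sleep "$interval"
    done
    echo "rebuild on $drive is no longer in progress"
}
```

Usage: wait_for_rebuild /c0/e252/s6 5. The loop also ends if storcli itself fails, so check the drive's final state before assuming that the rebuild completed.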

To locate the failed drive, you can use this command:

storcli /c0/e252/s4 start locate

That should make the indicator light of your drive blink.
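
The light keeps blinking until you switch it off again with the matching "stop locate" command. A small sketch that blinks a drive for a fixed time; blink_drive and the STORCLI override variable are illustrative names:

```shell
# Sketch: blink a drive's locate LED for a while, then switch it off again.
# blink_drive and STORCLI are illustrative names, not part of storcli.
STORCLI="${STORCLI:-storcli}"

blink_drive() {
    drive="$1"            # e.g. /c0/e252/s4
    seconds="${2:-60}"    # how long to keep the LED blinking

    $STORCLI "$drive" start locate || return 1
    sleep "$seconds"
    $STORCLI "$drive" stop locate
}
```

Usage: blink_drive /c0/e252/s4 120.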

Author: Aaron Digulla

I'm a software developer living in Switzerland. You can reach me at digulla at hepe dot com.

Updated on September 18, 2022

Comments

  • Aaron Digulla, almost 2 years

    One of the disks in group 0 (EID:Slot 252:4, DiskID 12) is starting to fail its SMART tests:

      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1837
    200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       57
    

    but I can't find any documentation on how to remove disks from a disk group.

    Do I have to

    storcli /c0/e252/s4 set offline
    

    or rather

    storcli /c0/e252/s4 spindown
    

    or both? What's the difference between "spindown" and "offline"? What about

    storcli /c0/s4 set missing
    

    What does that do? What does "missing" mean?

    And how about the rebuild? Does that start automatically?

    If not, then I guess the "start rebuild" command is my friend, but why do I have to specify a single disk for that? It would make much more sense to specify the disk group or volume to rebuild, no?