HP Smart Array: How to safely remove a physical drive with SMART predictive failure from an array so it can be replaced?

Solution 1

It is safe to run those commands. The mirror group can survive the absence of one disk. The array should rebuild automatically, but if it doesn't, the command you already identified will kick it into gear.
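
If you want to confirm whether the rebuild has started or finished, one option is to read the logical drive's Status line. This is a minimal sketch, assuming output shaped like the `ld 1 show detail` listing posted with the question. `ld_status` is a hypothetical helper name, not part of hpacucli, and it simply prints whatever status text the controller reports.

```shell
#!/bin/sh
# ld_status (hypothetical helper): print the logical drive status
# from `hpacucli ... ld N show detail` output supplied on stdin.
ld_status() {
    # The wanted line looks like "         Status: OK".
    # Requiring the first field to be exactly "Status:" skips lines
    # such as "Controller Status: OK" or "Cache Status: OK".
    awk '$1 == "Status:" {
        sub(/^[[:space:]]*Status:[[:space:]]*/, "")
        print
        exit
    }'
}

# Usage on the server itself (needs hpacucli and root):
#   hpacucli controller slot=1 ld 1 show detail | ld_status
```

Run it before the swap and again afterwards; anything other than OK after the rebuild window deserves a closer look at the full `show detail` output.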

Solution 2

You can just pull the dead disk and replace it - there's no need for OS involvement at all.

Solution 3

A drive in predictive failure won't necessarily show an LED indicator (sometimes it's a slow amber blink), so identifying it for smart hands is a good idea. You don't need to remove the drive from the array or re-add it, though; the controller handles those functions automatically. The only command you need is hpacucli controller slot=1 pd 1:8 modify led=on.
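
Before asking smart hands to pull anything, it can help to double-check which bay the controller is actually flagging. The sketch below is an assumption-laden helper, not an hpacucli feature: `find_unhealthy_drives` is a hypothetical name, and it expects `physicaldrive` lines in the format shown in the question's `show detail` output.

```shell
#!/bin/sh
# find_unhealthy_drives (hypothetical helper): read hpacucli
# "show detail" output on stdin and print "<drive id> <status>"
# for every physical drive whose status is not "OK".
find_unhealthy_drives() {
    # Expected input line:
    #   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
    # The status is the last comma-separated field inside the parentheses.
    sed -n 's/^[[:space:]]*physicaldrive \([^ ]*\) (.*, \([^,)]*\))[[:space:]]*$/\1 \2/p' |
        grep -v ' OK$'
}

# Usage on the server itself (needs hpacucli and root):
#   hpacucli controller slot=1 ld 1 show detail | find_unhealthy_drives
```

With the listing from the question, this would print only the bay 8 drive, which is the one to light up with the led=on command.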

Solution 4

The sequence of commands that you specify does not work on our Smart Array 641/642 controllers; a "This operation is not supported with the current configuration" error is returned. On my class of array, these commands do not work even if all the disks are operating properly. The best solution is to use ewwhite's process to blink the drive, then physically replace it.

Author by

gilesw

Updated on September 17, 2022

Comments

  • gilesw
    gilesw almost 2 years

    hpacucli controller slot=1 ld 1 show detail

    Smart Array P400 in Slot 1
    
       array A
    
          Logical Drive: 1
             Size: 273.3 GB
             Fault Tolerance: RAID 1+0
             Heads: 255
             Sectors Per Track: 32
             Cylinders: 65535
             Stripe Size: 128 KB
             Status: OK
             Array Accelerator: Enabled
             Unique Identifier: xxxx
             Disk Name: /dev/cciss/c0d0
             Mount Points: /boot 196 MB, / 7.8 GB
             Logical Drive Label: xxxxx
             Mirror Group 0:
                physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
                physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 72 GB, OK)
                physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 72 GB, OK)
                physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 72 GB, OK)
             Mirror Group 1:
                physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 72 GB, OK)
                physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 72 GB, OK)
                physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 72 GB, OK)
                physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 72 GB, OK)
    

    hpacucli controller slot=1 show

    Smart Array P400 in Slot 1
       Bus Interface: PCI
       Slot: 1
       Serial Number: xxxx
       Cache Serial Number: xxxx
       RAID 6 (ADG) Status: Disabled
       Controller Status: OK
       Chassis Slot:
       Hardware Revision: Rev D
       Firmware Version: 4.06
       Rebuild Priority: Medium
       Expand Priority: Medium
       Surface Scan Delay: 15 secs
       Post Prompt Timeout: 0 secs
       Cache Board Present: True
       Cache Status: OK
       Accelerator Ratio: 100% Read / 0% Write
       Drive Write Cache: Disabled
       Total Cache Size: 256 MB
       Battery Pack Count: 0
       SATA NCQ Supported: True
    

    Is it safe to run this sequence of commands?

    hpacucli controller slot=1 array A remove drives=1:8
    hpacucli controller slot=1 pd 1:8 modify led=on
    

    Get remote hands to remove and replace the drive. Then run:

    hpacucli controller slot=1 array A add drives=1:8
    

    Will this get the array to rebuild safely?

  • gilesw
    gilesw over 13 years
    Is this based on your own experience with HP servers? I favour your solution simply because if a disk is being written to when it is physically removed from an array, the disk heads will be on the platter and could damage the disk itself. I'd rather the drive was out of the array and spun down, which is hopefully what the commands do.
  • Deb
    Deb over 13 years
    @User70139 The SmartArray cards are smart enough to stop writing to a disk that's in pre-fail and start the fail-light blinking. I/O has already been quiesced by the card. The drive is still spinning, but the heads aren't being used. If you're concerned, when pulling the old drive out, pull it out an inch and wait 10 seconds before fully pulling it out.
  • Aashraya Singal
    Aashraya Singal about 13 years
    As long as your HP disks have red handles, they're hot-swap compatible and can be pulled from the server at any time, even when spinning. Obviously you don't want to flail a drive around until it's had 10-15 seconds for the platters to stop spinning. In fact, just don't flail 'em around ever and you should be fine. Drive rebuild/replacement is the responsibility of the controller, and you don't need to worry about executing any commands before or after pulling a failed drive. It's all happening further down the stack.