HP Smart Array: How to safely remove a physical drive with a SMART predictive failure from an array so it can be replaced?

linux raid hp hp-proliant hp-smart-array

44,804

Solution 1

It is safe to run those commands. The mirror group can survive the absence of one disk. The array should rebuild automatically, but if it doesn't, the command you already identified will kick it into gear.
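If you want to confirm that the automatic rebuild actually finishes after the swap, one option is to poll the same "ld 1 show detail" command from the question until the logical drive's Status line reads OK again. This is a minimal sketch, not an HP tool: it assumes hpacucli is on the PATH, reuses slot=1 and ld 1 from the question, and treats any Status value other than "OK" as "not finished yet". The helper names (run_hpacucli, ld_status, wait_until_ok) are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable

# Command from the question; slot=1 and ld 1 are specific to that server (assumption).
SHOW_LD = ["hpacucli", "controller", "slot=1", "ld", "1", "show", "detail"]


def run_hpacucli(cmd: list[str]) -> str:
    """Run hpacucli and return its stdout (requires hpacucli on the host)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def ld_status(detail_output: str) -> str:
    """Extract the logical drive 'Status:' value from 'ld ... show detail' output."""
    match = re.search(r"^\s*Status:\s*(.+?)\s*$", detail_output, re.MULTILINE)
    if match is None:
        raise ValueError("no Status: line found in hpacucli output")
    return match.group(1)


def wait_until_ok(fetch: Callable[[], str], interval_s: float = 60.0,
                  max_checks: int = 240) -> bool:
    """Poll until the logical drive reports 'OK'; return False if it never does."""
    for _ in range(max_checks):
        status = ld_status(fetch())
        print(f"logical drive status: {status}")
        if status == "OK":
            return True
        time.sleep(interval_s)
    return False
```

On the server itself this would be called as wait_until_ok(lambda: run_hpacucli(SHOW_LD)). Passing the fetch function in, rather than calling hpacucli directly, keeps the polling logic testable without a controller.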

Solution 2

You can just pull the dead disk and replace it - there's no need for OS involvement at all.

Solution 3

A drive in predictive failure won't necessarily have an LED indicator (sometimes it's a slow amber blink), so identifying it for smart hands is a good idea. You don't need to remove the drive from the array or re-add it, though. Those functions are handled by the controller automatically. All you need is the hpacucli controller slot=1 pd 1:8 modify led=on line.
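For anyone scripting this step, here is a small sketch that builds the locate-LED command for a given slot and drive, and only executes it when dry_run is False. It assumes the long form physicaldrive plus the full port:box:bay ID shown in the "show detail" output (for example 1I:1:8) rather than the 1:8 shorthand; led=off is included so the LED can be cleared after the swap. The function names are hypothetical.

```python
import subprocess


def led_command(slot: int, drive: str, on: bool = True) -> list[str]:
    """Build the hpacucli argv that turns a physical drive's locate LED on or off."""
    state = "on" if on else "off"
    return ["hpacucli", "controller", f"slot={slot}", "physicaldrive", drive,
            "modify", f"led={state}"]


def set_led(slot: int, drive: str, on: bool = True, dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the LED command; returns the argv either way."""
    cmd = led_command(slot, drive, on)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # needs hpacucli and root on the host
    return cmd
```

Defaulting to a dry run means the printed command can be checked against the drive ID before anything touches the controller.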

Solution 4

The sequence of commands you specify does not work on our Smart Array 641/642 controllers. A "This operation is not supported with the current configuration" error is encountered. On my class of array, these commands do not work even if all the disks are operating properly. The best solution is to use ewwhite's process to blink the drive, and then physically replace it.



gilesw

Updated on September 17, 2022

Comments

gilesw almost 2 years

hpacucli controller slot=1 ld 1 show detail

Smart Array P400 in Slot 1

   array A

      Logical Drive: 1
         Size: 273.3 GB
         Fault Tolerance: RAID 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Stripe Size: 128 KB
         Status: OK
         Array Accelerator: Enabled
         Unique Identifier: xxxx
         Disk Name: /dev/cciss/c0d0
         Mount Points: /boot 196 MB, / 7.8 GB
         Logical Drive Label: xxxxx
         Mirror Group 0:
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 72 GB, OK)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 72 GB, OK)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 72 GB, OK)
         Mirror Group 1:
            physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 72 GB, OK)
            physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 72 GB, OK)
            physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 72 GB, OK)
            physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 72 GB, OK)
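To pick out the drive to blink from output like the above, a short parser can list every physical drive whose status is not OK. This sketch assumes the status is always the last comma-separated field inside the parentheses, as it is in this P400 output; other controllers or firmware versions may format the line differently. The name unhealthy_drives is hypothetical.

```python
import re

# Matches lines such as:
#   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
PD_LINE = re.compile(r"physicaldrive\s+(\S+)\s+\((.*)\)")


def unhealthy_drives(detail_output: str) -> list[tuple[str, str]]:
    """Return (drive id, status) for every physical drive whose status is not OK."""
    result = []
    for line in detail_output.splitlines():
        m = PD_LINE.search(line)
        if m is None:
            continue
        drive_id = m.group(1)
        status = m.group(2).split(",")[-1].strip()  # status is the last field
        if status != "OK":
            result.append((drive_id, status))
    return result


sample = """
      Mirror Group 0:
         physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
         physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 72 GB, OK)
"""
print(unhealthy_drives(sample))  # [('1I:1:8', 'Predictive Failure')]
```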

hpacucli controller slot=1 show

Smart Array P400 in Slot 1
   Bus Interface: PCI
   Slot: 1
   Serial Number: xxxx
   Cache Serial Number: xxxx
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Chassis Slot:
   Hardware Revision: Rev D
   Firmware Version: 4.06
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 100% Read / 0% Write
   Drive Write Cache: Disabled
   Total Cache Size: 256 MB
   Battery Pack Count: 0
   SATA NCQ Supported: True

Is it safe to run this sequence of commands?

hpacucli controller slot=1 array A remove drives=1:8
hpacucli controller slot=1 pd 1:8 modify led=on

Get remote hands to remove the drive and replace it, then run:

hpacucli controller slot=1 array A add drives=1:8

Will this get the array to rebuild safely?

gilesw over 13 years

Is this based on your own experience with HP servers? I favour your solution simply because if a disk is being written to when it is physically removed from an array, the disk heads will be on the platter and could damage the disk itself. I'd rather the drive was out of the array and spun down, which is hopefully what the commands should do.

Deb over 13 years

@User70139 The Smart Array cards are smart enough to stop writing to a disk that's in pre-fail and start the fail light blinking. I/O has already been quiesced by the card. The drive is still spinning, but the heads aren't being used. If you're concerned, when pulling the old drive, pull it out an inch and wait 10 seconds before removing it fully.

Aashraya Singal about 13 years

As long as your HP disks have red handles, they're hot-swap compatible and can be pulled from the server at any time, even while spinning. Obviously you don't want to flail the drive around until it's had 10-15 seconds for the platters to stop spinning. In fact, just don't flail 'em around ever and you should be fine. Drive rebuild/replacement is the responsibility of the controller, and you don't need to execute any commands before or after pulling a failed drive. It all happens further down the stack.