HP Smart Array: how to safely remove a physical drive with a SMART predictive failure from an array so it can be replaced?
Solution 1
It is safe to run those commands. The mirror group can survive the absence of one disk. It should rebuild automatically, but if it doesn't the command you already identified will kick it into gear.
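The automatic rebuild can be watched from the command line by polling the logical drive's Status field, using the same `show detail` command as in the question. Below is a minimal Python sketch. It assumes `hpacucli` is on the PATH and that the logical drive reports a plain `OK` once the rebuild has finished. The function names, polling interval and check limit are hypothetical. The wording of in-progress statuses varies, so the loop simply waits for `OK` and prints whatever else it sees.

```python
from __future__ import annotations

import re
import subprocess
import time

# The logical drive's own "Status: ..." line. Anchored at line start so fields such as
# "RAID 6 (ADG) Status:" or "Parity Initialization Status:" are not matched.
STATUS_RE = re.compile(r"^\s*Status:\s*(.+)$", re.MULTILINE)


def ld_status(detail_text: str) -> str:
    """Return the Status field from `hpacucli controller slot=S ld N show detail` output."""
    match = STATUS_RE.search(detail_text)
    if match is None:
        raise ValueError("no Status line found")
    return match.group(1).strip()


def fetch_ld_detail(slot: int = 1, ld: int = 1) -> str:
    """Run `hpacucli controller slot=S ld N show detail` and return its stdout."""
    result = subprocess.run(
        ["hpacucli", "controller", f"slot={slot}", "ld", str(ld), "show", "detail"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def wait_for_ok(slot: int = 1, ld: int = 1, interval: float = 60, max_checks: int = 240) -> str:
    """Poll until the logical drive reports OK (or max_checks runs out); return the last status."""
    status = ""
    for _ in range(max_checks):
        status = ld_status(fetch_ld_detail(slot, ld))
        if status == "OK":
            break
        print(f"logical drive {ld}: {status}")  # e.g. a rebuild-in-progress status
        time.sleep(interval)
    return status
```

The loop is bounded by `max_checks` so that a drive stuck in a degraded state does not keep a script waiting forever.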
Solution 2
You can just pull the dead disk and replace it - there's no need for OS involvement at all.
Solution 3
A drive in pre-failure won't necessarily have an LED indicator (sometimes it's only a slow amber blink), so identifying it for smart hands is a good idea. You don't need to remove the drive from the array or re-add it, though; the controller handles both automatically. All you need is this line:
hpacucli controller slot=1 pd 1:8 modify led=on
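Scripted, the LED step comes down to building one argument list and running it. Below is a minimal Python sketch that assumes `hpacucli` is on the PATH. The helper names `led_command` and `set_led` are hypothetical. `led=off` (to clear the locator after the swap) is assumed to be the counterpart of the `led=on` used here. Pass the drive ID in whatever form your controller accepts. This thread uses both the short `1:8` and the full `1I:1:8` (port:box:bay) that `show detail` prints.

```python
from __future__ import annotations

import subprocess


def led_command(slot: int, drive: str, on: bool = True) -> list[str]:
    """Build the hpacucli argv that switches a physical drive's locator LED."""
    # led=on is the form used in this thread; led=off is assumed to be its counterpart.
    state = "on" if on else "off"
    return ["hpacucli", "controller", f"slot={slot}", "pd", drive, "modify", f"led={state}"]


def set_led(slot: int, drive: str, on: bool = True) -> None:
    """Run the LED command; raises CalledProcessError if hpacucli exits non-zero."""
    subprocess.run(led_command(slot, drive, on), check=True)


if __name__ == "__main__":
    # Print instead of running, so the sketch is safe to try on any machine.
    print(" ".join(led_command(1, "1:8")))  # prints: hpacucli controller slot=1 pd 1:8 modify led=on
```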
Solution 4
The sequence of commands you specified does not work on our Smart Array 641/642 controllers; it fails with the error "This operation is not supported with the current configuration". On this class of controller, these commands do not work even when all the disks are operating properly. The best solution is to follow ewwhite's process: blink the drive's LED, then physically replace the drive.
gilesw
Updated on September 17, 2022
Comments
-
gilesw almost 2 years
hpacucli controller slot=1 ld 1 show detail
Smart Array P400 in Slot 1

   array A

      Logical Drive: 1
         Size: 273.3 GB
         Fault Tolerance: RAID 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Stripe Size: 128 KB
         Status: OK
         Array Accelerator: Enabled
         Unique Identifier: xxxx
         Disk Name: /dev/cciss/c0d0
         Mount Points: /boot 196 MB, / 7.8 GB
         Logical Drive Label: xxxxx
         Mirror Group 0:
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 72 GB, OK)
            physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 72 GB, OK)
            physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 72 GB, OK)
         Mirror Group 1:
            physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 72 GB, OK)
            physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 72 GB, OK)
            physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 72 GB, OK)
            physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 72 GB, OK)
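In the output above, the drive to replace is the one whose status is not `OK` (here `1I:1:8`). With more drives, picking it out by eye gets error-prone. Below is a minimal Python sketch that extracts each `physicaldrive` ID together with the last comma-separated field inside its parentheses. It assumes that field is the drive status, as it is in this output. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches e.g. "physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)"
DRIVE_RE = re.compile(r"physicaldrive\s+(\S+)\s+\(([^)]*)\)")


def drive_statuses(detail_text: str) -> dict[str, str]:
    """Map each physical drive ID to the last field inside its parentheses."""
    return {
        m.group(1): m.group(2).split(",")[-1].strip()
        for m in DRIVE_RE.finditer(detail_text)
    }


def drives_needing_attention(detail_text: str) -> list[str]:
    """Return the IDs of drives whose status is anything other than OK."""
    return [drive for drive, status in drive_statuses(detail_text).items() if status != "OK"]


if __name__ == "__main__":
    sample = """
         Mirror Group 0:
            physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 72 GB, Predictive Failure)
            physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 72 GB, OK)
         Mirror Group 1:
            physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 72 GB, OK)
    """
    print(drives_needing_attention(sample))  # prints ['1I:1:8']
```

The returned ID is the value you would pass as the `pd` argument of the `modify led=on` command.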
hpacucli controller slot=1 show
Smart Array P400 in Slot 1
   Bus Interface: PCI
   Slot: 1
   Serial Number: xxxx
   Cache Serial Number: xxxx
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Chassis Slot:
   Hardware Revision: Rev D
   Firmware Version: 4.06
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 100% Read / 0% Write
   Drive Write Cache: Disabled
   Total Cache Size: 256 MB
   Battery Pack Count: 0
   SATA NCQ Supported: True
Is it safe to run this sequence of commands?
hpacucli controller slot=1 array A remove drives=1:8
hpacucli controller slot=1 pd 1:8 modify led=on
Get remote hands to remove the drive and replace it. Then run:
hpacucli controller slot=1 array A add drives=1:8
Will this get the array to rebuild safely?
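For reference, the proposed sequence can be laid out as a dry run that prints each command without executing it. That is deliberate. According to the answers, `array A remove` / `add` are unnecessary because the controller handles the rebuild itself. They also fail with "This operation is not supported with the current configuration" on Smart Array 641/642 controllers. Below is a minimal Python sketch; `proposed_steps` and `dry_run` are hypothetical names.

```python
from __future__ import annotations

import shlex


def proposed_steps(slot: int, array: str, drive: str) -> list[list[str] | None]:
    """The question's sequence in order; None marks the manual drive swap."""
    base = ["hpacucli", "controller", f"slot={slot}"]
    return [
        base + ["array", array, "remove", f"drives={drive}"],
        base + ["pd", drive, "modify", "led=on"],
        None,  # remote hands pull the failing drive and insert the replacement
        base + ["array", array, "add", f"drives={drive}"],
    ]


def dry_run(steps: list[list[str] | None]) -> list[str]:
    """Render each step as a shell-quoted line; nothing is executed."""
    lines = []
    for step in steps:
        lines.append("# (manual) swap the drive" if step is None else shlex.join(step))
    return lines


if __name__ == "__main__":
    print("\n".join(dry_run(proposed_steps(1, "A", "1:8"))))
```

Going by the answers, only the LED line and the physical swap are needed in practice.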
-
gilesw over 13 years
Is this based on your own experience with HP servers? I favour your solution simply because if a disk is being written to when it is physically removed from an array, the disk heads will be on the platter and could damage the disk itself. I'd rather the drive were out of the array and spun down, which is hopefully what the commands do.
-
Deb over 13 years
@User70139 The Smart Array cards are smart enough to stop writing to a disk that's in pre-fail and start the fail light blinking. The card has already quiesced I/O to the drive. The drive is still spinning, but the heads aren't being used. If you're concerned, pull the old drive out an inch and wait 10 seconds before removing it fully.
-
Aashraya Singal about 13 years
As long as your HP disks have red handles, they're hot-swap compatible and can be pulled from the server at any time, even while spinning. Obviously you don't want to flail the drive around before it has had 10-15 seconds for the platters to stop spinning. In fact, just don't flail 'em around ever and you should be fine. Drive rebuild/replacement is the responsibility of the controller, and you don't need to execute any commands before or after pulling a failed drive. It's all happening further down the stack.