mdadm: drive replacement shows up as spare and refuses to sync
After hours of Googling and some extremely wise help from JyZyXEL in the #linux-raid Freenode channel, we have a solution! There was not a single interruption to the RAID array during this process - exactly what I needed and expected from mdadm.
For some (currently unknown) reason, the RAID state became frozen. The winning command to figure this out is `cat /sys/block/md0/md/sync_action`:

```
root@galaxy:~# cat /sys/block/md0/md/sync_action
frozen
```
Simply put, that is why it was not using the available spares. All my hair is gone at the cost of a simple cat command!
So, just unfreeze the array:
```
root@galaxy:~# echo idle > /sys/block/md0/md/sync_action
```
And you're away!
```
root@galaxy:~# cat /sys/block/md0/md/sync_action
recover
root@galaxy:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdm[6] sdb[5] sda[0] sde[4] sdd[3] sdc[1]
      15627548672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UU_UUU]
      [>....................]  recovery =  0.0% (129664/3906887168) finish=4016.8min speed=16208K/sec
      bitmap: 17/30 pages [68KB], 65536KB chunk

unused devices: <none>
root@galaxy:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jul 30 13:17:25 2014
     Raid Level : raid6
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 17 22:05:30 2015
          State : active, degraded, recovering
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 0% complete

           Name : eclipse:0
           UUID : cc7dac66:f6ac1117:ca755769:0e59d5c5
         Events : 73562

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       32        1      active sync   /dev/sdc
       6       8      192        2      spare rebuilding   /dev/sdm
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       16        5      active sync   /dev/sdb
```
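If you want to poll rebuild progress from a script rather than eyeballing `cat /proc/mdstat`, the recovery line is easy to scrape. A minimal sketch (the `recovery_pct` helper and its file-path parameter are mine, not part of mdadm; on a live system you would pass `/proc/mdstat`):

```shell
#!/bin/sh
# Sketch: pull the recovery/resync percentage out of an mdstat-format file.
# The path is a parameter purely so this can be tried against a sample file;
# on a real system you would call: recovery_pct /proc/mdstat
recovery_pct() {
    # Matches lines like:
    #   [>....................]  recovery =  0.0% (129664/3906887168) ...
    grep -Eo '(recovery|resync) *= *[0-9.]+%' "$1" | grep -Eo '[0-9.]+%'
}
```

Dropping that into a `watch`/cron loop gives you a one-number progress readout instead of the full mdstat dump.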
Bliss :-)
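For completeness, the check-then-unfreeze step can be wrapped in a small guard so `idle` is only written when the array really is frozen. This is only a sketch under that assumption (`unfreeze_if_frozen` is a made-up helper; the sysfs directory is a parameter so the logic can be exercised against a scratch directory instead of the real `/sys/block/md0/md`):

```shell
#!/bin/sh
# Sketch: write "idle" to sync_action only if the array is currently frozen.
# md_dir would be /sys/block/md0/md on a live system; it is a parameter
# here so the logic can be tried safely without touching a real array.
unfreeze_if_frozen() {
    md_dir="$1"
    state=$(cat "$md_dir/sync_action")
    if [ "$state" = "frozen" ]; then
        echo idle > "$md_dir/sync_action"   # kick recovery back into gear
        echo "unfroze (was: frozen)"
    else
        echo "state is '$state'; nothing to do"
    fi
}
```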
Milos Ivanovic
Updated on September 18, 2022

Comments
- Milos Ivanovic, almost 2 years ago:
Prelude

I had the following devices in my `/dev/md0` RAID 6: `/dev/sd[abcdef]`

The following drives were also present, unrelated to the RAID: `/dev/sd[gh]`

The following drives were part of a card reader that was connected, again unrelated: `/dev/sd[ijkl]`
Analysis

`sdf`'s SATA cable went bad (you could say it was unplugged while in use), and `sdf` was subsequently rejected from the `/dev/md0` array. I replaced the cable and the drive was back, now at `/dev/sdm`. Please do not challenge my diagnosis; there is no problem with the drive.

`mdadm --detail /dev/md0` showed `sdf(F)`, i.e. that `sdf` was faulty, so I used `mdadm --manage /dev/md0 --remove faulty` to remove the faulty drive.

Now `mdadm --detail /dev/md0` showed "removed" in the slot where `sdf` used to be:

```
root@galaxy:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jul 30 13:17:25 2014
     Raid Level : raid6
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 6
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 17 21:16:14 2015
          State : active, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : eclipse:0
           UUID : cc7dac66:f6ac1117:ca755769:0e59d5c5
         Events : 67205

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       32        1      active sync   /dev/sdc
       4       0        0        4      removed
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       16        5      active sync   /dev/sdb
```
For some reason the RaidDevice of the "removed" device now matches one that is active. Anyway, let's try adding the previous device (now known as `/dev/sdm`), because that was the original intent:

```
root@galaxy:~# mdadm --add /dev/md0 /dev/sdm
mdadm: added /dev/sdm
root@galaxy:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jul 30 13:17:25 2014
     Raid Level : raid6
     Array Size : 15627548672 (14903.59 GiB 16002.61 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Mar 17 21:19:30 2015
          State : active, degraded
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : eclipse:0
           UUID : cc7dac66:f6ac1117:ca755769:0e59d5c5
         Events : 67623

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       32        1      active sync   /dev/sdc
       4       0        0        4      removed
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       16        5      active sync   /dev/sdb

       6       8      192        -      spare   /dev/sdm
```
As you can see, the device shows up as a spare and refuses to sync with the rest of the array:
```
root@galaxy:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdm[6](S) sdb[5] sda[0] sde[4] sdd[3] sdc[1]
      15627548672 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UU_UUU]
      bitmap: 17/30 pages [68KB], 65536KB chunk

unused devices: <none>
```
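This symptom (a degraded array holding an `(S)` spare, with no recovery line) can be spotted mechanically if you want monitoring to flag it. A hypothetical check, not part of mdadm (`spare_stuck` and the file parameter are mine; on a live system you would point it at `/proc/mdstat`):

```shell
#!/bin/sh
# Sketch: detect a degraded md array whose spare is not being rebuilt.
# Succeeds (exit 0) when an mdstat-format file shows all three of:
#   1. a spare device, marked "(S)"
#   2. a degraded status string, e.g. [UU_UUU] (an "_" means a missing member)
#   3. no recovery/resync line, i.e. the spare is sitting idle
spare_stuck() {
    f="$1"
    grep -q '(S)' "$f" &&
    grep -Eq '\[U*_[U_]*\]' "$f" &&
    ! grep -Eq 'recovery|resync' "$f"
}
```

A cron job could run this and alert, which would have caught the frozen state here long before any hair was lost.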
I have also tried using `mdadm --zero-superblock /dev/sdm` before adding it, with the same result.

The reason I am using RAID 6 is to provide high availability. I will not accept stopping `/dev/md0` and re-assembling it with `--assume-clean` or similar workarounds. This needs to be resolved online, otherwise I don't see the point of using mdadm.
- Richard Gomes, over 4 years ago: Thanks for that. Deserves to be bookmarked :-) I'm not having this exact problem, but something similar. I bought a pair of disks and I'm trying to add them to an existing RAID 6 array with two faulty disks. No data loss at this time! :-) One of the disks was added OK, but the other one is reported as faulty and automagically removed from the array. S.M.A.R.T. does not report anything wrong with the brand new disk... so I'm still trying to figure out why the disk is refused. I'm running full tests on the new disk in order to stress it and see if SMART reports anything.