Recover RAID-5 that was already running in degraded mode (lost a second disk)


Solution 1

Provided the drives have not actually failed but have only become temporarily unavailable, or have fallen out of sync for some other reason, you can try to force the RAID online, ignoring the event count and timestamp recorded in each member's superblock.

By doing this you run the risk of corrupting data, especially if you don't know which drive went offline last - but it sounds like you have little choice.
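One way to judge which member dropped out last is to compare the Events counter that `mdadm --examine` prints for each member; the member with the lowest count has usually been out of the array the longest. A minimal sketch, assuming the superblocks live on whole disks rather than partitions; `events_of` and `show_events` are hypothetical helper names, not part of mdadm:

```shell
# Extract the value of the "Events" field from `mdadm --examine` output on stdin.
events_of() {
    awk -F' : ' '/^[ \t]*Events[ \t]*:/ { print $2; exit }'
}

# Print "<device> <events>" for each member given.
# Needs root and real devices, so it is only defined here, not called.
show_events() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | events_of)"
    done
}
```

With the device names from the question this would be called as `show_events /dev/sdb /dev/sdc /dev/sdd`.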

Read up on the various ways to use the --force option in the mdadm man page.

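If one of the drives has actually failed and another is out of sync, you can still bring the RAID online by leaving the failed drive out of the device list and assembling the remaining members with the --force option (the "missing" placeholder for a failed drive applies when re-creating an array with --create, not when assembling one). This should start the RAID as degraded.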
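A forced assembly might look like the sketch below. It is a dry run by default and only prints the commands. The array name /dev/md0 and the members sdb, sdc and sdd are assumptions based on the question (sda, which failed long ago, is simply left out); `run` and `force_assemble` are hypothetical helper names:

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=1 only once you accept the risk of data corruption.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Stop any half-assembled array, then force assembly from the given members.
# --force accepts members whose metadata looks out of date;
# --run starts the array even though fewer drives are present than before.
force_assemble() {
    md="$1"; shift
    run mdadm --stop "$md"
    run mdadm --assemble --force --run "$md" "$@"
}

force_assemble /dev/md0 /dev/sdb /dev/sdc /dev/sdd
```

If the array comes up, a read-only filesystem check before mounting it read-write is a sensible next step.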

Solution 2

If all else fails, you could use raidextract: http://www.chiark.greenend.org.uk/~peterb/linux/raidextract/

Solution 3

Is RAID5 supposed to recover from a two-disk failure? I thought it was not supposed to. What you are looking for is probably the commands to hot-remove and hot-add drives to the raid array.

mdadm --remove /dev/md0 /dev/sdX
mdadm --add /dev/md0 /dev/sdX
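
Two hedged notes on these commands: mdadm only removes a member that is already failed or a spare, so an active drive has to be marked with --fail first, and after an --add the rebuild shows up in /proc/mdstat. A small sketch for checking whether an array still has an empty slot; `is_degraded` is a hypothetical helper name:

```shell
# Succeeds (exit 0) if the mdstat text on stdin shows a member slot marked "_",
# as in "[4/3] [_UUU]"; fails if every slot is "U".
is_degraded() {
    grep -Eq '\[[U_]*_[U_]*\]'
}

# Only consult the live status if this machine actually exposes it.
if [ -r /proc/mdstat ] && is_degraded < /proc/mdstat; then
    echo "at least one array is degraded or still rebuilding"
fi
```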

Inovagent
Author by

Inovagent

I've been playing with Linux for over ten years now. I build Linux servers and I think software RAID using commodity level motherboards is awesome.

Updated on September 17, 2022

Comments

  • Inovagent
    Inovagent over 1 year

    this is silly, this has happened before and I figured out how to fix it and it was fine.

    I'm running 4 500GB SATA drives in a RAID-5 on Ubuntu 7.10 server. One of the disks failed (actually I think it's one of the connectors in the hot-swap cage) and it's been running off of three disks while I find a replacement HDD or further diagnose the problem.

    Now, before you read any further, NO I do not have backups and the information is not super important, just nice to have.

    Anyway once before, I had some kind of HW hiccup, maybe the power went out or something, and I had problems recovering the array. It wasn't that one of the disks failed, it was something else.

    I was able to simply add back in the second "failed" disk and in a few minutes, I was back up and running. Maybe I had to run some kind of filesystem check, I don't know.

    I spent hours, if not days, figuring out how to do it last time and have since forgotten.

    The crux of the issue is that if I run a mdadm --examine on sdb, sdc, and sdd, sdd thinks it's still part of the array but on the superblock info of sdb and sdc, it lists sdd as removed.

    sda is the disk that failed long before, it's listed correctly in all of them as faulty removed.

    TIA. The server in question is not on the internet so it's not possible to C&P the output of various commands on to the forum.

    I know, by now a lot of you probably think I'm a nitwit, or worse. However I do recollect that once I figured out the series of commands to run, it was a fairly straightforward procedure and it worked great.

    • Amok
      Amok over 14 years
      Even if you do get the RAID up and running again there is no good way to know that your data isn't corrupted, so if there is anything important that you can verify the integrity of you should do so.
  • ConcernedOfTunbridgeWells
    ConcernedOfTunbridgeWells over 14 years
    No, it doesn't support two disk failure.
  • Raynet
    Raynet over 14 years
    Fortunately, with mdadm, if neither drive has failed completely, you can force the RAID to be assembled and usually recover most if not all of the data. Usually only a handful of sectors are unreadable, and you can even work around them: if the bad sectors are at the beginning of drive A and at the end of drive B, first assemble the RAID with the working drives plus drive B, and past the halfway point reassemble it with the working drives plus drive A. Always image the drives first if possible, though.
  • Ababneh A
    Ababneh A over 14 years
    Sorry, my second link is not hyperlinked because new users aren't allowed to post more than one link
  • sybreon
    sybreon over 14 years
    I see. Did not know that. Thanks.
  • divegeek
    divegeek over 14 years
    mdadm should have been sending you e-mails about the failures. You should forcibly reassemble the array exactly as it was in the e-mail before the second failure -- order is important! I've had to do that, and it worked just fine.