How to get an inactive RAID device working again?

Solution 1

For your bonus question:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf
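
If you run this as a regular user, note that a plain sudo mdadm ... >> /etc/mdadm/mdadm.conf can fail with "permission denied", because the shell opens the redirected file before sudo runs (as pointed out in the comments below). Two ways around that, as a sketch:

mdadm --examine --scan | sudo tee -a /etc/mdadm/mdadm.conf
# or run the whole command, including the redirection, as root:
sudo sh -c 'mdadm --examine --scan >> /etc/mdadm/mdadm.conf'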

Solution 2

I have found that I have to add the array manually in /etc/mdadm/mdadm.conf to make Linux mount it on reboot. Otherwise I get exactly what you have here: md_d1 devices that are inactive, etc.

The conf file should look like the example below, i.e. one ARRAY line for each md device. In my case the new arrays were missing from this file, but if you already have them listed this probably isn't the fix for your problem.

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=f10f5f96:106599e0:a2f56e56:f5d3ad6d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa591bbe:bbbec94d:a2f56e56:f5d3ad6d

Add one ARRAY line per md device, placing them after the comment shown above or, if no such comment exists, at the end of the file. You get the UUIDs by running sudo mdadm -E --scan:

$ sudo mdadm -E --scan
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=f10f5f96:106599e0:a2f56e56:f5d3ad6d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa591bbe:bbbec94d:a2f56e56:f5d3ad6d

As you can see, you can pretty much copy the output from the scan straight into the file.
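
On Debian/Ubuntu systems the initramfs usually keeps its own copy of mdadm.conf, so if the arrays still fail to assemble at boot after editing the file, regenerating it may help (an additional step that may or may not be needed in your setup):

sudo update-initramfs -u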

I run Ubuntu Desktop 10.04 LTS, and as far as I remember this behavior differs from the server edition of Ubuntu. However, it was so long ago that I created my md devices on the server that I may be wrong; it may also be that I just missed some option.

Anyway, adding the arrays to the conf file seems to do the trick. I've run the RAID 1 and RAID 5 arrays above for years with no problems.

Solution 3

Warning: First of all, let me say that the steps below (due to the use of --force) seem risky to me, and if you have irreplaceable data I'd recommend making copies of the partitions involved before you start trying any of them. However, this worked for me.
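
One way to make such copies, sketched here with placeholder device names and a placeholder backup path (you need enough free space on a separate disk), is to take a raw image of each member partition with dd:

# example only: image one member partition to another disk; repeat for each member
sudo dd if=/dev/sda4 of=/mnt/backup/sda4.img bs=1M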

I had the same problem, with an array showing up as inactive, and nothing I did, including the mdadm --examine --scan >/etc/mdadm.conf suggested by others here, helped at all.

In my case, when the system tried to start the RAID-5 array after a drive replacement, it reported (via dmesg) that the array was dirty:

md/raid:md2: not clean -- starting background reconstruction
md/raid:md2: device sda4 operational as raid disk 0
md/raid:md2: device sdd4 operational as raid disk 3
md/raid:md2: device sdc4 operational as raid disk 2
md/raid:md2: device sde4 operational as raid disk 4
md/raid:md2: allocated 5334kB
md/raid:md2: cannot start dirty degraded array.

This caused it to show up as inactive in /proc/mdstat:

md2 : inactive sda4[0] sdd4[3] sdc4[2] sde4[5]
      3888504544 blocks super 1.2

I did find that all the devices had the same event count, except for the drive I had replaced (/dev/sdb4):

[root@nfs1 sr]# mdadm -E /dev/sd*4 | grep Event
mdadm: No md superblock detected on /dev/sdb4.
         Events : 8448
         Events : 8448
         Events : 8448
         Events : 8448

However, the array details showed that it had 4 out of 5 devices available:

[root@nfs1 sr]# mdadm --detail /dev/md2
/dev/md2:
[...]
   Raid Devices : 5
  Total Devices : 4
[...]
 Active Devices : 4
Working Devices : 4
[...]
    Number   Major   Minor   RaidDevice State
       0       8        4        0      inactive dirty  /dev/sda4
       2       8       36        2      inactive dirty  /dev/sdc4
       3       8       52        3      inactive dirty  /dev/sdd4
       5       8       68        4      inactive dirty  /dev/sde4

(The "State" column above is from memory; I can't find it in my scroll-back buffer.)

I was able to resolve this by stopping the array and then re-assembling it:

mdadm --stop /dev/md2
mdadm -A --force /dev/md2 /dev/sd[acde]4

At that point the array was up and running with 4 of the 5 devices. I was able to add the replacement device, which is now rebuilding, and I can access the filesystem without any problem.
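
The add step itself isn't shown above; assuming the replacement partition is /dev/sdb4 as in the earlier output, it looks roughly like this (a sketch, not necessarily the exact command used):

sudo mdadm --manage /dev/md2 --add /dev/sdb4
cat /proc/mdstat    # watch the rebuild progress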

Solution 4

I was having issues with Ubuntu 10.04 where an error in /etc/fstab prevented the server from booting.

I ran this command as mentioned in the above solutions:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf

This appends the results of mdadm --examine --scan to /etc/mdadm/mdadm.conf.

In my case, this was:

ARRAY /dev/md/0 metadata=1.2 UUID=2660925e:6d2c43a7:4b95519e:b6d110e7 name=localhost:0

This is a fakeraid 0. My entry in /etc/fstab for automatic mounting is:

/dev/md0 /home/shared/BigDrive ext3 defaults,nobootwait,nofail 0 0

The important thing here is that you have "nobootwait" and "nofail". nobootwait skips any boot-time messages that would otherwise prevent you from booting. In my case, this was on a remote server, so it was essential.
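
Before rebooting, especially on a remote server, it's worth checking that the new entry works; a quick sanity check using the mount point above:

sudo mount -a            # mounts everything in /etc/fstab; errors show up now rather than at boot
mount | grep BigDrive    # confirm the array is mounted where expected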

Hope this will help some people.

Solution 5

A simple way to get the array running, assuming there is no hardware problem and you have enough drives/partitions to start the array, is the following:

md20 : inactive sdf1[2](S)
      732442488 blocks super 1.2

sudo mdadm --manage /dev/md20 --run

It could be that, for whatever reason, the array is fine but something prevented it from starting or assembling. In my case this was because mdadm didn't know the original array name was md127, and all drives had been unplugged for that array. When I plugged them back in I had to assemble manually (probably a bug where mdadm thought the array was already active because of the offline old array name).
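
When the array has come up under the wrong name (md127 in the case above), the manual assembly looks roughly like this (a sketch only; the device names are examples and you need to list every member partition):

sudo mdadm --stop /dev/md127
sudo mdadm --assemble /dev/md20 /dev/sdf1    # add the remaining member partitions here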

Comments

  • Assaf Levy (almost 2 years ago)

    After booting, my RAID1 device (/dev/md_d0 *) sometimes goes into some funny state and I cannot mount it.

    * Originally I created /dev/md0 but it has somehow changed itself into /dev/md_d0.

    # mount /opt
    mount: wrong fs type, bad option, bad superblock on /dev/md_d0,
           missing codepage or helper program, or other error
           (could this be the IDE device where you in fact use
           ide-scsi so that sr0 or sda or so is needed?)
           In some cases useful info is found in syslog - try
           dmesg | tail  or so
    

    The RAID device appears to be inactive somehow:

    # cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
                    [raid4] [raid10] 
    md_d0 : inactive sda4[0](S)
          241095104 blocks
    
    # mdadm --detail /dev/md_d0
    mdadm: md device /dev/md_d0 does not appear to be active.
    

    The question is: how do I make the device active again (using mdadm, I presume)?

    (Other times it's alright (active) after boot, and I can mount it manually without problems. But it still won't mount automatically even though I have it in /etc/fstab:

    /dev/md_d0        /opt           ext4    defaults        0       0
    

    So a bonus question: what should I do to make the RAID device automatically mount at /opt at boot time?)

    This is an Ubuntu 9.10 workstation. Background info about my RAID setup is in this question.

    Edit: My /etc/mdadm/mdadm.conf looks like this. I've never touched this file, at least by hand.

    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    DEVICE partitions
    
    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes
    
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR <my mail address>
    
    # definitions of existing MD arrays
    
    # This file was auto-generated on Wed, 27 Jan 2010 17:14:36 +0200
    

    In /proc/partitions the last entry is md_d0 at least now, after reboot, when the device happens to be active again. (I'm not sure if it would be the same when it's inactive.)

    Resolution: as Jimmy Hedman suggested, I took the output of mdadm --examine --scan:

    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=de8fbd92[...]
    

    and added it in /etc/mdadm/mdadm.conf, which seems to have fixed the main problem. After changing /etc/fstab to use /dev/md0 again (instead of /dev/md_d0), the RAID device also gets automatically mounted!

  • Assaf Levy (over 14 years ago)
    OK, on those occasions when the device is active after reboot, just running mount /dev/md_d0 in /etc/rc.local works fine. mdadm -A /dev/md_d0, on the other hand, fails with that error message in both cases (so I couldn't use it before that && operator). Anyway, half of the problem seems solved, so +1 for that.
  • Assaf Levy (over 14 years ago)
    Actually mdadm.conf doesn't contain any configuration name, at least directly (it does refer to /proc/partitions though); see the edited question. I've never touched mdadm.conf - what is the tool that autogenerates it?
  • Assaf Levy (over 14 years ago)
    Seems I cannot reproduce the inactive situation any more, after following the advice in Jimmy's answer (seems like that anyway after a few reboots)... Which is nice :) Thanks in any case!
  • Assaf Levy (over 14 years ago)
    Ok, mdadm --examine --scan produced ARRAY /dev/md0 level=raid1 num-devices=2 UUID=... (Note the md0 instead of md_d0!) I put that in the mdadm.conf file manually (because there was some problem with sudo and >> ("permission denied"), and sudo is required) and also updated fstab to use md0 (not md_d0) again. Now I don't seem to run into the "inactive" problem anymore, and the RAID device mounts automatically at /opt upon booting. So thanks!
  • Assaf Levy (over 14 years ago)
    For the record, removed the /etc/rc.local workaround as it seems I got everything working properly: superuser.com/questions/117824/… :)
  • Assaf Levy (almost 13 years ago)
    So essentially you're saying the same thing as the currently accepted answer, just more verbosely? :) Still, +1, nice first post.
  • Mei (over 10 years ago)
    The reason you had problems with sudo ... >> mdadm.conf is that the shell opens the redirected files before sudo runs. The command su -c '.... >> mdadm.conf' should work.
  • Michael Robinson (over 9 years ago)
    This is what did it for me. I have my RAID drives attached via a PCI express SATA card, so I'm guessing at boot time the system couldn't see those drives yet.
  • nh2 (over 5 years ago)
    I brought the question of this state to the Linux RAID mailing list, and got this response: spinics.net/lists/raid/msg61352.html
  • nh2 (over 5 years ago)
    As I just wrote here, echo active > /sys/block/md0/md/array_state worked for me, making my RAID show up as RAID1 with a missing disk again instead of RAID0 with only spares.