Linux Software RAID1: How to boot after (physically) removing /dev/sda? (LVM, mdadm, Grub2)


Solution 1

You need to install GRUB to the MBR of both drives, and you need to do it in a way that GRUB considers each disk to be the first disk in the system.

GRUB uses its own enumeration for disks, which is abstracted from what the Linux kernel presents. You can change which device it treats as the first disk (hd0) by using a "device" line in the GRUB shell, like so:

device (hd0) /dev/sdb

This tells GRUB to treat /dev/sdb as the disk hd0 for all subsequent commands. From here you can complete the installation manually:

device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)

This installs GRUB to the MBR of the disk it considers to be hd0 (which you've just mapped to /dev/sdb), using the GRUB files on that disk's first partition.

I do the same for both /dev/sda and /dev/sdb, just to be sure.
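The same steps can be scripted for both disks. This is a minimal sketch, assuming GRUB Legacy (0.9x, whose grub binary accepts --batch and --no-floppy), disks named /dev/sda and /dev/sdb, and GRUB's files on the first partition of each disk. The function name grub_legacy_cmds and the DRY_RUN switch are hypothetical helpers for illustration, not part of GRUB:

```shell
#!/bin/sh
# Sketch for GRUB Legacy (0.9x) only: GRUB2 has no "setup" shell command.
# Assumptions: GRUB's files are on the first partition of each disk, and
# the disks are /dev/sda and /dev/sdb.
# Dry run by default; set DRY_RUN=0 to actually call grub.

# Emit the GRUB shell commands that map one disk to (hd0) and write
# GRUB to that disk's MBR, reading its files from (hd0,0).
grub_legacy_cmds() {
    printf 'device (hd0) %s\n' "$1"
    printf 'root (hd0,0)\n'
    printf 'setup (hd0)\n'
}

for disk in /dev/sda /dev/sdb; do
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: only show what would be fed to the GRUB shell.
        grub_legacy_cmds "$disk"
    else
        # Feed the commands (plus "quit") to GRUB's non-interactive shell.
        { grub_legacy_cmds "$disk"; echo quit; } | grub --batch --no-floppy
    fi
done
```

Run it as root with DRY_RUN=0 once the printed commands look right. GRUB Legacy cannot read LVM, so this only fits layouts where GRUB's files sit on a plain partition (or an md RAID1 member that GRUB can read as one); it does not apply to the GRUB2-on-LVM setup in the question.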

Edited to add: I always found the Gentoo Wiki handy, until I did this often enough to commit it to memory.

Solution 2

Have you considered installing a third drive to serve solely as the boot drive? I have also seen RAID1 + LVM setups (on CentOS) fail to boot from the second drive. I think the problem stems from GRUB not being able to handle native LVM partitions, although I'm not entirely sure.

Anyway, that's my answer: install a third small drive solely for the purpose of booting the system. Heck, I bet you could even get clever and do that with some sort of little flash or SSD device.

Solution 3

GRUB should be able to recognize RAID1 setups and install to all member disks when told to install to the MD device.
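If pointing grub-install at the MD device does not put boot code on every disk, installing to each member disk explicitly should achieve the same end. This is a minimal sketch for GRUB2, assuming the array is md0 and its members are partitions named like sda1/sdb1. MD, DRY_RUN and parent_disk are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Sketch for GRUB2 (Debian's grub-pc): run grub-install on every disk that
# holds a member of the RAID1 array, so each disk gets its own boot code.
# Assumptions: the array is md0 (hypothetical; set MD to override) and the
# members are partitions named like sda1/sdb1 (no nvme-style "p" suffix).
# Dry run by default; set DRY_RUN=0 to actually install.

MD=${MD:-md0}

# Strip the trailing partition number: "sdb1" -> "sdb".
parent_disk() {
    printf '%s\n' "${1%%[0-9]*}"
}

# /sys/block/<md>/slaves/ lists the array's component devices.
for member in /sys/block/"$MD"/slaves/*; do
    [ -e "$member" ] || continue   # no such array: the glob did not expand
    disk=/dev/$(parent_disk "$(basename "$member")")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: grub-install $disk"
    else
        grub-install "$disk"
    fi
done
```

On Debian, dpkg-reconfigure grub-pc lets you select the same set of install devices, so that later grub-pc upgrades should reinstall the boot code to all of them.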

Author: flight

Updated on September 17, 2022

Comments

  • flight (over 1 year ago)

    A server set up with Debian 6.0/squeeze. During the squeeze installation, I configured the two 500GB SATA disks (/dev/sda and /dev/sdb) as a RAID1 (managed with mdadm). The RAID holds a 500 GB LVM volume group (vg0). In the volume group, there's a single logical volume (lv0). vg0-lv0 is formatted with ext3 and mounted as the root filesystem (no dedicated /boot partition). The system boots using GRUB2.

    In normal use, the system boots fine.

    Also, when I tried removing the second SATA drive (/dev/sdb) after a shutdown, the system came up without problems, and after reconnecting the drive, I was able to --re-add /dev/sdb1 to the RAID array.

    But: After removing the first SATA drive (/dev/sda), the system won't boot any more! A GRUB welcome message shows up for a second, then the system reboots.

    I tried to install GRUB2 manually on /dev/sdb ("grub-install /dev/sdb"), but that doesn't help.

    Apparently squeeze fails to set up GRUB2 to launch from the second disk when the first disk is removed, which seems to be quite an essential feature when running this kind of software RAID1, doesn't it?

    At the moment, I'm lost whether this is a problem with GRUB2, with LVM or with the RAID setup. Any hints?

  • flight (about 13 years ago)
    You're talking about GRUB1. GRUB2 doesn't have a setup command in the shell.
  • flight (about 13 years ago)
    That's what I thought as well ;-), and yes, the Debconf frontend to grub-pc suggested installing to /dev/sda as well as /dev/sdb (and /dev/dm-0, where the install subsequently failed). Still, it wouldn't boot with the second disk only.
  • flight (about 13 years ago)
    My current solution is to boot from a USB stick with GRUB2 on it (and a /boot filesystem, which is not exactly necessary, I think).
  • flight (about 13 years ago)
    Still I refrain from accepting this answer, since this ought to work without a third drive. From what I can tell, this has to be a bug in GRUB2 (in Debian Squeeze).
  • Phil Hollenback (about 13 years ago)
    Sure, that's a reasonable assumption. I just wanted to point out that I've seen weird LVM/RAID/GRUB issues before, and solved them with a third drive after beating my head against weird, annoying boot-time bugs.
  • ddm-j (about 13 years ago)
    I dimly remember that one had to point it at the MD device rather than at the components, but I may be confusing that with LILO here.
  • Daniel Lawson (about 13 years ago)
    You're probably right there. Carry on :)
  • Cedric Knight (over 6 years ago)
    This appears to be how grub-install works with GRUB 1.99 and 2.02. In whatever way sda+sdb RAID1 holds your boot partition, the core is likely to be referenced by UUID (check my linked question to see if it is). So if you grub-install /dev/sda; grub-install /dev/sdb, it doesn't matter if you remove one of those drives: so long as the BIOS can load MBR from one of them, it will find the RAID UUID and LV by searching.
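To confirm which disks actually carry a GRUB boot sector (the precondition for the "either drive can boot" behaviour described in the last comment), a read-only check can look for GRUB's signature string in sector 0. This is a minimal sketch, assuming the disks are /dev/sda and /dev/sdb and that it runs as root; has_grub_mbr is a hypothetical helper name:

```shell
#!/bin/sh
# Read-only check: does the first sector of each disk contain GRUB's boot code?
# Both GRUB Legacy's stage1 and GRUB2's boot.img embed the ASCII string "GRUB",
# so a missing match strongly suggests no GRUB MBR on that disk. A match does
# not prove the rest of GRUB (core image, /boot/grub) is intact.
# Assumption: disks are /dev/sda and /dev/sdb; reading them needs root.

# Succeeds if the first 512 bytes of $1 contain "GRUB".
has_grub_mbr() {
    dd if="$1" bs=512 count=1 2>/dev/null | grep -aq GRUB
}

for disk in /dev/sda /dev/sdb; do
    if [ ! -r "$disk" ]; then
        echo "$disk: not readable (missing, or not root)"
        continue
    fi
    if has_grub_mbr "$disk"; then
        echo "$disk: GRUB boot sector found"
    else
        echo "$disk: no GRUB boot sector"
    fi
done
```

A disk reported as having no GRUB boot sector is the one to rerun grub-install against before pulling the other drive.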