Grub rescue, will not boot from mdadm RAID, no such disk or device -- mduuid wrong?

Look in /dev/disk/by-id for the entries for the RAID device that are prefixed with md-uuid. Those are the correct IDs to use with mduuid/ in GRUB. You will probably need to insmod mdraid1x too if the array uses current (1.x) metadata.
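
As a rough sketch of that lookup: the helper below (`to_grub_mduuid`, a hypothetical name, not part of mdadm or GRUB) takes an array UUID in mdadm, blkid, or `md-uuid-` symlink notation and prints it in the `mduuid/` form that appears in the GRUB error message. It assumes the only differences between the notations are the `md-uuid-` prefix, the `:` or `-` separators, and letter case.

```shell
#!/bin/sh
# Hypothetical helper: normalize an md array UUID to GRUB's mduuid/ name.
to_grub_mduuid() {
    # Drop an optional "md-uuid-" prefix, remove ':' and '-' separators,
    # and lowercase the hex digits.
    printf 'mduuid/%s\n' \
        "$(printf '%s' "${1#md-uuid-}" | tr -d ':-' | tr 'A-F' 'a-f')"
}

# Array UUID exactly as printed by `mdadm -D /dev/md0` in the question.
to_grub_mduuid 'b1c40379:914e5d18:dddb893b:4dc5a28f'

# On a running system, show each md-uuid symlink and the device it points to.
for link in /dev/disk/by-id/md-uuid-*; do
    [ -e "$link" ] || continue   # the glob matched nothing
    printf '%s -> %s\n' "$(to_grub_mduuid "${link##*/}")" "$(readlink -f "$link")"
done
```

For the array in the question, the name GRUB complains about is the array UUID from `mdadm -D` with the colons removed, which is also the member UUID from `blkid` with the dashes removed. The 2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb value reported for /dev/md0 is the UUID of the XFS filesystem on top of the array, not of the array itself.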


Author: ctrlbrk

Updated on September 18, 2022

Comments

  • ctrlbrk
    ctrlbrk over 1 year

    I am running a 14-disk RAID 6 on mdadm behind two LSI SAS2008s in JBOD mode (no HW RAID) on Debian 7 in BIOS legacy mode.

    Grub2 is dropping to a rescue shell complaining that "no such device" exists for "mduuid/b1c40379914e5d18dddb893b4dc5a28f".

    Output from mdadm:

    # mdadm -D /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Wed Nov  7 17:06:02 2012
         Raid Level : raid6
         Array Size : 35160446976 (33531.62 GiB 36004.30 GB)
      Used Dev Size : 2930037248 (2794.30 GiB 3000.36 GB)
       Raid Devices : 14
      Total Devices : 14
        Persistence : Superblock is persistent
    
        Update Time : Thu Sep 18 19:44:56 2014
              State : clean
     Active Devices : 14
    Working Devices : 14
     Failed Devices : 0
      Spare Devices : 0
    
             Layout : left-symmetric
         Chunk Size : 512K
    
               Name : media:0  (local to host media)
               UUID : b1c40379:914e5d18:dddb893b:4dc5a28f
             Events : 2319862
    
        Number   Major   Minor   RaidDevice State
          13       8       82        0      active sync   /dev/sdf2
          15       8      130        1      active sync   /dev/sdi2
          14       8       98        2      active sync   /dev/sdg2
          21       8      194        3      active sync   /dev/sdm2
          16       8      226        4      active sync   /dev/sdo2
          12       8      162        5      active sync   /dev/sdk2
          18       8       50        6      active sync   /dev/sdd2
          17       8      146        7      active sync   /dev/sdj2
          20       8      210        8      active sync   /dev/sdn2
          19       8       66        9      active sync   /dev/sde2
          11       8       34       10      active sync   /dev/sdc2
          24       8      178       11      active sync   /dev/sdl2
          23       8      114       12      active sync   /dev/sdh2
          22       8       18       13      active sync   /dev/sdb2
    

    Output from blkid:

    # blkid
    /dev/md0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
    /dev/md/0: UUID="2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb" TYPE="xfs"
    /dev/sdd2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="09a00673-c9c1-dc15-b792-f0226016a8a6" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdc2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="ce717500-cadf-3b12-e893-48d43c1408e7" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdf2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="071afb12-f78f-4f15-f65a-a6298eadcfa7" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdb2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="822fd02b-454d-a94c-57f6-8535964996b1" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sde2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="de3f41b8-3016-870c-344f-2a92c08e1085" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdg2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="e319bdaa-22bc-1153-c43b-48788a9c1832" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdi2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="3dd1df1b-203c-6453-0964-ebad245b1670" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdh2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="f5477580-9435-7948-6e97-fe82c8805bcd" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdj2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="4a013330-37c5-65f9-cb76-1d357ce4ddb4" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdm2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="b750b4e4-2b1b-ac5f-cbd3-bde5eab657e7" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdk2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="d5521994-6c4f-04f9-f7ca-0dd9dff3c6cd" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdn2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="4670b36c-07cb-e661-20e3-d314f7c3fd42" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdl2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="c1514b9f-2461-6fed-324a-50fb9469043a" LABEL="media:0" TYPE="linux_raid_member"
    /dev/sdo2: UUID="b1c40379-914e-5d18-dddb-893b4dc5a28f" UUID_SUB="6c33c472-af1f-fd8f-22d1-0ea39edc75bb" LABEL="media:0" TYPE="linux_raid_member"
    

    The UUID for md0 is 2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb so I do not understand why grub insists on looking for b1c40379914e5d18dddb893b4dc5a28f.

    Here is the output from bootinfoscript 0.61. It contains a lot of detailed information, and I couldn't find anything wrong with any of it:

    http://pastebin.com/bPgGN68L

    In the grub rescue shell, an ls shows the member disks and also shows (md/0), but if I try an ls (md/0) I get an unknown disk error. Trying an ls on any member device results in unknown filesystem. The filesystem on md0 is XFS, and I assume the unknown filesystem error is normal if it's trying to read an individual disk instead of md0.

    I have come close to losing my mind over this. I've tried uninstalling and reinstalling grub, running update-initramfs -u -k all, running update-grub, and running grub-install to all member disks (which completes without error), each numerous times.

    I even tried manually editing grub.cfg to replace all instances of mduuid/b1c40379914e5d18dddb893b4dc5a28f with (md/0) and then re-install grub, but the exact same error of no such device mduuid/b1c40379914e5d18dddb893b4dc5a28f still happened.

    EDIT TO ADD

    I don't have IPMI on this box, so please forgive the embarrassing cell phone picture:

    http://imgur.com/zooX12b

    One thing I noticed is that it is only showing half the disks. I am not sure whether this matters, but one theory is that it is because there are two LSI cards physically in the machine.

    This last screenshot was shown after I specifically altered grub.cfg to replace all instances of mduuid/b1c40379914e5d18dddb893b4dc5a28f with mduuid/2c61b08d-cb1f-4c2c-8ce0-eaea15af32fb and then re-ran grub-install on all member drives. Where it is getting this old b1c* address I have no clue.

    I even tried installing a SATA drive on /dev/sda, outside of the array, and installing grub on it and booting from it. Still, same identical error.

    EDIT TO CLARIFY

    Grub installation is to each individual member disk, not to /dev/md0, and completes without error, but the machine drops to grub rescue on reboot.

    EDIT TO ADD

    These operations were suggested by a friend. They did not work; I still need help!

    [image: screenshot of the suggested operations]

    I could really use some assistance from anyone/everyone to help me get GRUB working on this box.

    Anyone have other suggestions and fixes?

    EDIT 5

    Grub bug report:

    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764798

  • ctrlbrk
    ctrlbrk over 9 years
    I'm not installing grub to mdadm; I'm installing it to all member disks, as described in my post and shown by the attachment.
  • ctrlbrk
    ctrlbrk over 9 years
    I created a new article because this issue is quite different from the original issue: grub does install, there is no seg fault, and there are no RAID errors.