How to make RAID controller rescan devices

Restart the out-of-sync controller (e.g. c1):

/opt/MegaRAID/storcli/storcli64 /c1 restart
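
After the restart, it may be worth checking that c1 now reports the shelf the same way c0 does. The following is a minimal sketch, assuming the table layout shown in the question's output (State in the third column, DG in the fourth, where `F` marks a foreign drive). `count_states` is a hypothetical helper name and not part of storcli.

```shell
#!/bin/sh
# Hypothetical helper: summarise drive states from the output of
# `storcli64 /cX /eY /sall show`, read on stdin.
# Drive rows are recognised by their first field, "EID:Slt" (e.g. "71:0").
# Field 3 is the State column and field 4 is the DG column.
count_states() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ { n[$3 " " $4]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# Possible usage, with the enclosure IDs taken from the question
# (assumed to be unchanged after the restart):
#   /opt/MegaRAID/storcli/storcli64 /c1 /e71  /sall show | count_states
#   /opt/MegaRAID/storcli/storcli64 /c0 /e165 /sall show | count_states
```

If the restart helped, both controllers should print the same state summary for the shelf, with no remaining `UGood F` line on c1.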

Author by

Michael

Updated on September 18, 2022

Comments

  • Michael
    Michael almost 2 years

    I have the following setup:

    A single server with two LSI MegaRAID SAS 9380-8e controllers which are both connected to two 60-bay disk shelves while roughly following the design by Edmund White (see https://github.com/ewwhite/zfs-ha/wiki). The goal is to replicate the exact setup, but it's currently mid-migration.

    After wiring the first shelf, all 60 disks were seen by both controllers, and multipathing was set up and works smoothly. When the second disk shelf was added, its 60 disks still carried an old RAID configuration, which both controllers dutifully reported. Using the first controller (c0), I removed the configuration from the disks and set them to JBOD. All 60 disks are now visible to the OS and can be registered with multipath, but they report only a single path (through the first controller). The second controller (c1) still reports all 60 disks as foreign (UGood F), and there is seemingly no way to force it to rescan the devices or to forget its current config for just this shelf:

    # /opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall show | head -n20
    Controller = 1
    Status = Success
    Description = Show Drive Information Succeeded.
    
    
    Drive Information :
    =================
    
    -----------------------------------------------------------------------
    EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
    -----------------------------------------------------------------------
    71:0     74 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:1    107 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:2     72 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:3     95 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:4     90 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:5     77 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:6     73 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:7     76 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    71:8     83 UGood F  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  D  
    

    This is the same shelf as seen by the other controller:

    # /opt/MegaRAID/storcli/storcli64 /c0 /e165 /sall show | head -n20
    Controller = 0
    Status = Success
    Description = Show Drive Information Succeeded.
    
    
    Drive Information :
    =================
    
    -----------------------------------------------------------------------
    EID:Slt DID State DG     Size Intf Med SED PI SeSz Model            Sp 
    -----------------------------------------------------------------------
    165:0   127 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:1   121 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:2   118 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:3   116 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:4   146 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:5   122 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:6   115 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:7   142 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    165:8   145 JBOD  -  3.637 TB SAS  HDD N   N  512B HUS724040ALS640  U  
    

    But trying to clear the (wrong) info from the second controller does not work:

    # /opt/MegaRAID/storcli/storcli64 /c1 /fall show
    Controller = 1
    Status = Success
    Description = Couldn't find any foreign Configuration
    
    # /opt/MegaRAID/storcli/storcli64 /c1 /fall delete
    Controller = 1
    Status = Success
    Description = Couldn't find any foreign Configuration
    
    # /opt/MegaRAID/storcli/storcli64 /c1 /fall import
    Controller = 1
    Status = Success
    Description = Couldn't find any foreign Configuration
    

    Forcing the disks into JBOD on the second controller does not work either:

    # /opt/MegaRAID/storcli/storcli64 /c1 /e71 /sall set jbod | head -n20
    Controller = 1
    Status = Failure
    Description = Set Drive JBOD Failed.
    
    Detailed Status :
    ===============
    
    -------------------------------------------------
    Drive       Status  ErrCd ErrMsg                 
    -------------------------------------------------
    /c1/e71/s0  Failure   255 Operation not allowed. 
    /c1/e71/s1  Failure   255 Operation not allowed. 
    /c1/e71/s2  Failure   255 Operation not allowed. 
    /c1/e71/s3  Failure   255 Operation not allowed. 
    /c1/e71/s4  Failure   255 Operation not allowed. 
    /c1/e71/s5  Failure   255 Operation not allowed. 
    /c1/e71/s6  Failure   255 Operation not allowed. 
    /c1/e71/s7  Failure   255 Operation not allowed. 
    /c1/e71/s8  Failure   255 Operation not allowed. 
    /c1/e71/s9  Failure   255 Operation not allowed. 
    

    Is there any way to tell the RAID controller that those disks no longer have a foreign config and should be seen as JBODs?

    • Lenniey
      Lenniey almost 7 years
      Could you try /cx rescan?
    • Lenniey
      Lenniey almost 7 years
      Sorry, I was on 3Ware...did all your disks come from the same old machine / vendor? Some controllers install their own firmware and can only be used by another one if you low-level-format the disk or remove the config from the old controller. Also I assume the controllers are all on the same firmware / BIOS etc.?
    • ewwhite
      ewwhite almost 7 years
      Does the controller have a JBOD mode? Why aren't you using a SAS HBA for ZFS? Are these a bunch of RAID0 arrays?
    • Michael
      Michael almost 7 years
      @ewwhite: Yes, there is a JBOD mode (see the sample output of controller c0 above). I am migrating from a different setup and had those 4 relatively new RAID controllers around. And it already works well with the first shelf. The problem when adding the second shelf was simply that the controller detected the existing config and pulled it in.
    • Michael
      Michael almost 7 years
      @Lenniey: Yes, all disks were connected to an LSI controller (same brand) before. Controller 0 also showed the foreign config at first. I changed the disks to JBOD on this controller and can access them from the OS. Only now, controller 1 is not updating its config to reflect the change.
    • Lenniey
      Lenniey almost 7 years
      I'd try disconnecting controller 1 (PCI-wise) + storage, rebooting etc., then reconnecting and rebooting again. I have had so much strange trouble with RAID controllers of different vendors, HDD incompatibilities and whatever else you can think of, that this is usually my "workflow". Next would be to attach only a single disk, try to format (or initialize) it and see what happens.
  • jeffre
    jeffre about 5 years
    I'd like to add that you may want to consult with Broadcom on your configuration, as I was recently informed by their support what you (and I) were doing is not supported: "You cannot have 2 MegaRAID controllers taking charge of the same set of drives on 1 enclosure even if there are 2 SAS expander chips on the backplane."