Experience with AMCC 3ware 9650se raid cards? Ours seems dead


Solution 1

We managed to bring the card back to life, magically. We took the card out of the machine and put it in a completely different machine running a Red Hat-family distribution with very new drivers. The story goes that on the first boot the RAID BIOS did not kick in (as we'd been seeing) and the kernel reported a lot of different errors, but it was eventually able to bring the card up. On the next reboot the RAID BIOS started working again and the machine booted cleanly. We put the card back in the original machine and everything came back to life.

To me, this sounds like a problem with microcode. I've seen drivers for things like sound cards, soft RAIDs, video cards, etc. download some sort of microcode to the card when turning it on. If things went bad the last time that happened, or if the microcode got corrupted by the power blip from the UPSes kicking in when we lost power (walls down the hall turned into a waterfall), then that would certainly explain what happened.

Figured I'd post an update for all future googlers.

Edit 3-Jan-2012: @rakslice made the point that these cards often have battery back-ups attached. We hadn't tried to remove the battery (didn't think of it), but it's a great idea. Anybody else having this problem may want to try the same. We're still not sure if we fixed it because the Fedora kernel did some magic handshake to recover the card, or if we happened to leave it unpowered long enough for something to reset.

Solution 2

It's quite painless to swap 3ware cards.

Just make sure it's the same or newer model and that the firmware versions are the same. If the firmware versions are different, the disks won't import to the controller. (been there, done that)

Does the old card show up in lspci at all? I've had problems where the BIOS settings would get scrambled and cause the card to not show up at all. I had to reenable the PCI slot and also enable MSI for the 3Ware cards to appear again.
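A quick way to answer the lspci question is to filter its output for the vendor string. This is only a sketch: `has_3ware_card` is a made-up helper name, and it assumes the card identifies itself with "3ware" in the lspci description, as in the line quoted in the comments ("RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe").

```shell
# Hypothetical helper: reads lspci-style text on stdin, prints any
# 3ware lines, and returns non-zero if there are none.
has_3ware_card() {
    grep -i '3ware'
}

# Usage on a live system:
#   lspci | has_3ware_card || echo "card not visible on the PCI bus"
```

If nothing is printed, the card is not enumerating at all, which points at the slot or BIOS settings rather than the array.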

Solution 3

Some info on using 3ware 9650 raid cards in modern, common motherboards:

  • Avoid full-size 9650 cards: they don't work with newer motherboards, where the card's BIOS fails to kick in after a soft reset. In older motherboards they work fine (tested in Core 2 motherboards).

  • The low-profile 9650SE cards were made later and work fine in modern UEFI motherboards.

  • They are still working (most of them made around 2007 perhaps?)

  • I have not seen a failing battery yet, after 8-9 years (using them in ideal conditions, with batteries always checked and charged).

  • You can switch cards, but use the same firmware (or newer if same version is not available). When building raids use the lower ports first, because you can also switch to a 9650 card with fewer ports easily as long as the higher ports are not used on the original card.

  • Avoid the first x16 PCI Express slot on the motherboard. Some motherboards expect a video card there, which causes strange behavior.

  • Installing 3dm2 and the CLI works out of the box in Ubuntu (tested: 14.04 LTS, 16.04 LTS); just run the shell script from the install.

  • It's a pity that 3ware is no more, these are great products

  • If you still use them, sadly it's time to switch to something new. I'm afraid there is only LSI (now Broadcom) to consider.

  • After Avago bought Broadcom (and took the Broadcom name), the Avago website changed and 3ware drivers/downloads became harder to find.

Solution 4

This is Dan who posted previously, this time I've created an account :)

Anyway, now that my data was pulled.. I decided to screw around with the card and success!!

  1. Downloaded LiveCD version of Ubuntu 10.04.3 LTS

  2. Booted Live and ensured the card was detected ('tail /var/log/messages | grep 3w-')

  3. Installed tw_cli from the following guy's repo: http://jonas.genannt.name

  4. Downloaded the latest firmware (2.08.00.009) from CodeSet 9.3.0.8 for the 9500S-8 from http://www.3ware.com/support/downloadpageprod.asp?pcode=9&path=Escalade9500SSeries&prodname=3ware%209500S%20Series

  5. Used tw_cli to flash the firmware (stock tw_cli from 3ware doesn't support this). I did not use the force flag, and flashed despite already having the same version.

  6. Rebooted when it told me so.

BIOS now comes up as expected!

RMA my !@#. Perhaps I should share this with 3Ware. Big thanks to everyone for listening.
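The detection check in step 2 can be stretched a little to tell "the driver saw the card" apart from "the driver saw the card and complained", which is what the kernel output in the question shows. A rough sketch, assuming the driver prefixes its messages with `3w-` and marks failures with `ERROR` as in that output; `summarize_3w_log` is a made-up name, and the log location varies by distro.

```shell
# Hypothetical helper: reads kernel log text on stdin and reports how many
# lines came from the 3w- driver and how many of those are errors.
# Returns non-zero if the driver never mentioned the card.
summarize_3w_log() {
    lines=$(grep '3w-' || true)
    if [ -z "$lines" ]; then
        echo "no 3w- driver messages found"
        return 1
    fi
    total=$(printf '%s\n' "$lines" | grep -c '3w-' || true)
    errors=$(printf '%s\n' "$lines" | grep -c 'ERROR' || true)
    echo "3w- messages: $total, errors: $errors"
}

# Usage on a live system (either source of kernel messages works):
#   dmesg | summarize_3w_log
#   summarize_3w_log < /var/log/messages
```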

Solution 5

3ware cards are excellent at array compatibility. Do ensure the firmware on the new card is no older than on the old card (as far as you can determine), and you probably want to keep within the same series if possible.

Keep those two in mind and it just works.
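If you have both firmware versions written down, the "no older than" rule can be checked mechanically. This is a minimal sketch, assuming the versions are plain dotted strings like the 2.08.00.009 mentioned in another answer and that GNU `sort` with `-V` (version sort) is available; `fw_not_older` is a made-up helper, not part of any 3ware tool.

```shell
# Hypothetical helper: succeeds if the replacement card's firmware ($2)
# is the same as or newer than the old card's firmware ($1).
fw_not_older() {
    old="$1"
    new="$2"
    # Version-sort the two strings; if the old version sorts first
    # (or they are identical), the new one is not older.
    lowest=$(printf '%s\n%s\n' "$old" "$new" | sort -V | head -n 1)
    [ "$lowest" = "$old" ]
}

# Usage:
#   fw_not_older "$OLD_FW" "$NEW_FW" || echo "flash the new card before importing"
```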


antiduh
Author by

antiduh

I'm a full-time software engineer in Rochester, NY.

Updated on September 17, 2022

Comments

  • antiduh
    antiduh almost 2 years

    We have an 8-port 3ware 9650SE RAID card for our main disk array. We had to bring the server down for a pending power outage, and when we turned the machine back on, the RAID card never started.

    This card has been in service for a couple years without problems, and was working up until the shutdown.

    Now, when we turn the machine on, the bios option rom that normally kicks in before the bootloader doesn't show up, none of the drives start, and when the OS tries to access the device, it just times out.

    The firmware on it has been upgraded in the past, so it's possible we've hit some sort of firmware bug.

    We're using it in a Silicon Mechanics R272 machine with gentoo for the OS. The OS eventually boots, but alas, without the card.

    We've ordered a new one, but I'm worried that if we replace the card it won't recognize the existing array. Has anybody performed a card swap before?

    Any help would be greatly appreciated.

    Edit: These are the kernel errors we see:

    3ware 9000 Storage Controller device driver for Linux v2.26.02.012.
    3w-9xxx 0000:09:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
    3w-9xxx 0000:09:00.0: setting latency timer to 64
    3w-9xxx: scsi0: ERROR: (0x06:0x000D): PCI Abort: clearing.
    3w-9xxx: scsi0: ERROR: (0x06:0x001F): Microcontroller not ready during reset sequence.
    3w-9xxx: scsi0: ERROR: (0x06:0x0036): Response queue (large) empty failed during reset sequence.
    3w-9xxx 0000:09:00.0: PCI INT A disabled
    
    • Ali Chehab
      Ali Chehab about 14 years
      If the card has been in the machine for years, and this is one of the first times it's been offline, it's also possible that the mechanical connection was a little loose due to thermal expansion and contraction. I've seen this multiple times: a machine goes offline and when it comes back, some card doesn't want to work. Re-seat the card (remove and reinsert it) and it magically comes back to life. It's one of the first things I do when I see something like this now.
    • antiduh
      antiduh about 14 years
      Thanks for the input. We did try re-seating the card many times, tried different PCIe slots, etc., all to no avail.
    • rakslice
      rakslice over 12 years
      Did the card have the back-up battery installed? If so, the back-up battery would preserve the (apparently bad) state of the card even through a hard power off of the host. Edit: or moving it to a different host. But that fixed it. Whoops. =)
    • Admin
      Admin over 11 years
      It's been a long time, but I recently replaced my old dead 3650SE-8i (cause of death: a degraded HD) with a new one, and my array and all my data are as they should be, so have faith. Don
  • antiduh
    antiduh about 14 years
    Yeah, this is what we see: "09:00.0 RAID bus controller: 3ware Inc 9650SE SATA-II RAID PCIe (rev 01)". That said, the fact that the drives don't start and the card's BIOS doesn't show up during boot isn't very encouraging.
  • antiduh
    antiduh about 14 years
    Also, what do you mean by 'enable MSI'? That's not a BIOS option I'm familiar with. We're using PhoenixBIOS on the mainboard, if it's any help.
  • antiduh
    antiduh about 14 years
    Thanks, that's encouraging to hear. We're buying the exact same card we had before, don't want to change up anything unless we really have to.
  • antiduh
    antiduh about 14 years
    Ahh, google-fu needed some tweaking - mjmwired.net/kernel/Documentation/MSI-HOWTO.txt
  • James
    James about 14 years
    Yeah, this sounds like a totally different problem... our systems had a BIOS option to disable/enable MSI as well as the kernel bits. Good luck with the new card.
  • James
    James about 14 years
    Oh - have you tried re-flashing the firmware/BIOS as well? You can do it via a command line tool or the 3dm2 GUI.
  • Kendall
    Kendall almost 13 years
    I realize I'm bringing this back from the dead, but yes, 3Ware cards do store the information about the array setup on the first few blocks of the drives.
  • antiduh
    antiduh almost 13 years
    Checking through the Fedora releases, it was probably 12 or 13, likely 12, given that the machine we brought it to life in was a freshly installed machine. I don't remember anything about what drivers were installed, but they would've been whatever was available at the time.
  • antiduh
    antiduh almost 13 years
    As for the procedure, we had been trying desperately to boot it in the original machine - leaving it on, leaving it off, booting multiple times with power on, booting multiple times with hard-power-off between boots, reseating it, switching slots. Finally we pulled it out of the Silicon Mechanics machine and stuck it in the Fedora machine. From what I remember going on, we only had to boot it once for the OS to be able to read it, twice to get the BIOS to kick back in. You'll want to play with power - leaving it out of the machine for a while, leaving it on in the machine for a while, etc.
  • azazil
    azazil almost 4 years
    4 years later the cards are still working, no issues at all, fully supported by Ubuntu 18.04LTS, even cli and 3dm2 software is working fine.