Emulex FC HBA not resetting

Solution 1

I ended up replacing the HBA as the problem looked "hardware-ish" to me towards the end.

Solution 2

That HBA port is connected to a target port. Disconnect it for a moment, reset it, then reconnect. I don't know your SAN design, so I can't say what the easiest way is for you, but I'd imagine a port shutdown on your switch is the easiest and quickest option.
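
The port shutdown suggested above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes the Brocade switch accepts the Fabric OS commands `portdisable` and `portenable` over SSH, and the function name `bounce_port`, the `RUN` hook and the `admin@SW3014BC2_B` login are made up for illustration.

```shell
# Bounce one switch port: disable it, wait, then enable it again.
# RUN is a hypothetical hook for this sketch: it defaults to ssh, and
# RUN=echo turns the call into a dry run that only prints the commands.
bounce_port() {
    switch=$1          # e.g. admin@SW3014BC2_B (login and host are assumptions)
    port=$2            # switch port number, e.g. 13
    pause=${3:-10}     # seconds to leave the port disabled

    ${RUN:-ssh} "$switch" portdisable "$port" || return 1
    sleep "$pause"
    ${RUN:-ssh} "$switch" portenable "$port"
}
```

Setting `RUN=echo` before calling `bounce_port admin@SW3014BC2_B 13` prints the two switch commands without touching the fabric. Once the port is enabled again, the HBA reset (`reset_hard` in emlxadm or `fcadm force-lip`) can be retried on the host.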

Author by

the-wabbit

Updated on September 18, 2022

Comments

  • the-wabbit
    the-wabbit almost 2 years

    A Solaris 11 machine with an oldish Emulex HBA seems to have some problems:

    # dmesg | egrep emlxs
    Dec 29 15:31:10 san2 emlxs: [ID 349649 kern.info] [ 5.0608]emlxs1: NOTICE: 730: Link reset. (Disabling link...)
    Dec 29 15:31:10 san2 emlxs: [ID 349649 kern.info] [ 5.0333]emlxs1: NOTICE: 710: Link down.
    Dec 29 15:31:11 san2 emlxs: [ID 349649 kern.info] [ B.1FD3]emlxs1: NOTICE: 930: Packet transport failed. (*Node not found. did=10300)
    

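    When the problem recurs, the driver notices can be condensed to one line per event for easier comparison. The following is a small sketch that assumes the log lines keep the `emlxsN: NOTICE: code: message` layout shown above; the function name `emlxs_notices` is made up for illustration.

    ```shell
    # Reduce emlxs driver notices on stdin to "instance code message".
    # Lines that do not contain an emlxsN NOTICE entry are dropped.
    emlxs_notices() {
        sed -n 's/^.*\(emlxs[0-9][0-9]*\): NOTICE: *\([0-9][0-9]*\): *\(.*\)$/\1 \2 \3/p'
    }

    # Usage: dmesg | emlxs_notices
    ```
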
    When I try to reset the affected adapter using emlxadm (it is a dual HBA PCI-X card), it throws errors and does not reset:

    # /opt/EMLXemlxu/bin/emlxadm
    
    Available Emulex HBA's:
    
    1. FCT:emlxs0      : /devices/pci@0,0/pci8086,25e2@2/pci8086,350c@0,3/pci1014,1a7@1/pci10df,f980@4 (CONNECTED)
    2. FCT:emlxs1      : /devices/pci@0,0/pci8086,25e2@2/pci8086,350c@0,3/pci1014,1a7@1/pci10df,f980@5 (NOT CONNECTED)
    
    Enter an HBA number or zero to exit: 2
    
    HBA 2: /devices/pci@0,0/pci8086,25e2@2/pci8086,350c@0,3/pci1014,1a7@1/pci10df,f980@5
    
    emlxadm> get_fw_rev
    
    Firmware revision: LP9802DC 1.92a1
    
    emlxadm> reset_hard
    
    ioctl: FCIO_RESET_HARD: Device busy
    fcio:  FCIO_RESET_HARD: Operation successful.
    
    > exit
    
    # fcadm force-lip 10000000c93a3e6b
    Error: Failed to reinitialize the link of HBA 10000000c93a3e6b
    

    fmdump has no errors logged. A reboot seems to fix the problem and it appears only infrequently. The HBA is connected to a Brocade Silkworm with a 5.30 version of FabricOS.
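
    To check from the host whether a reset attempt brought a port back, the port state can be read from `fcinfo hba-port`. This is only a sketch: it assumes the command prints an `HBA Port WWN:` line followed by a `State:` line for each port, and the function name `offline_hba_ports` is made up for illustration.

    ```shell
    # Print the WWN of every HBA port that reports "State: offline".
    # Reads fcinfo hba-port output on stdin (output layout is an assumption).
    offline_hba_ports() {
        awk '/HBA Port WWN:/ { wwn = $NF }
             /State:/ && $NF == "offline" { print wwn }'
    }

    # Usage: fcinfo hba-port | offline_hba_ports
    ```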

    Updates: Firmware is the latest/greatest, driver version is 2.60f.

    I have tried resetting the port and even power-cycled the Silkworm, but the port stays in offline state, apparently because the Emulex HBA has switched off the GBIC PHY:

    SW3014BC2_B:admin> portshow 13
    portName:
    portHealth: No Fabric Watch License
    
    Authentication: None
    portDisableReason: None
    portCFlags: 0x1
    portFlags: 0x4001        PRESENT U_PORT LED
    portType:  4.2
    portState: 2    Offline
    portPhys:  4    No_Light
    

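    The two fields that matter in this output, `portState` and `portPhys`, can be extracted with a short filter, which is handy when polling the port after each reset attempt. This is a sketch that assumes the field layout shown above; the function name `port_status` is made up, and running `portshow` over SSH is an assumption about the switch setup.

    ```shell
    # Pull the state and physical-layer status out of portshow output on stdin.
    # The third field of each line is the textual value, e.g. "Offline".
    port_status() {
        awk '$1 == "portState:" { state = $3 }
             $1 == "portPhys:"  { phys = $3 }
             END { print state, phys }'
    }

    # Usage (SSH access to the switch is an assumption):
    #   ssh admin@SW3014BC2_B portshow 13 | port_status
    ```
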
    Any ideas on possible causes or how to regain connectivity without rebooting?

    • pfo
      pfo over 12 years
      Upgrade the firmware and driver to the newest release. If that is not possible, replace the HBA with another make, as the drivers/hardware/optics seem to be crappy.
    • Chopper3
      Chopper3 over 12 years
      @pfo, I think he wants to do this without any server downtime, as a simple reboot would reset the HBAs
    • the-wabbit
      the-wabbit over 12 years
      @pfo there is nothing to upgrade, unfortunately. I will try replacing the Emulex card with a QLogic one as soon as I get my hands on the machine.
  • the-wabbit
    the-wabbit over 12 years
    Thanks for your input. I've even power-cycled the switch to no avail. The SAN topology is simplistic: every host has a dual-port HBA and connects to a total of two Silkworm switches in a star topology. The switches are not interconnected themselves.