e1000e Reset adapter unexpectedly / Detected Hardware Unit Hang

Solution 1

OK, so after posting this question last night I continued to do some research. The only real solution I came across seems to have taken care of the problem.

Disabling TSO, GSO and GRO using ethtool:

ethtool -K eth0 gso off gro off tso off

According to a post found here: http://ehc.ac/p/e1000/bugs/378/

From what I understand, this can reduce performance.
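A minimal sketch of how the change above could be previewed before it is applied. `IFACE_NAME`, `DRY_RUN` and `offload_off_cmd` are illustrative names, not part of the original answer; the interface name `eth0` and the feature list are taken from the command above.

```shell
#!/bin/sh
# Sketch only. IFACE_NAME, DRY_RUN and offload_off_cmd are illustrative names.
IFACE_NAME="${IFACE_NAME:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it (needs root)

# Build the ethtool command that switches every listed feature off.
offload_off_cmd() {
    iface="$1"
    shift
    cmd="ethtool -K $iface"
    for feature in "$@"; do
        cmd="$cmd $feature off"
    done
    printf '%s\n' "$cmd"
}

CMD="$(offload_off_cmd "$IFACE_NAME" gso gro tso)"

if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $CMD"
else
    # Lowercase -k lists the current feature states; uppercase -K changes them.
    ethtool -k "$IFACE_NAME" | grep -E 'segmentation-offload|generic-receive-offload'
    $CMD
    ethtool -k "$IFACE_NAME" | grep -E 'segmentation-offload|generic-receive-offload'
fi
```

Settings changed with `ethtool -K` do not survive a reboot, so the command has to be reapplied at boot if it turns out to help.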

I also noticed another solution was to disable Active-State Power Management (ASPM) with the kernel boot parameter:

pcie_aspm=off

According to this post on serverfault: Linux e1000e (Intel networking driver) problems galore, where do I start?

I haven’t tried this solution yet. I will try it and see if that makes a difference and post back my findings.
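For anyone who wants to try the boot parameter, here is a rough sketch. It assumes GRUB 2 on a Debian/Ubuntu-style system, which the original post does not state. `aspm_disabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only, assuming GRUB 2 on Debian/Ubuntu:
#   1. add pcie_aspm=off to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
#   2. run update-grub as root and reboot
# The check below reports whether the running kernel was booted with the parameter.

# Succeeds when the given kernel command line contains pcie_aspm=off.
aspm_disabled() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

if aspm_disabled "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "pcie_aspm=off is active"
else
    echo "pcie_aspm=off is not on the kernel command line"
fi
```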

EDIT:

OK, so I tried turning off Active-State Power Management (pcie_aspm=off) and it didn't have any effect. I continued to notice errors in my log file.

This may still work for some, as some Intel NICs have issues under certain kernels with falling asleep when power management is enabled.
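One way to judge whether a change made a difference is to count the hang messages in the kernel log before and after. This is only a sketch: the path `/var/log/kern.log` is an assumption (the question mentions kern.log without a full path), and `count_hangs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only. The log path is an assumption; adjust it to your system.
LOG_FILE="${LOG_FILE:-/var/log/kern.log}"

# Print how many "Detected Hardware Unit Hang" lines the given file contains.
# grep -c exits non-zero when the count is 0, so that status is ignored.
count_hangs() {
    grep -c 'Detected Hardware Unit Hang' "$1" || true
}

if [ -r "$LOG_FILE" ]; then
    echo "hang events in $LOG_FILE: $(count_hangs "$LOG_FILE")"
fi
```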

Solution 2

Disabling Enhanced C1 (C1E) in the BIOS fixed it for me.

Not sure if the lower power state of C1E is messing with the driver, or that there's an oops in the driver when the processor is in this state.

Anyway, problem solved.

Solution 3

Disabling only TCP Segmentation Offload (TSO) does the trick for me.

ethtool -K eth0 tso off

Note: It does not seem to be necessary to also disable Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO), as is recommended by various sources. As far as I have learned, these are implemented purely in software and should be safe. Don't sacrifice more performance than necessary.
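To confirm that only TSO was switched off and that GRO and GSO are still on, the output of `ethtool -k` (lowercase) can be filtered. A small sketch follows. `feature_state` is a hypothetical helper, not an ethtool subcommand.

```shell
#!/bin/sh
# Sketch only. feature_state is a hypothetical helper, not part of ethtool.
# It reads `ethtool -k` output on stdin and prints the state of one
# top-level feature (indented sub-features are not matched).
feature_state() {
    awk -F': ' -v name="$1" '$1 == name { print $2 }'
}

# Example, using the interface name from the answer above:
#   ethtool -k eth0 | feature_state tcp-segmentation-offload
#   ethtool -k eth0 | feature_state generic-receive-offload
```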

Solution 4

I had the same issue (triggering the same kernel error as you, plus userspace SSH errors like "Corrupted MAC on input").

Solution

What worked for me was to disable TCP checksum offloading:

# ethtool -K eth0 tx off rx off

Clean & long-term integration of this with debian-ish /etc/network/interfaces:

#!/bin/bash
#
# Disables TCP offloading on all ifaces
#
# Inspired by: @Michelunik https://serverfault.com/a/422554/62953

RUN=true
case "${IF_NO_TOE,,}" in
    no|off|false|disable|disabled)
        RUN=false
    ;;
esac


# Other offloading options that could be disabled (not TCP related):
#  sg tso ufo gso gro lro rxvlan txvlan rxhash
# see man ethtool

if [ "$MODE" = start ] && [ "$RUN" = true ]; then
  TOE_OPTIONS="rx tx"
  for TOE_OPTION in $TOE_OPTIONS; do
    /sbin/ethtool --offload "$IFACE" "$TOE_OPTION" off &>/dev/null || true
  done
fi

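The answer does not say where the script goes. As far as I understand ifupdown, hook scripts in /etc/network/if-up.d/ are run with `MODE=start` and `IFACE` set, and options from an interface stanza are exported as `IF_<OPTION>` (so a line such as `no-toe off` would arrive as `IF_NO_TOE=off` and make this script skip that interface). A hedged install sketch follows. The file name `disable-toe` and the helper `install_hook` are hypothetical.

```shell
#!/bin/sh
# Sketch only. "disable-toe" is a hypothetical file name; run-parts skips
# file names that contain dots, so no ".sh" suffix is used.
HOOK_DIR="${HOOK_DIR:-/etc/network/if-up.d}"

# Copy the hook script into the hook directory and make it executable.
install_hook() {
    src="$1"
    dest_dir="${2:-$HOOK_DIR}"
    cp "$src" "$dest_dir/disable-toe" && chmod 0755 "$dest_dir/disable-toe"
}

# Example (as root), with the script above saved as ./disable-toe:
#   install_hook ./disable-toe
```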

Context

  • Debian Jessie
  • Kernel 4.7.0-0.bpo.1-amd64
  • lspci 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-V (rev 04)

Author by

Kyle Coots


Updated on September 18, 2022

Comments

  • Kyle Coots
    Kyle Coots over 1 year

    I have a Dell 1U Server with Intel(R) Xeon(R) CPU L5420 @ 2.50GHz, 8 cores running Ubuntu Server Kernel Version 3.13.0-32-generic on x86_64. It has dual 1000baseT networking cards. I have it set up to forward packets from eth0 to eth1.

    I have noticed in my kern.log file that the adapter keeps hanging and then resetting. This happens often: every few seconds for a while, then it may be OK for a few minutes, then it goes back to every few seconds.

    Here is the log file dump:

     [118943.768245] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
     [118943.768245]   TDH                  <45>
     [118943.768245]   TDT                  <50>
     [118943.768245]   next_to_use          <50>
     [118943.768245]   next_to_clean        <43>
     [118943.768245] buffer_info[next_to_clean]:
     [118943.768245]   time_stamp           <101c48d04>
     [118943.768245]   next_to_watch        <45>
     [118943.768245]   jiffies              <101c4970f>
     [118943.768245]   next_to_watch.status <0>
     [118943.768245] MAC Status             <80283>
     [118943.768245] PHY Status             <792d>
     [118943.768245] PHY 1000BASE-T Status  <7800>
     [118943.768245] PHY Extended Status    <3000>
     [118943.768245] PCI Status             <10>
     [118944.780015] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
    

    Here is the info from ethtool:

    Settings:

    Settings for eth0:
    
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: off (auto)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: yes
    

    Driver info:

    ethtool -i eth0
    
    driver: e1000e
    version: 2.3.2-k
    firmware-version: 1.4-0
    bus-info: 0000:00:19.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: no
    

    What could be causing this? Is this just a bug in the software or an actual hardware issue? I have seen many others with similar issues but no real solution, which leads me to believe that it's a software issue.

    Maybe someone can shed some light on this for me?

  • Peter
    Peter about 9 years
    Thanks! I tried the ethtool fix, and it solved my issue. (also stuck it in an init script)
  • Tails
    Tails almost 8 years
    This was exactly the fix that worked for me. Running Ubuntu 16.04 LTS on an ASRock H170M-ITX/DL motherboard. Thanks SteveG. =)
  • godzillante
    godzillante over 7 years
    Hi, do you know if running ethtool -K eth0 gso off gro off tso off will drop the connection, even for a short time?
  • Oleg Gryb
    Oleg Gryb over 6 years
    Indeed, disabling options with ethtool helped, disabling power management options didn't
  • Mike McCabe
    Mike McCabe almost 6 years
    'According to a post found here: ehc.ac/p/e1000/bugs/378' above now goes to a domain squatter; the original content can be found here: web.archive.org/web/20160205153351/http://ehc.ac:80/p/e1000/…
  • Flatron
    Flatron over 5 years
    Mind that this may increase the server's power consumption a lot!
  • Anuj Shah
    Anuj Shah over 3 years
    Worked for me on CentOS 7, Kernel 3.10.0-1160.11.1.el7.x86_64, Device: 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
  • Luc H
    Luc H about 3 years
    @godzillante for future reference: it can drop the connection for a couple of seconds; however, clients will not be disconnected unless they time out, which depends on your application.
  • laimison
    laimison over 2 years
    no downtime noticed too
  • user249654
    user249654 about 2 years
    Intel NUC BOXNUC8i7BEH2 sudo ethtool -K eno1 tso off gso off