e1000e Reset adapter unexpectedly / Detected Hardware Unit Hang

networking linux-networking ethernet forwarding nic

79,144

Solution 1

OK, so after posting this question last night I continued to do some research, and the only real solution I came across seems to have taken care of the problem.

Disabling TSO, GSO and GRO using ethtool:

ethtool -K eth0 gso off gro off tso off

According to a post found here: http://ehc.ac/p/e1000/bugs/378/

From what I understand, this can reduce performance.
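To see which of these offloads are currently enabled, before and after the change, the output of `ethtool -k` (lowercase `k`) can be filtered. This is only a minimal sketch: the helper name `offload_state` is made up for illustration, and `eth0` is assumed to be the affected interface.

```shell
#!/bin/sh
# offload_state: reads `ethtool -k` output on stdin and keeps only the
# TSO, GSO and GRO lines. Hypothetical helper, not part of ethtool.
offload_state() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Usage (needs ethtool and a real interface; eth0 is an assumption):
#   ethtool -k eth0 | offload_state
```

Running it once before and once after the `ethtool -K` command shows whether the three settings actually changed.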

I also noticed that another solution was to disable Active-State Power Management (ASPM) with the kernel boot parameter:

pcie_aspm=off

According to this post on serverfault: Linux e1000e (Intel networking driver) problems galore, where do I start?
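Since `pcie_aspm=off` is a kernel boot parameter, it only takes effect after a reboot. One way to confirm that it is active is to look for it on the running kernel's command line. A minimal sketch, with a made-up helper name (`aspm_off_in`); the GRUB note in the comments assumes a Debian/Ubuntu system booting through GRUB.

```shell
#!/bin/sh
# aspm_off_in CMDLINE
# Succeeds (exit status 0) if the given kernel command line contains the
# parameter pcie_aspm=off as a separate word. Hypothetical helper.
aspm_off_in() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   aspm_off_in "$(cat /proc/cmdline)" && echo "ASPM disabled at boot"
#
# Assumption: on Debian/Ubuntu with GRUB, the parameter is usually added
# to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by
# running update-grub and rebooting.
```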

I haven’t tried this solution yet. I will try it and see if that makes a difference and post back my findings.

EDIT:

OK, so I have tried turning off Active-State Power Management with pcie_aspm=off, and it didn't have any effect. I continued to see errors in my log file.

This may still work for some, as some Intel NICs have problems under certain kernels with falling asleep when power management is enabled.

Solution 2

Disabling Enhanced C1 (C1E) in the BIOS fixed it for me.

Not sure whether the lower power state of C1E is interfering with the driver, or whether there's a bug in the driver when the processor is in this state.

Anyway, problem solved.

Solution 3

Disabling only TCP Segmentation Offload (TSO) does the trick for me.

ethtool -K eth0 tso off

Note: It does not seem to be necessary to also disable Generic Receive Offload (GRO) and Generic Segmentation Offload (GSO), as various sources recommend. As far as I know, these are implemented purely in software and should be safe. Don't sacrifice more performance than necessary.

Solution 4

I had the same issue (the same kernel error as yours, plus userspace SSH errors like "Corrupted MAC on input").

Solution

What worked for me was to disable TCP checksum offloading:

# ethtool -K eth0 tx off rx off

A clean, long-term way to integrate this with a Debian-style /etc/network/interfaces setup:

#!/bin/bash
#
# Disables TCP offloading on all ifaces
#
# Inspired by: @Michelunik https://serverfault.com/a/422554/62953

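# ifupdown exports the stanza option "no-toe" as IF_NO_TOE.
# Any of the values below opts the interface out of this hook.
RUN=true
case "${IF_NO_TOE,,}" in
    no|off|false|disable|disabled)
        RUN=false
    ;;
esac

# Other offloading options that could be disabled (not TCP related):
#  sg tso ufo gso gro lro rxvlan txvlan rxhash
# see man ethtool

# Only act when the interface is being brought up (MODE=start).
if [ "$MODE" = start ] && [ "$RUN" = true ]; then
  TOE_OPTIONS="rx tx"
  for TOE_OPTION in $TOE_OPTIONS; do
    # Ignore failures so a missing ethtool or unsupported option
    # does not block bringing the interface up.
    /sbin/ethtool --offload "$IFACE" "$TOE_OPTION" off &>/dev/null || true
  done
fi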

source, inspiration.
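For reference, ifupdown runs hook scripts with variables such as IFACE and MODE in the environment, and exports each option of an interface stanza as IF_<OPTION> (upper case, hyphens turned into underscores). A line such as `no-toe off` in the stanza therefore reaches the script as IF_NO_TOE=off. The sketch below mirrors the script's opt-out test in a standalone function, so the logic can be tried without touching a real interface. The function name `toe_hook_enabled` and the hook path in the usage comment are assumptions for illustration; the answer does not say where the script is installed.

```shell
#!/bin/sh
# toe_hook_enabled VALUE
# Mirrors the opt-out test of the hook script: prints "false" when VALUE
# (what ifupdown would pass as IF_NO_TOE) is an opt-out word, else "true".
# Hypothetical helper for experimenting; not part of ifupdown or ethtool.
toe_hook_enabled() {
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$value" in
        no|off|false|disable|disabled) echo false ;;
        *) echo true ;;
    esac
}

# Examples:
#   toe_hook_enabled off    # prints "false" (stanza contains "no-toe off")
#   toe_hook_enabled ""     # prints "true"  (option not set)
#
# Manual dry run of an installed hook (file name and path are assumptions):
#   IFACE=eth0 MODE=start /etc/network/if-up.d/disable-toe
```

Note that the option reads as a double negative: setting `no-toe` to `off`, `no` or `false` skips the hook and leaves offloading untouched, while leaving the option out lets the hook disable checksum offloading.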

Context

Debian Jessie
Kernel 4.7.0-0.bpo.1-amd64
lspci 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-V (rev 04)


Kyle Coots

Updated on September 18, 2022

Comments

Kyle Coots over 1 year

I have a Dell 1U Server with Intel(R) Xeon(R) CPU L5420 @ 2.50GHz, 8 cores running Ubuntu Server Kernel Version 3.13.0-32-generic on x86_64. It has dual 1000baseT networking cards. I have it set up to forward packets from eth0 to eth1.

I have noticed in my kern.log file that the adapter keeps hanging and then resetting. This happens often: every few seconds for a while, then it may be OK for a few minutes, then it is back to every few seconds.

Here is the log file dump:

 [118943.768245] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
 [118943.768245]   TDH                  <45>
 [118943.768245]   TDT                  <50>
 [118943.768245]   next_to_use          <50>
 [118943.768245]   next_to_clean        <43>
 [118943.768245] buffer_info[next_to_clean]:
 [118943.768245]   time_stamp           <101c48d04>
 [118943.768245]   next_to_watch        <45>
 [118943.768245]   jiffies              <101c4970f>
 [118943.768245]   next_to_watch.status <0>
 [118943.768245] MAC Status             <80283>
 [118943.768245] PHY Status             <792d>
 [118943.768245] PHY 1000BASE-T Status  <7800>
 [118943.768245] PHY Extended Status    <3000>
 [118943.768245] PCI Status             <10>
 [118944.780015] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
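To get a rough idea of how often the adapter hangs, the log can be scanned for the message shown above. A minimal sketch: the helper name `count_unit_hangs` is made up, and `/var/log/kern.log` is assumed as the log location.

```shell
#!/bin/sh
# count_unit_hangs: reads kernel log text on stdin and prints the number
# of lines containing the e1000e hang message. Hypothetical helper.
count_unit_hangs() {
    grep -c 'Detected Hardware Unit Hang'
}

# Usage (log path is an assumption; adjust for your distribution):
#   count_unit_hangs < /var/log/kern.log
```

Comparing the count before and after a change, over a similar period of traffic, gives a simple check of whether a fix helped.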

Here is the info from ethtool:

Settings:

Settings for eth0:

Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
               drv probe link
Link detected: yes

Driver info:

ethtool -i eth0

driver: e1000e
version: 2.3.2-k
firmware-version: 1.4-0
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

What could be causing this? Is this just a bug in the software or an actual hardware issue? I have seen many others having similar issues but no real solution, which leads me to believe that it's a software issue.

Maybe someone can shed some light on this for me?

Admin almost 8 years

The problem seems to be known: bugzilla.kernel.org/show_bug.cgi?id=47331

Peter about 9 years

Thanks! I tried the ethtool fix, and it solved my issue. (also stuck it in an init script)

Tails almost 8 years

This was exactly the fix that worked for me. Running Ubuntu 16.04 LTS on an ASRock H170M-ITX/DL motherboard. Thanks SteveG. =)

godzillante over 7 years

Hi, do you know if running ethtool -K eth0 gso off gro off tso off will drop the connection, even for a short time?

Oleg Gryb over 6 years

Indeed, disabling the offload options with ethtool helped; disabling the power management options didn't.

Mike McCabe almost 6 years

'According to a post found here: ehc.ac/p/e1000/bugs/378' above now goes to a domain squatter; the original content can be found here: web.archive.org/web/20160205153351/http://ehc.ac:80/p/e1000/…

Flatron over 5 years

Mind that this may increase the server's power consumption a lot!

Anuj Shah over 3 years

Worked for me on CentOS 7, Kernel 3.10.0-1160.11.1.el7.x86_64, Device: 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)

Luc H about 3 years

@godzillante for future reference: It can drop the connection for a couple of seconds; however, clients will not be disconnected unless they time out, which depends on your application.

laimison over 2 years

No downtime noticed here either.

user249654 about 2 years

Intel NUC BOXNUC8i7BEH2: sudo ethtool -K eno1 tso off gso off