Packet loss with RX error packets

linux networking centos route packet

19,622

It's difficult to tell. What is your operating system version and patch level?
Here are some things you could check:

Your network switch may be polluted or overloaded by a broken card generating garbage. Check the error counters on the switch.
If you are using "Virtual Connect" on your enclosure, check the error counters on the Virtual Connect box.
If you are using "pass-through" connections on the enclosure, check whether all your connections go through the same pass-through panel; it may need to be replaced. Pass-through modules are not passive: they have their own firmware too, so check with your HP support whether it is recent.
If you recently patched your operating system, try installing the supported driver from the latest SPP (Service Pack for ProLiant, only for Red Hat or SuSE I'm afraid). The drivers may be less recent, but they are supported by HP.
It may be a simple duplex mismatch: check the speed and duplex settings on the switch and on the host (ethtool ethX).
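
The duplex check in the last point can be scripted if you have many blades to look at. Below is a minimal Python sketch, assuming the usual `Speed:`, `Duplex:` and `Auto-negotiation:` lines in `ethtool ethX` output. The helper names, the sample text and the switch-side values are hypothetical; the switch settings have to be read from the switch itself.

```python
import subprocess

WANTED = {"Speed", "Duplex", "Auto-negotiation"}


def read_ethtool(iface):
    """Run `ethtool <iface>` and return its text output (may need root)."""
    result = subprocess.run(
        ["ethtool", iface], capture_output=True, text=True, check=True
    )
    return result.stdout


def link_settings(ethtool_output):
    """Pick the speed, duplex and auto-negotiation lines out of ethtool output."""
    settings = {}
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in WANTED:
            settings[key.strip()] = value.strip()
    return settings


def mismatches(host, switch):
    """Return the setting names on which host and switch port disagree."""
    return sorted(k for k in host if k in switch and host[k] != switch[k])


# Hypothetical sample output, for illustration only.
SAMPLE = """Settings for eth0:
\tSpeed: 1000Mb/s
\tDuplex: Half
\tAuto-negotiation: off
\tLink detected: yes
"""

if __name__ == "__main__":
    host = link_settings(SAMPLE)  # or link_settings(read_ethtool("eth0"))
    # Switch-port values are assumed here; take them from the switch CLI.
    switch = {"Speed": "1000Mb/s", "Duplex": "Full"}
    print(host, mismatches(host, switch))
```

Any name returned by `mismatches` is a setting worth fixing on one side or the other before looking further.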

user3744406

Updated on September 18, 2022

Comments

user3744406 over 1 year

I am seeing packet loss from all my blade servers within the HP Blade C7000 enclosure.

# ping 192.168.2.140
PING 192.168.2.140 (192.168.2.140) 56(84) bytes of data.
64 bytes from 192.168.2.140: icmp_seq=1 ttl=64 time=0.210 ms
64 bytes from 192.168.2.140: icmp_seq=2 ttl=64 time=0.185 ms
64 bytes from 192.168.2.140: icmp_seq=4 ttl=64 time=0.206 ms
64 bytes from 192.168.2.140: icmp_seq=6 ttl=64 time=0.164 ms
64 bytes from 192.168.2.140: icmp_seq=7 ttl=64 time=0.210 ms
64 bytes from 192.168.2.140: icmp_seq=8 ttl=64 time=0.213 ms
64 bytes from 192.168.2.140: icmp_seq=9 ttl=64 time=0.213 ms

--- 192.168.2.140 ping statistics ---
9 packets transmitted, 7 received, 22% packet loss, time 8000ms
rtt min/avg/max/mdev = 0.164/0.200/0.213/0.018 ms

# ping 192.168.2.165
PING 192.168.2.165 (192.168.2.165) 56(84) bytes of data.
64 bytes from 192.168.2.165: icmp_seq=1 ttl=64 time=0.990 ms
64 bytes from 192.168.2.165: icmp_seq=3 ttl=64 time=0.204 ms
64 bytes from 192.168.2.165: icmp_seq=4 ttl=64 time=0.165 ms
64 bytes from 192.168.2.165: icmp_seq=5 ttl=64 time=0.168 ms

--- 192.168.2.165 ping statistics ---
6 packets transmitted, 4 received, 33% packet loss, time 5001ms
rtt min/avg/max/mdev = 0.165/0.381/0.990/0.352 ms


$ /sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr E4:11:5B:D0:36:B0
          inet addr:192.168.2.163  Bcast:192.168.2.255  Mask:255.255.255.192
          inet6 addr: fe80::e611:5bff:fed0:36b0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:44022 errors:29632 dropped:0 overruns:0 frame:10025
          TX packets:42694 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4345443 (4.1 MiB)  TX bytes:4549025 (4.3 MiB)

$ netstat -i
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0     5124  32118      0      0     2651      0      0      0 BMRU
lo        16436   0     4522      0      0      0     4522      0      0      0 LRU


$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.4.128   0.0.0.0         255.255.255.192 U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         192.168.4.134   0.0.0.0         UG    0      0        0 eth0

# ethtool -S eth0 | grep -i errors
  rx_errors: 43821
  tx_errors: 0
  rx_crc_errors: 14812
  rx_alignment_symbol_errors: 0
  rx_in_range_errors: 0
  rx_out_range_errors: 0
  rx_address_match_errors: 150211
# ethtool -S eth0 | grep -i drops
  rx_drops_no_pbuf: 0
  rx_drops_no_txpb: 0
  rx_drops_no_erx_descr: 0
  rx_drops_no_tpre_descr: 0
  rx_drops_too_many_frags: 0
  rx_drops_invalid_ring: 3215
  rx_drops_mtu: 0
  rx_drops_no_fragments: 0
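
To see which of these counters matter, it can help to filter out the zero ones and then compare two snapshots taken a few minutes apart. The following is a rough Python sketch that parses `ethtool -S` style `name: value` lines. The function names are hypothetical, and the sample reuses a few of the values shown above.

```python
def parse_counters(text):
    """Turn `name: value` lines from `ethtool -S` into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips headers such as "NIC statistics:"
            counters[name.strip()] = int(value)
    return counters


def nonzero(counters):
    """Keep only the counters that have recorded at least one event."""
    return {name: value for name, value in counters.items() if value > 0}


def growth(before, after):
    """Return how much each counter changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# A few of the counters from the output above.
SAMPLE = """NIC statistics:
  rx_errors: 43821
  tx_errors: 0
  rx_crc_errors: 14812
  rx_alignment_symbol_errors: 0
  rx_address_match_errors: 150211
  rx_drops_no_pbuf: 0
"""

if __name__ == "__main__":
    for name, value in nonzero(parse_counters(SAMPLE)).items():
        print(f"{name}: {value}")
```

If `growth` shows a counter such as `rx_crc_errors` still climbing between snapshots, the fault is ongoing rather than left over from an earlier event. CRC errors usually point at the physical path (cable, port, or interconnect module) rather than at host configuration.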

The network configuration looks fine.

How can I solve this issue?

klerk about 10 years

Do you see any errors from netstat -i?
user3744406 about 10 years

Added netstat -i output
S edwards about 10 years

Have you considered changing the cable? That is a lot of RX errors. You should also try connecting to the interface directly with a crossover cable to check whether the problem is internal or comes from the environment (another computer on the network, a bad switch...).
klerk about 10 years

You should check the device that the enclosure is connected to first. There are no dropped packets here, just packets received with errors.
user3744406 about 10 years

Changed a few cables, but no luck. Tomorrow I will try replacing all the cables and see if that helps. Any other suggestions to resolve the issue? I also added more info to the original description of the problem.
vonbrand about 10 years

Any cables passing close to electric power cables? Who did the patch panel, is it certified? Ditto the cables? Any too-near-to-the-cabling fluorescent lamp (they generate sufficient interference to make some network cards just give up)? Is the target nearby? If not, try from a different origin, maybe the problem is "outside"?
bahamat about 10 years

Since you've already tried cables, I suspect the physical port. Does this happen on any other hosts/interfaces on the same subnet or just this one?
user3744406 about 10 years

All hosts sharing the same subnet within one blade enclosure show this problem. Another batch of hosts in the same subnet on another blade enclosure is working fine. Servers from both enclosures are connected to the same switch.
user3744406 about 10 years

Also, the packet loss started on one of the blade enclosures on Saturday morning. Before that it worked fine.
user3744406 about 10 years

The issue is temporarily resolved for now. I disabled interface eth0 and enabled eth1, and packet transfer seems to be working fine. I'm not sure exactly what the issue is; it may be a bad port, or there may have been some changes to the network stack. I will check further with the network team.