Debugging dropped UDP messages on Linux

This is expected behavior when the two ends of a path run at mismatched speeds. If you are able to saturate the 1 Gbit link, the 100 Mbit end will have received only about 100 packets by the time you have sent all 1000. It is unlikely that the switch or router in between will buffer the remaining 900 packets.
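The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope model, not a measurement: it assumes "1kb" means a 1024-byte payload, ignores Ethernet/IP/UDP header overhead, and treats the sender as filling the fast link completely. The function name `burst_backlog` is made up for this illustration.

```python
def burst_backlog(n_packets, payload_bytes, fast_bps, slow_bps):
    """Return (send_ms, drain_ms, peak_backlog_packets) for a back-to-back burst.

    Assumption: header overhead is ignored and both links run at line rate.
    """
    bits = n_packets * payload_bytes * 8
    send_s = bits / fast_bps    # time for the burst to leave the fast host
    drain_s = bits / slow_bps   # time the slow port needs to forward all of it
    # While the burst is still arriving, the slow port can forward only
    # slow_bps / fast_bps of it; the rest must be queued or dropped.
    forwarded = n_packets * slow_bps / fast_bps
    return send_s * 1000, drain_s * 1000, n_packets - forwarded


send_ms, drain_ms, backlog = burst_backlog(1000, 1024, 1e9, 100e6)
print(f"burst sent in {send_ms:.1f} ms, drained in {drain_ms:.1f} ms, "
      f"peak backlog about {backlog:.0f} packets")
```

Under the same assumptions, a 1 ms gap between sends offers roughly 8 Mbit/s, far below the 100 Mbit link rate, which is consistent with the drops disappearing once a delay is added.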

UDP is an unreliable protocol. Unlike TCP, it has no built-in reliable delivery.

It may help to run a similar test with TCP connections. Running it in both directions may help to determine if the issue is unidirectional.

Running time on the processes may give you an idea of whether one of the processes is running slower than the other. Running netstat -i before and after the test will let you calculate how much data arrived and see whether any errors were generated.
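The before/after comparison can also be scripted. The sketch below reads the receive counters from /proc/net/dev, which on Linux exposes per-interface counters of the kind netstat -i reports. The helper names (`parse_net_dev`, `read_net_dev`, `rx_delta`) are hypothetical and exist only for this example.

```python
# Receive-side columns of /proc/net/dev, in the order they appear.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame")


def parse_net_dev(text):
    """Map interface name -> dict of receive counters from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:      # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        # zip() stops after RX_FIELDS, so the transmit columns are ignored.
        stats[name.strip()] = dict(zip(RX_FIELDS, values))
    return stats


def read_net_dev(path="/proc/net/dev"):
    """Take a snapshot of the current counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())


def rx_delta(before, after, iface):
    """Difference in receive counters for one interface between two snapshots."""
    return {k: after[iface][k] - before[iface][k] for k in RX_FIELDS}
```

One way to use it: call read_net_dev() on the receiving host just before the test and again just after, then pass both snapshots and the interface name to rx_delta(). A packet delta well below 1000 with no increase in errs or drop would suggest the packets never reached the host.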

ethtool may tell you if one of the hosts is in half-duplex mode. Half-duplex connections are prone to issues such as you are seeing. If there are cabling or other issues, the connection may fall back to 10 Mbit half-duplex in one or both directions.
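If you want to check this from a script instead of reading ethtool output, the negotiated speed and duplex mode are also exposed under /sys/class/net on Linux. A minimal sketch follows; the function name `link_settings` and the `sysfs_root` parameter are assumptions made for this example, and the interface name must be replaced with your own.

```python
from pathlib import Path


def link_settings(iface, sysfs_root="/sys/class/net"):
    """Return {'speed': ..., 'duplex': ...} for an interface.

    speed is reported in Mbit/s as a string; a value is None if the
    attribute cannot be read (for example, the link is down or the
    interface does not exist).
    """
    base = Path(sysfs_root) / iface
    result = {}
    for attr in ("speed", "duplex"):
        try:
            result[attr] = (base / attr).read_text().strip()
        except OSError:
            result[attr] = None
    return result
```

For example, link_settings("eth0") on the slow host returning a speed of "10" or a duplex of "half" would point to the negotiation fallback described above.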

If the switch is managed, you should check the configuration and error counters on the relevant ports.

If the two systems have different Ethernet hardware, that may be the issue. Some hardware just can't handle a saturated link.

Author: gnr

Updated on September 18, 2022

Comments

  • gnr
    gnr almost 2 years

    Here's my setup: I have 1 host that has a 1 Gbit Ethernet connection and 2 hosts with 100 mbit connections (connected to the 1Gbit host through different switches).

    In a test, I send 1000 1kb messages from the 1Gbit host to the 100 mbit hosts (with no delay in btwn sendto() calls). For one of the 100 mbit hosts, no packets get dropped. The other though has no drops until around the 100th and then starts dropping the majority of the remaining. Very reproducible. When I introduce a 1ms delay, there are no drops on either host.

    I'd like to know why there is different behavior btwn the two hosts.

    What are some methods/tools I should use to track this down? I am using Linux 6.8. And my rmem_max is set to 10MB on both hosts.

    • Mark Riddell
      Mark Riddell about 8 years
      It sounds as if your buffers are being overrun between the 1 Gbit host and one of the 100 Mbit hosts. You would need to investigate whether it is the switch that is dropping the packets or the host itself.
    • gnr
      gnr about 8 years
      @MarkoPolo yes that is what I was thinking too - any suggestions on how to investigate that? I have all the normal Linux tools
    • Mark Riddell
      Mark Riddell about 8 years
      netstat -s will show you a value for packet receive errors under the UDP section - that would include packets dropped because of buffer overflows.
    • Ron Maupin
      Ron Maupin about 8 years
      It is unlikely the problem is on the hosts, or with UDP. The problem will be a layer-2 problem. Switches have very tiny buffers. One premise in networking is that it is better to drop traffic sooner so that the upper-layer protocols/applications notice it missing sooner. You need access to the switches to check for buffer overruns on the switch interfaces. UDP is a fire-and-forget protocol with no guaranteed delivery. Applications which use UDP must be OK with that, or need to have their own schemes to request lost traffic be resent.
  • gnr
    gnr about 8 years
    Completely agree - except somehow there is a software setting or hardware setup on one of the hosts which allows me to saturate the 1 Gbit link and not lose messages on the 100 Mbit host - is it possible that the switch would somehow know to buffer/throttle the messages?
  • BillThor
    BillThor about 8 years
    @gnr. I've added a list of things I have used in similar instances.
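The UDP counters that netstat -s summarizes, as suggested in the comments above, can also be read directly from /proc/net/snmp on Linux. The sketch below parses the Udp header/value line pair from that file. The function names are hypothetical, and the mapping of InErrors to "packet receive errors" and RcvbufErrors to receive-buffer overflows is my understanding of how netstat labels them, so treat it as an assumption.

```python
def parse_udp_snmp(text):
    """Return the Udp counters from /proc/net/snmp text as a name -> int dict."""
    # The file has one "Udp:" line with column names and one with values.
    # "UdpLite:" lines do not match the "Udp:" prefix, so they are skipped.
    rows = [line.split()[1:] for line in text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def read_udp_snmp(path="/proc/net/snmp"):
    """Snapshot the current UDP counters (Linux only)."""
    with open(path) as f:
        return parse_udp_snmp(f.read())
```

Taking a snapshot on the receiving host before and after the test and comparing InErrors and RcvbufErrors should show whether the host itself discarded datagrams. If neither counter moves while packets are still missing, the loss most likely happened before the host, for example in the switch.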