TCP Receive window

Solution 1

I can guess two things from the sample you have provided:

  1. The server has a send buffer of approx 15k.
  2. The dump you provide was done at the server end.

For the window of a TCP connection to scale to a certain size, both the send buffer on the sender and the receive buffer on the receiver must be big enough.

The actual window used is the minimum of the receive window offered/requested by the receiver and the sender's OS-set send buffer size.

Long story short, you need to configure the send buffer size on the server.
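A minimal sketch of that tuning in Python (the 64 KB figure is an arbitrary illustration, not a recommendation; the OS may round or clamp the value):

```python
import socket

# Sketch: enlarge the server's send buffer so the kernel can keep more
# unacknowledged data in flight. The effective window is roughly
# min(receiver's advertised window, sender's send buffer), so a small
# default send buffer caps the window no matter how large the client's
# receive buffer is.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)
# Read the value back: Linux, for example, doubles the requested size
# and caps it at net.core.wmem_max.
print(srv.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```

For a long fat pipe you would size the buffer toward the bandwidth-delay product rather than a fixed 64 KB.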

To clear things up, let's analyse your sample packet by packet.

The server sends another bunch of data:

 22 2.005109    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=19305 Ack=1 Win=65536 Len=1460
 23 2.005116    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=20765 Ack=1 Win=65536 Len=1460
 24 2.005121    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=22225 Ack=1 Win=65536 Len=1460
 25 2.005128    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=23685 Ack=1 Win=65536 Len=892

Notice the PSH. That flag tells the receiving TCP stack that a complete chunk of data has been sent and should be delivered to the application without waiting for more. (A "complete" chunk being your 8 KB write in this case.)

While the server is still sending, it gets two ACKs:

 26 2.005154    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=14601 Win=99999744 Len=0
 27 2.007106    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=16385 Win=99999744 Len=0
 

Note in particular the numbers: Ack=14601 and Ack=16385. TCP acknowledgements are cumulative byte sequence numbers, not per-packet counters.

Ack=14601 means "I have received everything up to sequence number 14601".

Note also that these acknowledge older data, not packets shown in the sample you have given.

So the server processes those ACKs and continues sending data:

 28 2.007398    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=24577 Ack=1 Win=65536 Len=1460
 29 2.007401    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=26037 Ack=1 Win=65536 Len=1460
 30 2.007403    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=27497 Ack=1 Win=65536 Len=1460
 31 2.007404    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=28957 Ack=1 Win=65536 Len=1460
 32 2.007406    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=30417 Ack=1 Win=65536 Len=1460
 33 2.007408    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=31877 Ack=1 Win=65536 Len=892

Here we have a complete block of data: 1460*5+892 == 8192.

Then, 0.475 ms after sending that last packet (2.007883 - 2.007408), it gets one more ACK:

 34 2.007883    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=19305 Win=99999744 Len=0

And then there is a delay of almost exactly 250 ms (one full RTT), during which the server sends nothing, before it receives these:

 35 2.257143    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=22225 Win=99999744 Len=0
 36 2.257160    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=24577 Win=99999744 Len=0
 

And then continues sending:

 37 2.257358    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=32769 Ack=1 Win=65536 Len=1460
 38 2.257362    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=34229 Ack=1 Win=65536 Len=1460
 39 2.257364    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=35689 Ack=1 Win=65536 Len=1460
 40 2.257365    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=37149 Ack=1 Win=65536 Len=1460

There are two very interesting things to notice here.
First, how many bytes the server sent without waiting for an ACK. The last acknowledgement the server received before that delay was Ack=19305, and the sequence number of the last full-sized packet it had sent by then was Seq=30417.

So during that pause there were 30417 - 19305 = 11112 bytes in flight that the client had not yet acknowledged.

Second, one ACK (packet 34) arrived an instant after the server sent that burst, yet it didn't trigger the server to send more. It's as if that ACK wasn't good enough.

The ACK received before the pause was Ack=16385, leaving 30417 - 16385 = 14032 bytes unacknowledged at that point. Only after receiving an ACK for sequence number 24577, reducing that count to 30417 - 24577 = 5840, did the server start sending again.
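Spelled out with the numbers from the trace, as a quick arithmetic check:

```python
# Unacknowledged data in flight = last sequence number sent minus the
# highest cumulative ACK received (using packet 32's Seq=30417 as the
# reference point, as above).
last_seq_sent = 30417
print(last_seq_sent - 16385)  # 14032: too much in flight, server stalls
print(last_seq_sent - 24577)  # 5840: enough window freed, sending resumes
```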

So the fact that the 8 KB write size is large compared to the effective window of roughly 16 KB actually reduces throughput somewhat, because the server will not send any part of an 8 KB block until there is room in the window for all of it.
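The cost is easy to quantify: a window-limited TCP flow moves at most one effective window of data per round trip. A back-of-the-envelope sketch using the figures from this thread (a ~16 KB effective window and the 250 ms RTT stated in the question):

```python
# A window-limited flow transfers at most one window per RTT.
rtt = 0.250               # seconds, the RTT stated in the question
effective_window = 16384  # bytes, limited here by the server's send buffer
print(f"{effective_window / rtt / 1024:.0f} KiB/s")  # 64 KiB/s

# If the client's advertised 99,999,744-byte window were actually usable:
print(f"{99_999_744 / rtt / 1024 / 1024:.0f} MiB/s")  # 381 MiB/s
```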

Lastly, for those who are wondering: there is a TCP option called window scaling, which allows each end of a connection to declare that its window size is actually some multiple of the 16-bit number in the TCP header; see RFC 1323. The option is exchanged only in the SYN packets, so it isn't visible mid-connection. The only hint that scaling is in effect is that the window size in the TCP header is smaller than the window actually being used.
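In fact, the client's odd-looking Win=99999744 is itself a fingerprint of window scaling: with a scale shift of 11, the window is advertised in units of 2048 bytes, so a 100,000,000-byte buffer rounds down to a multiple of 2048. A sketch of that arithmetic (the shift value is inferred from the numbers, not visible in this trace):

```python
import math

requested = 100_000_000  # the receive buffer size set by the client
# Smallest shift whose scaled 16-bit field can still cover the buffer:
shift = max(0, math.ceil(math.log2(requested / 65535)))
unit = 1 << shift        # window granularity in bytes
advertised = (requested // unit) * unit
print(shift, unit, advertised)  # 11 2048 99999744
```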

Solution 2

You can't usefully set a receive buffer size of >= 64k once the socket is connected; you have to do it beforehand. In the case of a server, that means setting the receive buffer size on the listening socket: accepted sockets inherit it from the socket they are accepted from. If you don't do this, the TCP window scaling option cannot be negotiated, so the peers have no way of telling each other about a window larger than 64k.
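A minimal sketch of that ordering in Python (port 0 and the 1 MiB size are placeholders; the OS may clamp the buffer to its configured maximum):

```python
import socket

# The receive buffer must be set on the LISTENING socket, before
# listen()/accept(), so the window scale option can be negotiated in
# the handshake. Accepted sockets inherit the value.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
listener.bind(("127.0.0.1", 0))  # port 0 = any free port, for illustration
listener.listen(5)
# Setting SO_RCVBUF only on the accepted socket, after the handshake,
# is too late for scaling to be negotiated.
```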

Author: fabrizi0

Updated on June 04, 2022

Comments

  • fabrizi0
    fabrizi0 almost 2 years

    I am trying to understand how the receive window affects the throughput over a high-latency connection.

    I have a simple client-server pair of apps on two machines, far apart, with 250 ms RTT latency between them. I ran this test with both Windows (XP, 7) and Linux (Ubuntu 10.x), with the same results, so for simplicity let's assume the case of: client receiving data: WinXP Pro; server sending data: Win7 Pro. Again, latency is 250 ms RTT.

    I run my TCP test without changing the receive buffer size on the client (default is 8 KB), and I see on the wire (using Wireshark):

    • the client sends ACKs to the server, and its TCP packets contain RWIN=65k
    • the server sends data and reports RWIN=65k

    Looking at the trace, I see bursts of 3-4 packets (with a payload of 1460 bytes each), immediately followed by the ACK sent from the client machine to the server, then nothing for approx 250 ms, then a new burst of packets from the server to the client.

    So, in conclusion, it appears that the server stops sending new data before it has even filled up the receiver's window.

    To do more tests, I ran the same test again, this time changing the receive buffer size on the client machine (on Windows, changing the receive buffer size ends up affecting the RWIN advertised by the machine). I would expect to see larger bursts of packets before blocking for an ACK... and at least a higher throughput.

    In this case I set the receive buffer size to 100,000,000. The packets from the client to the server now have an RWIN=99,999,744 (well, that's nice), but unfortunately the pattern of the data sent FROM the server to the client is still the same: a short burst followed by a long wait. To confirm what I see on the wire, I also measured the time to send a chunk of data from the server to the client. I don't see ANY change between using a large RWIN and using the default.

    Can anybody help me understand why changing the RWIN doesn't really affect the throughput?

    A few notes:

    • The server sends data as fast as possible, using write() with chunks of 8 KB.
    • As I said before, I see similar effects using Linux as well: changing the receive buffer size affects the RWIN used by a node, but the throughput remains the same.
    • I analyze the trace after several hundred packets, to give the TCP slow-start mechanism enough time to enlarge the CWND.
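The sender described in the notes can be sketched roughly like this (host, port, and byte count are placeholders; the 8 KB chunk size is the one from the test):

```python
import socket

CHUNK = b"x" * 8192  # 8 KB writes, as in the test described above

def blast(host: str, port: int, total: int) -> None:
    # Hypothetical sender: write 8 KB chunks as fast as possible.
    # sendall() blocks whenever the send buffer is full, which is
    # exactly where a small SO_SNDBUF throttles the flow.
    with socket.create_connection((host, port)) as s:
        sent = 0
        while sent < total:
            s.sendall(CHUNK)
            sent += len(CHUNK)
```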


    As suggested, I'm adding a small snapshot of a wire trace here

    No.     Time        Source                Destination           Protocol Length Info
         21 2.005080    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=11681 Win=99999744 Len=0
         22 2.005109    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=19305 Ack=1 Win=65536 Len=1460
         23 2.005116    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=20765 Ack=1 Win=65536 Len=1460
         24 2.005121    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=22225 Ack=1 Win=65536 Len=1460
         25 2.005128    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=23685 Ack=1 Win=65536 Len=892
         26 2.005154    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=14601 Win=99999744 Len=0
         27 2.007106    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=16385 Win=99999744 Len=0
         28 2.007398    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=24577 Ack=1 Win=65536 Len=1460
         29 2.007401    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=26037 Ack=1 Win=65536 Len=1460
         30 2.007403    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=27497 Ack=1 Win=65536 Len=1460
         31 2.007404    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=28957 Ack=1 Win=65536 Len=1460
         32 2.007406    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=30417 Ack=1 Win=65536 Len=1460
         33 2.007408    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      946    21500 > 57353 [PSH, ACK] Seq=31877 Ack=1 Win=65536 Len=892
         34 2.007883    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=19305 Win=99999744 Len=0
         35 2.257143    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=22225 Win=99999744 Len=0
         36 2.257160    CCC.CCC.CCC.CCC       sss.sss.sss.sss       TCP      60     57353 > 21500 [ACK] Seq=1 Ack=24577 Win=99999744 Len=0
         37 2.257358    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=32769 Ack=1 Win=65536 Len=1460
         38 2.257362    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=34229 Ack=1 Win=65536 Len=1460
         39 2.257364    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=35689 Ack=1 Win=65536 Len=1460
         40 2.257365    sss.sss.sss.sss       CCC.CCC.CCC.CCC       TCP      1514   21500 > 57353 [ACK] Seq=37149 Ack=1 Win=65536 Len=1460
    

    As you see, the server stops sending data at packet #33.

    The client sends an ACK at packet #34 for an old packet (Seq=19305, sent in packet #20, not shown here). With an RWIN of 100 MB, I would expect the server NOT to block for a while.

    After 20-30 packets, the congestion window on the server side should be large enough to send more packets than I see... I assume the congestion window eventually grows up to the RWIN... but still, even after hundreds of packets, the pattern is the same: data, data, then block for 250 ms...

    • JustinDanielson
      JustinDanielson about 12 years
      Is either machine on wireless? Can you print the sequence numbers being sent back and forth between the server and client, in both the data packets and the ACKs? There could be a lot of noise on the client or server side causing data loss.
  • fabrizi0
    fabrizi0 about 12 years
    Trust me, the send buffer is always full. My application writes as fast as possible: it's a for loop that calls write() with an 8 KB buffer continuously.
  • fabrizi0
    fabrizi0 about 12 years
    Justin. There is no packet loss. I have verified that. I will try to get a printable trace off Wireshark and post it here. Thanks!!!
  • JustinDanielson
    JustinDanielson about 12 years
    The ECN bit could be set by the router, which would moderate the rate at which data is sent. The client may send 5 packets while the router is setting the ECN bit or dropping the packet. The ECN bit is a way to notify either side that the network is becoming congested. en.wikipedia.org/wiki/Explicit_Congestion_Notification
  • JustinDanielson
    JustinDanielson about 12 years
    If it's a high-latency network, it's likely there are many hops along the way. The receive window can only be as big as the bottleneck somewhere in between. But if that bottleneck is dropping packets, the receiver would never have known they were sent. So it'll ACK back as many packets as it got, while the server times out waiting to receive perhaps 10 ACKs rather than the 4 it got.
  • fabrizi0
    fabrizi0 about 12 years
    I can try to re-run the test, capture a trace on both ends, and try to manually reconstruct what is really happening... but that will take some time. Any tools you would recommend to help me trace the internal state?
  • JustinDanielson
    JustinDanielson about 12 years
    The window size is only 16 bits. I don't understand how you could send a value > 2^17-1. Try setting the window size to 131071 or 65536 and see how it responds. Can you wireshark on the server side or no? Could you post some more lines above 21? The ACK sequence 19305, 22225, 24577 makes me believe something was dropped. Are 19305 and 20765 transmitted twice?
  • JustinDanielson
    JustinDanielson about 12 years
    ACKs are never sent for 20765 or 23685.
  • JustinDanielson
    JustinDanielson about 12 years
    I don't know of any tools that will do it for you. You could pull the data into a text file and sort by timestamps.
  • fabrizi0
    fabrizi0 about 12 years
    OK, RWIN can be at most 64 KB, but there is an extension, RFC 1323, that allows transmitting a scaling factor in the SYN packet, in the TCP options header. That can extend the RWIN beyond the traditional 64 KB.
  • fabrizi0
    fabrizi0 about 12 years
    The client acks up to 24577 in packet #36. A receiver doesn't have to ack every individual packet. If the client sends ACK 24577, it acknowledges all the packets UP TO 24577.
  • JustinDanielson
    JustinDanielson about 12 years
    But if there is no loss, there would not be a scenario in which the client would skip a sequence number. Doesn't the client send an ACK every time it receives a packet?
  • fabrizi0
    fabrizi0 about 12 years
    No, Justin. If the client detects a missed packet (by looking at the sequence number), it re-sends the previous ACK; Wireshark marks this as a DUP of a previous ACK. The client does NOT send an ACK for every packet received.
  • JustinDanielson
    JustinDanielson about 12 years
    OK... hrm. The only other thing I can think of is that a buffer somewhere on the physical layer between the client and server is filling up. Is the Wireshark trace above from the server's or the client's perspective?
  • fabrizi0
    fabrizi0 about 12 years
    Michael, the server send buffer was 8 KB for that trace (the default size). The trace, I believe, was captured at the client end. But you got it right: the problem is the SEND buffer that is limiting the window. I have just run a new test with a larger send buffer on the server side, and I can clearly see much longer bursts of data before blocking. The graph in Wireshark clearly shows the congestion window doubling every time an ACK is received.