When is a TCP connection considered idle?

linux sockets tcp keep-alive retransmit-timeout

25,346

I have a connection where data is only sent from a server to a client at rather high rates.

Then you'll never see keepalives. Keepalives are sent only when there is "silence on the wire". RFC 1122 (section 4.2.3.6) has some explanation regarding keepalives:

A "keep-alive" mechanism periodically probes the other end of a connection when the connection is otherwise idle, even when there is no data to be sent.

Back to your question:

Some other sources state that this is the time a connection is idle, but they do not further define what this means.

This is how long (in seconds) TCP will wait on an idle connection before poking the peer: "hoy! still alive?".

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200

In other words: you've been using a TCP connection and it has been great, but for the past 2 hours (7200 seconds) there hasn't been anything to send. Is it reasonable to assume the connection is still alive? Is it reasonable to assume all the middleboxes along the path still have state about your connection? Opinions vary, and keepalives aren't part of RFC 793.

The TCP specification does not include a keep-alive mechanism because it could: (1) cause perfectly good connections to break during transient Internet failures; (2) consume unnecessary bandwidth ("if no one is using the connection, who cares if it is still good?")
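To put the 7200-second default in numbers: once probing starts, Linux sends up to tcp_keepalive_probes probes spaced tcp_keepalive_intvl seconds apart before giving up. This Python sketch reads those three sysctls and adds them up. The fallback values (7200, 75, 9) are the usual kernel defaults and are only an assumption for when /proc cannot be read; read_keepalive_sysctls and keepalive_detection_time are hypothetical helper names.

```python
import os

# Linux keepalive sysctls. The fallback numbers are the usual kernel
# defaults (assumed): 7200 s idle, 75 s between probes, 9 probes.
SYSCTLS = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the system-wide keepalive settings, falling back to defaults."""
    values = {}
    for name, default in SYSCTLS.items():
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = default
    return values


def keepalive_detection_time(idle, interval, probes):
    """Approximate seconds from the connection going quiet until an
    unanswered keepalive sequence makes the kernel drop it."""
    return idle + interval * probes


if __name__ == "__main__":
    s = read_keepalive_sysctls()
    t = keepalive_detection_time(s["tcp_keepalive_time"],
                                 s["tcp_keepalive_intvl"],
                                 s["tcp_keepalive_probes"])
    print(f"{s} -> unresponsive peer noticed after ~{t} s")
```

With the defaults this gives 7200 + 75 * 9 = 7875 seconds, roughly 2 hours 11 minutes before an idle connection to an unresponsive peer is torn down.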

To test keepalive, we unplugged the cable on the client's NIC.

This isn't testing keepalive. This is testing your TCP's retransmission strategy, i.e. how many times and how often TCP will try to get your message across. On a Linux box this (likely) ends up testing net.ipv4.tcp_retries2:

How many times to retry before killing an alive TCP connection. RFC 1122 says that the limit should be longer than 100 sec. That is too small a number. Default value 15 corresponds to 13-30 min depending on RTO.
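To get a feel for what a retry count means in wall-clock time, here is a rough Python model of exponential backoff. It assumes the first retransmission timeout is 200 ms and that backoff is capped at 120 s (Linux's TCP_RTO_MIN and TCP_RTO_MAX). A real connection starts from an RTO derived from its measured RTT, which is why the documentation quotes a range rather than a single number. hypothetical_timeout is an illustrative name, not a kernel function.

```python
# Assumed Linux constants: TCP_RTO_MIN (200 ms) and TCP_RTO_MAX (120 s).
RTO_MIN = 0.2
RTO_MAX = 120.0


def hypothetical_timeout(retries, initial_rto=RTO_MIN, max_rto=RTO_MAX):
    """Seconds until the connection is killed if every retransmission is lost.

    The segment is sent once, retransmitted `retries` times, and the
    connection is dropped when the (retries + 1)-th timeout expires.
    Each timeout doubles the previous one, capped at max_rto.
    """
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


if __name__ == "__main__":
    for n in (7, 8, 15):
        print(n, round(hypothetical_timeout(n), 1))
```

Under these assumptions the default of 15 works out to about 924.6 s (roughly 15.4 minutes), inside the quoted range, and 8 is the smallest retry count whose total (102.2 s, versus 51.0 s for 7) exceeds RFC 1122's 100 seconds.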

But RFC 5482 (TCP User Timeout Option) provides another way to influence it:

The TCP user timeout controls how long transmitted data may remain unacknowledged before a connection is forcefully closed.

Back to the question:

Is it correct that keep alive probes are not sent during retransmission?

It makes sense: TCP is already trying to elicit a response from the other peer, so an empty keepalive would be superfluous.
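The rule can be sketched as a toy decision function. This is a simplified model, not kernel code: ConnState and keepalive_due are hypothetical names, and the check only loosely mirrors Linux, whose keepalive timer, as I understand it, simply reschedules itself while unacknowledged or queued data exists.

```python
from dataclasses import dataclass


@dataclass
class ConnState:
    """Hypothetical, simplified view of one TCP connection."""
    seconds_since_last_traffic: float  # simplified notion of "idle time"
    unacked_bytes: int                 # data sent but not yet acknowledged
    queued_bytes: int                  # data waiting in the send queue


def keepalive_due(conn: ConnState, keepidle: float) -> bool:
    """Toy decision rule: probe only a connection that is truly idle.

    If anything is in flight or queued, retransmission is already
    waiting for an ACK, so a keepalive probe would add nothing.
    """
    if conn.unacked_bytes > 0 or conn.queued_bytes > 0:
        return False
    return conn.seconds_since_last_traffic >= keepidle
```

In the NIC-unplugging test the server keeps streaming, so unacked_bytes never drops to zero and keepalive_due stays False however long the cable is out; the retransmission limits (tcp_retries2, or TCP_USER_TIMEOUT) decide when the connection dies.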

Linux-specific (2.4+) options to influence keepalive

TCP_KEEPCNT: The maximum number of keepalive probes TCP should send before dropping the connection.

TCP_KEEPIDLE: The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes, if the socket option SO_KEEPALIVE has been set on this socket.

TCP_KEEPINTVL: The time (in seconds) between individual keepalive probes.
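With Python's socket module on Linux, the three options can be set per socket roughly like this. The values 60/10/5 are arbitrary examples and enable_keepalive is a hypothetical helper; the socket.TCP_KEEP* constants are only exposed on platforms that support them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on keepalive with per-socket overrides (Linux-specific options).

    idle:     seconds of idleness before the first probe (TCP_KEEPIDLE)
    interval: seconds between unanswered probes (TCP_KEEPINTVL)
    count:    unanswered probes before the connection is dropped (TCP_KEEPCNT)
    """
    # The TCP_KEEP* values are only used once SO_KEEPALIVE is enabled.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With those example values an idle connection to an unresponsive peer should be dropped after about 60 + 10 * 5 = 110 seconds instead of the roughly two hours implied by the default tcp_keepalive_time. None of this helps while data is being retransmitted, since no probes are sent then.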

Linux-specific (2.6.37+) option to influence TCP User Timeout

TCP_USER_TIMEOUT: The maximum amount of time in milliseconds that transmitted data may remain unacknowledged before TCP will forcibly close the connection.

So, for example, your application could use this option to decide how long the connection survives when there is no connectivity (similar to your NIC-unplugging test). If you have reason to believe the client will come back (perhaps they closed the laptop lid? spotty wireless access?), you can specify a timeout of 12 hours, and when they do come back the connection will still work.
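A minimal Python sketch of the 12-hour example, assuming Linux and Python 3.6+ (where socket.TCP_USER_TIMEOUT is available); set_user_timeout and TWELVE_HOURS_MS are illustrative names. Note the unit is milliseconds, so 12 hours is 43,200,000.

```python
import socket

# TCP_USER_TIMEOUT takes milliseconds: 12 h = 43,200,000 ms.
TWELVE_HOURS_MS = 12 * 60 * 60 * 1000


def set_user_timeout(sock, timeout_ms):
    """Allow unacknowledged data to linger up to timeout_ms before the
    kernel forcibly closes the connection (0 means the system default)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
```

Per tcp(7), when TCP_USER_TIMEOUT is combined with SO_KEEPALIVE it also overrides keepalive in deciding when a keepalive failure closes the connection, so the two settings should be chosen together.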


Author: Jens

Updated on July 17, 2022

Comments

Jens almost 2 years

I have a requirement to enable TCP keepalive on all connections, and now I am struggling with the results from our test case. I think this is because I do not really understand when the first keepalive probe is sent. I read the following in the documentation for tcp_keepalive_time on Linux:

the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further

Some other sources state that this is the time a connection is idle, but they do not further define what this means. I also looked into Stevens to find a more formal definition of this, because I am wondering what "the last data packet sent" actually means when considering retransmissions.

In my test case, I have a connection where data is only sent from a server to a client at rather high rates. To test keepalive, we unplugged the cable on the client's NIC. I can now see that the network stack tries to send the data and enters the retransmission state, but no keepalive probe is sent. Is it correct that keepalive probes are not sent during retransmission?
Remy Lebeau almost 8 years

FYI, Linux 2.4+ has TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options for setsockopt() to set the idle time before probing begins, the time interval between probes, and the max number of probes to send, respectively.
cnicutar almost 8 years

@RemyLebeau Neat. Feel free to edit the answer to point that out!
Jens almost 8 years

@cnicutar Do you have a source I can reference which states that no keepalive probes are sent during retransmission? This seems to differ from system to system; on Windows, for example, the behavior appears to be different.
cnicutar almost 8 years

@Jens I don't have a source. It is not illegal to keep sending keepalive probes, but it makes little sense: if TCP is already sending stuff, there is already reason for the other side to ACK (which would prove the connection is alive).
Kurt M almost 7 years

@cnicutar great answer. It could be improved slightly by also including information on TCP_USER_TIMEOUT on Linux. It seems likely that the OP and others will want to know how to set a per-connection retransmission timeout when searching for answers on why keepalive only works on connections that are idle or that are waiting for data that doesn't arrive.
cnicutar almost 7 years

@KurtM I added a few more details. Feel free to edit the answer to add anything you think is useful.