Does a TCP socket connection have a "keep alive"?

java sockets http tcp keep-alive

175,961

Solution 1

TCP sockets remain open till they are closed.

That said, it's very difficult to detect a broken connection (broken, as in a router died, etc, as opposed to closed) without actually sending data, so most applications do some sort of ping/pong reaction every so often just to make sure the connection is still actually alive.
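A minimal sketch of such a ping/pong check in Java follows. The class name `PingPong`, the line-based "PING"/"PONG" messages, and the 2-second timeout are assumptions made for illustration; a real application would fold the check into its own protocol. The demo talks to a throwaway peer on the loopback interface so that it runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the PING/PONG wire format is an assumption.
public class PingPong {

    /** Sends "PING" and reports whether "PONG" came back within timeoutMillis. */
    static boolean isAlive(Socket socket, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis); // bound the blocking read below
            OutputStream out = socket.getOutputStream();
            out.write("PING\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Sketch only: a real client would reuse one reader for all its messages.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            return "PONG".equals(in.readLine());
        } catch (IOException e) {
            // Covers write failures, resets, and SocketTimeoutException on the read.
            return false;
        }
    }

    /** Demo peer: answers every "PING" line with "PONG" until the client disconnects. */
    static void serve(ServerSocket server) {
        try (Socket peer = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(peer.getInputStream(), StandardCharsets.US_ASCII));
            OutputStream out = peer.getOutputStream();
            String line;
            while ((line = in.readLine()) != null) {
                if ("PING".equals(line)) {
                    out.write("PONG\n".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                }
            }
        } catch (IOException ignored) {
            // The demo peer simply stops when the connection goes away.
        }
    }

    /** Runs one ping against a local demo peer and returns the result. */
    static boolean demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread peerThread = new Thread(() -> serve(server));
            peerThread.setDaemon(true);
            peerThread.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                return isAlive(client, 2000);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer alive: " + demo());
    }
}
```

In practice `isAlive` would be called from a timer whenever the connection has been idle for longer than the application is willing to tolerate, and a `false` result would trigger a reconnect.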

Solution 2

Does a TCP socket connection have a "keep alive"?

The short answer is yes, there is a timeout enforced via TCP Keep-Alive, so no, an unresponsive connection won't remain open forever; with default settings it will probably time out after a few hours.

If you would like to configure the Keep-Alive timeout on your machine, see the "Changing TCP Timeouts" section below. Otherwise read through the rest of the answer to learn how TCP Keep-Alive works.

Introduction

TCP connections consist of two sockets, one on each end of the connection. When one side wants to terminate the connection, it sends a FIN packet, which the other side acknowledges, and both close their sockets.

Until that happens, however, both sides will keep their socket open indefinitely. This leaves open the possibility that one side may close its socket, either intentionally or due to some error, without informing the other end via FIN. To detect this scenario and close stale connections, the TCP Keep-Alive process is used.

Keep-Alive Process

There are three configurable properties that determine how Keep-Alives work. On Linux they are¹:

tcp_keepalive_time
- default 7200 seconds
tcp_keepalive_probes
- default 9
tcp_keepalive_intvl
- default 75 seconds

The process works like this:

1. Client opens TCP connection.
2. If the connection is silent for tcp_keepalive_time seconds, send a single empty ACK packet.²
3. Did the server respond with a corresponding ACK of its own?
   - No
     1. Wait tcp_keepalive_intvl seconds, then send another ACK.
     2. Repeat until the number of ACK probes that have been sent equals tcp_keepalive_probes.
     3. If no response has been received at this point, send a RST and terminate the connection.
   - Yes: Return to step 2.

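This process is available on most operating systems but, as the RFC quoted below requires, it is normally off until an application enables it on a socket (SO_KEEPALIVE). Once it is enabled with the Linux defaults, dead TCP connections are pruned after the other end has been unresponsive for 2 hours 11 minutes 15 seconds (7200 seconds + 75 * 9 seconds = 7875 seconds).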
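The arithmetic behind the 2 hours 11 minutes figure can be checked with a small sketch. The class and method names are hypothetical; the three inputs are the Linux defaults listed above.

```java
import java.time.Duration;

// Hypothetical helper; it only reproduces the sum idle + interval * probes.
public class KeepAliveTimeout {

    /** Time from the last traffic until a dead peer is given up on. */
    static Duration worstCase(long idleSeconds, long intervalSeconds, int probes) {
        return Duration.ofSeconds(idleSeconds + intervalSeconds * probes);
    }

    public static void main(String[] args) {
        // tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9
        long s = worstCase(7200, 75, 9).getSeconds();
        System.out.printf("%d s = %d h %d min %d s%n",
                s, s / 3600, (s % 3600) / 60, s % 60);
        // prints "7875 s = 2 h 11 min 15 s"
    }
}
```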

Gotchas

2 Hour Default

Since the process doesn't start until a connection has been idle for two hours by default, stale TCP connections can linger for a very long time before being pruned. This can be especially harmful for expensive connections such as database connections.

Keep-Alive is Optional

According to RFC 1122 4.2.3.6, responding to and/or relaying TCP Keep-Alive packets is optional:

Implementors MAY include "keep-alives" in their TCP implementations, although this practice is not universally accepted. If keep-alives are included, the application MUST be able to turn them on or off for each TCP connection, and they MUST default to off.

...

It is extremely important to remember that ACK segments that contain no data are not reliably transmitted by TCP.

The reasoning being that Keep-Alive packets contain no data and are not strictly necessary and risk clogging up the tubes of the interwebs if overused.

In practice however, my experience has been that this concern has dwindled over time as bandwidth has become cheaper; and thus Keep-Alive packets are not usually dropped. Amazon EC2 documentation for instance gives an indirect endorsement of Keep-Alive, so if you're hosting with AWS you are likely safe relying on Keep-Alive, but your mileage may vary.

Changing TCP Timeouts

Per Socket

Unfortunately, since TCP connections are managed at the OS level, Java has historically not supported configuring these timeouts per socket through java.net.Socket. (JDK 11 and later expose TCP_KEEPIDLE, TCP_KEEPINTERVAL and TCP_KEEPCOUNT through jdk.net.ExtendedSocketOptions on platforms that support them.) I have found some attempts³ to use the Java Native Interface (JNI) to create Java sockets that call native code to configure these options, but none appear to have widespread community adoption or support.

Instead, you may be forced to apply your configuration to the operating system as a whole. Be aware that this configuration will affect all TCP connections running on the entire system.
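On JDK 11 or later there is a per-socket alternative that avoids the system-wide change. The following is a sketch under that assumption: it uses `jdk.net.ExtendedSocketOptions`, which is JDK-specific rather than part of the standard `java.net` API, and it checks `supportedOptions()` first because not every platform offers these options. The class name and the values 180/10/3 (the same ones used in the OS examples below) are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical example class; requires JDK 11+ for the TCP_KEEP* options.
public class PerSocketKeepAlive {

    /** Enables keep-alive and, where supported, overrides the timers for this socket only. */
    static boolean tune(Socket socket, int idleSeconds, int intervalSeconds, int probes)
            throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        Set<SocketOption<?>> supported = socket.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false; // keep-alive is on, but with the OS-wide timers
        }
        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);
        return true;
    }

    /** Returns the idle time read back from the socket, or -1 if tuning is unsupported. */
    static int demoIdleSeconds() {
        try (Socket socket = new Socket()) { // options can be set before connecting
            if (!tune(socket, 180, 10, 3)) {
                return -1;
            }
            return socket.getOption(ExtendedSocketOptions.TCP_KEEPIDLE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("per-socket idle seconds: " + demoIdleSeconds());
    }
}
```

Unlike the OS-level settings, these values are given in seconds on every platform that supports them and affect only the socket they are set on.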

Linux

The currently configured TCP Keep-Alive settings can be found in

/proc/sys/net/ipv4/tcp_keepalive_time
/proc/sys/net/ipv4/tcp_keepalive_probes
/proc/sys/net/ipv4/tcp_keepalive_intvl
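If you want to inspect these values from Java, for example to log them at startup, the files can be read like any other text file. This is only a sketch: the class name is hypothetical, and on systems without these /proc files (anything other than Linux) it reports the value as unavailable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical helper for reading the Linux keep-alive settings.
public class LinuxKeepAliveSettings {

    static final Path PROC_DIR = Paths.get("/proc/sys/net/ipv4");

    /** Reads one integer setting, or returns empty if the file is missing or malformed. */
    static OptionalInt read(Path dir, String name) {
        Path file = dir.resolve(name);
        if (!Files.isReadable(file)) {
            return OptionalInt.empty();
        }
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        String[] names = {"tcp_keepalive_time", "tcp_keepalive_probes", "tcp_keepalive_intvl"};
        for (String name : names) {
            OptionalInt value = read(PROC_DIR, name);
            System.out.println(name + " = "
                    + (value.isPresent() ? String.valueOf(value.getAsInt()) : "unavailable"));
        }
    }
}
```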

You can update any of these like so:

# Send first Keep-Alive packet when a TCP socket has been idle for 3 minutes
$ echo 180 > /proc/sys/net/ipv4/tcp_keepalive_time
# Send three Keep-Alive probes...
$ echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
# ... spaced 10 seconds apart.
$ echo 10 > /proc/sys/net/ipv4/tcp_keepalive_intvl

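Such changes will not persist through a restart. The same values can also be applied at runtime with sysctl:

sysctl -w net.ipv4.tcp_keepalive_time=180 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10

Note that sysctl -w on its own does not persist either. To keep the values across reboots, add them to /etc/sysctl.conf and load them with sysctl -p.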

Mac OS X

The currently configured settings can be viewed with sysctl:

$ sysctl net.inet.tcp | grep -E "keepidle|keepintvl|keepcnt"
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.keepcnt: 8

Of note, Mac OS X defines keepidle and keepintvl in units of milliseconds as opposed to Linux which uses seconds.

The properties can be set at runtime with sysctl (a change made this way does not survive a reboot):

sysctl -w net.inet.tcp.keepidle=180000 net.inet.tcp.keepcnt=3 net.inet.tcp.keepintvl=10000

To persist them across reboots, add them to /etc/sysctl.conf (creating the file if it doesn't exist).

$ cat /etc/sysctl.conf
net.inet.tcp.keepidle=180000
net.inet.tcp.keepintvl=10000
net.inet.tcp.keepcnt=3

Windows

I don't have a Windows machine to confirm, but you should find the respective TCP Keep-Alive settings in the registry at

\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\TCPIP\Parameters

Footnotes

1. See man tcp for more information.

2. This packet is often referred to as a "Keep-Alive" packet, but within the TCP specification it is just a regular ACK packet. Applications like Wireshark are able to label it as a "Keep-Alive" packet by meta-analysis of the sequence and acknowledgement numbers it contains in reference to the preceding communications on the socket.

3. Some examples I found from a basic Google search are lucwilliams/JavaLinuxNet and flonatel/libdontdie.

Solution 3

You are looking for the SO_KEEPALIVE socket option.

The Java Socket API exposes "keep-alive" to applications via the setKeepAlive and getKeepAlive methods.

EDIT: SO_KEEPALIVE is implemented in the OS network protocol stacks without sending any "real" data. The keep-alive interval is operating system dependent, and may be tuneable via a kernel parameter.

Since no data is sent, SO_KEEPALIVE can only test the liveness of the network connection, not the liveness of the service that the socket is connected to. To test the latter, you need to implement something that involves sending messages to the server and getting a response.
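As a minimal sketch of the setKeepAlive and getKeepAlive calls, the example below enables the option before connecting to a throwaway listener on the loopback interface. The class name and the 2-second connect timeout are assumptions for illustration; the probe timing itself still comes from the operating system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical example class showing SO_KEEPALIVE via java.net.Socket.
public class EnableKeepAlive {

    /** Connects to a local listener with SO_KEEPALIVE enabled and reads the flag back. */
    static boolean connectWithKeepAlive() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket socket = new Socket()) {
            socket.setKeepAlive(true); // ask the OS to probe this connection when idle
            socket.connect(new InetSocketAddress(loopback, listener.getLocalPort()), 2000);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + connectWithKeepAlive());
    }
}
```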

Solution 4

TCP keepalive and HTTP keepalive are very different concepts. In TCP, the keepalive is an administrative packet sent to detect a stale connection. In HTTP, keepalive means a persistent connection that is reused for several requests.

This is from the TCP specification (RFC 1122, section 4.2.3.6):

Keep-alive packets MUST only be sent when no data or acknowledgement packets have been received for the connection within an interval. This interval MUST be configurable and MUST default to no less than two hours.

As you can see, the default TCP keepalive interval is too long for most applications. You might have to add keepalive in your application protocol.

Solution 5

If you're behind a masquerading NAT (as most home users are these days), there is a limited pool of external ports, and these must be shared among the TCP connections. Therefore masquerading NATs tend to assume a connection has been terminated if no data has been sent for a certain time period.

This and other such issues (anywhere in between the two endpoints) can mean the connection will no longer "work" if you try to send data after a reasonable idle period. However, you may not discover this until you try to send data.

Using keepalives both reduces the chance of the connection being interrupted somewhere down the line, and also lets you find out about a broken connection sooner.

Author by

Kevin Boyd

Kevin works in Java and more recently in Java ME.

Updated on July 08, 2022

Comments

Kevin Boyd almost 2 years

I have heard of HTTP keep-alive but for now I want to open a socket connection with a remote server.
Now will this socket connection remain open forever or is there a timeout limit associated with it similar to HTTP keep-alive?
Kevin Boyd over 14 years

Okay so the implementation should make sure to check at regular interval that the connection is dead or alive, right?
Matthew Scharley over 14 years

It's a good idea. You don't have to, but if you don't, then you may not detect a broken link till someone actually wants to do something. Which may or may not be a good thing (or may or may not matter), depending on what you're actually trying to achieve.
Kevin Boyd over 14 years

If I call setKeepAlive(true), what would be the interval? Also, will Java keep sending keep-alive messages at the default interval, or will I have to do it programmatically?
Matthew Scharley over 14 years

unixguide.net/network/socketfaq/4.7.shtml Has a description of SO_KEEPALIVE. It's not so much what the OP wanted, though it is a protocol based option to what I suggested... though, once every two hours won't do much for applications.
Kevin Boyd over 14 years

Ah! you add a good point here, that is you have to also consider the in-between things that might hinder the operation of a connection such as NAT routers etc...
Matthew Scharley over 14 years

This is a good point, and a good reminder that there's more to keep in mind than just what we're directly implementing ourselves. Also, Lemmings!!
Artelius over 14 years

Note that p2p file sharing both chews up a lot of ports and produces a lot of zombie connections, making it more likely that the NAT will need to prune idle connections.
nog642 over 14 years

Not necessarily, a TCP connection is identified by 4 elements: src ip, src port, dest ip, dest port. So you can reuse the same external (source) port as long as the destination ip is different.
nog642 over 14 years

You can modify the TCP keepalive interval to suit your application. E.g. msdn.microsoft.com/en-us/library/dd877220%28VS.85%29.aspx
Artelius over 14 years

Oh yeah, you're right. I think the real reason is that NATs have a fixed size table of open connections, due to memory constraints and lookup time.
Pacerier almost 12 years

@MatthewScharley Isn't the interval configurable?
Pacerier almost 12 years

@ZZCoder Can you elaborate what does it mean when you say "In HTTP, keepalive means the persistent connection state"?
Pacerier almost 12 years

@MatthewScharley What's the usual request/response used for ping/pong?
Matthew Scharley almost 12 years

@Pacerier: It's totally protocol dependent, but for text-based protocols that require one, literal "PING" and "PONG" commands are pretty typical.
Matthew Scharley almost 12 years

@Pacerier: It's supposed to be, but it's usually at the OS level not an application level. It's also usually not exposed very cleanly or obviously. The TCP spec also specifies that it must not be allowed to be set less than 2 hours, hence my comment.
Pacerier almost 12 years

@MatthewScharley Regarding "it must not default to no less than two hours"... means it is allowed to be less than two hours right?
Matthew Scharley almost 12 years

@Pacerier: You're right, but that'd be implementation specific and I have no first-hand experience to know if you can in practice. Also note my previous comment that it's usually an OS level setting, so by changing it you are affecting every other application running. You likely don't want keep-alives on a short interval across the board.
Tim Cooper over 11 years

@MatthewScharley : This "ping pong" is already implemented for us, in the standard TCP implementations, and is called "keep-alive" (see the other popular answer to this question). Is there some reason to implement it at the app level?
Matthew Scharley over 11 years

@TimCooper: It isn't really. As I highlighted in comments on other answers, the TCP implementation isn't useful for most application level requirements. You can't send one on demand, and for most operating systems the TCP keepalive timeout is only configurable on a system-wide level and set far too high to be generally useful for applications.
Stephen C over 11 years

@MatthewScharley - "You're right, but that'd be implementation specific ...". A keep-alive interval that could not be less than two hours would be so useless that it is hard to conceive of anyone implementing it.
Daniel Lubarov about 11 years

For linux, this has instructions for changing the keepalive time (which defaults to 2h).
Stephen C about 11 years

@Daniel - it should be noted that if you are using Java, those instructions only allow you to change the system-wide default for the keep-alive interval. That is rather heavy handed ...
Daniel Lubarov about 11 years

@StephenC, I realize that changing the system setting isn't a great solution, but what's the alternative? AFAIK there's no way for an application to override the system setting.
Robert almost 11 years

@Tim The reason for a keep-alive on the application level is that the TCP standard recommends setting the keep-alive timer to higher than two hours. I have never seen a TCP connection without traffic that survives this long. Hence the TCP keep-alive stuff is useless by default.
Stephen C over 10 years

@Daniel - the alternative (in Java) would be to do manual keep alive, as mentioned above and in other answers. Not pretty, but it may be better than an OS-wide change of the default that could break system services or other applications.
Matthew Scharley over 10 years

@Pacerier: In HTTP/1.0 each request/response necessitated reconnecting to the server. For HTTP/1.1 they introduced a Keep-Alive header which could be used to trigger the server to not kill the connection after it was done processing the response to facilitate requesting more files and allowing for 'pipelining'; sending multiple requests then waiting for all the data to come back.
Igor Čordaš almost 10 years

It basically means that many HTTP requests will/should reuse the same TCP connection (these connections might also have keep-alive, but that doesn't matter to HTTP, so it's essentially a different concept).
Jasper over 8 years

How does one do a ping-pong check?
geld0r about 8 years

Very helpful, thanks! One addition: on Windows a restart is required for new values of KeepAliveTime to take effect.
Jarek Przygódzki about 8 years

On AIX, current TCP Keep-Alive settings can be queried using $ no -a | grep tcp_keep command.
Paul Stelian over 4 years

What's funny is that the TCP keepalive interval is larger than the NAT one by default, at least with some configs. So idle connections may be lost through NATs.
Jared Still about 3 years

When a TCP socket is created, it can be created with keepalive values other than default. Does anyone know how to find the keepalive value for a running process? I have looked in /proc/PID/net/netstat, but the value found there is not useful, or at least, I cannot find an explanation for it.
Gerard van Helden about 3 years

In my opinion, the term "alive" is misleading. After all, asserting that a connection is "alive" only means that you have performed some form of successful communication (past tense!); it is no guarantee that any or all subsequent communication will succeed (present/future tense). Therefore there really is no merit in doing this at either the TCP level or the application level, and you're much better off deciding on either a robust retry-on-failure or a crash-my-app approach than on the false sense of security that most people are probably looking for.
Ant almost 3 years

7,200,000,000ms is 2000 hours, not 2.
user207421 almost 2 years

A TCP connection is terminated by exchanging FINs, not RSTs. RST is an abortive close and it is not acknowledged.
user207421 almost 2 years

@Robert No it doesn't. It states that the default may not be less than two hours.
user207421 almost 2 years

7200 seconds is two hours, which is the minimum default timeout. I don't know where you got the 11 minutes from.
user207421 almost 2 years

@Ant No it isn't. You are confusing milliseconds with seconds, and you haven't even got that right.
Ant almost 2 years

@user207421 7,200,000,000ms / 1000 = 7,200,000s 7,200,000s / 60 = 120,000 mins 120,000 mins / 60 = 2000 hours Where did I go wrong?