Detecting socket disconnection?

Solution 1

You have discovered that you need timers and heartbeats on a TCP connection.

If you unplug the network cable, the TCP connection may not be broken. If you have nothing to send, the TCP/IP stack has nothing to send either, so it doesn't know that a cable is gone somewhere, or that the peer PC suddenly burst into flames. That TCP connection could be considered open until you reboot your server years later.

Think of it this way: how can the TCP connection know that the other end dropped off the network? It's off the network, so it can't tell you that fact.

Some systems can detect this if you unplug the cable going into your server, and some will not. If you unplug the cable at the other end of e.g. an ethernet switch, that will not be detected.

That's why one always needs supervisor timers (that e.g. send a heartbeat message to the peer, or close a TCP connection based on no activity for a given amount of time) for a TCP connection.
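As a rough illustration of the "close on no activity" variant, here is a minimal sketch of a supervisor that records the last activity time per connection and reports the ones that have been silent for too long. The class name IdleSupervisor and its methods touch, forget and findStale are hypothetical names made up for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: tracks the last activity time per connection. */
final class IdleSupervisor<K> {
    private final long idleLimitMillis;
    private final Map<K, Long> lastActivity = new HashMap<>();

    IdleSupervisor(long idleLimitMillis) {
        this.idleLimitMillis = idleLimitMillis;
    }

    /** Call whenever bytes are read from or written to the connection. */
    void touch(K connection, long nowMillis) {
        lastActivity.put(connection, nowMillis);
    }

    /** Call when the connection is closed so it is no longer supervised. */
    void forget(K connection) {
        lastActivity.remove(connection);
    }

    /** Returns the connections with no activity for longer than the limit. */
    List<K> findStale(long nowMillis) {
        List<K> stale = new ArrayList<>();
        for (Map.Entry<K, Long> entry : lastActivity.entrySet()) {
            if (nowMillis - entry.getValue() > idleLimitMillis) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        IdleSupervisor<String> supervisor = new IdleSupervisor<>(1000);
        supervisor.touch("conn-1", 0);
        System.out.println(supervisor.findStale(1500));
    }
}
```

In a selector loop, one could call touch after every successful read or write and call findStale each time select(timeout) returns, then either close the stale channels or send them a heartbeat at a lower threshold. Which of the two to do is an application decision.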

One very cheap way to at least keep TCP connections that you only read from, and never write to, from staying up for years on end is to enable TCP keepalive on the socket. Be aware that the default timeout for TCP keepalive is often 2 hours.
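For an NIO channel, turning keepalive on is a single socket option. The sketch below assumes a client-side SocketChannel; the class and method names are made up for the example, while StandardSocketOptions.SO_KEEPALIVE is the standard option.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

final class KeepAliveExample {
    /** Opens a non-blocking channel with TCP keepalive switched on. */
    static SocketChannel openWithKeepAlive() throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        // Ask the TCP stack to probe the peer when the connection is idle.
        channel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (SocketChannel channel = openWithKeepAlive()) {
            System.out.println("SO_KEEPALIVE = "
                    + channel.getOption(StandardSocketOptions.SO_KEEPALIVE));
        }
    }
}
```

With a plain java.net.Socket the equivalent call is socket.setKeepAlive(true). Either way the probe interval is whatever the operating system is configured with.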

Solution 2

Neither of those answers applies. The first one concerns the case when the connection is broken, and the second one (mine) concerns the case where the peer closes the connection.

In a TCP connection, unless data is being sent or received, there is in principle nothing about pulling a cable that should break a connection, as TCP is deliberately designed to be robust across this sort of thing, and there is certainly nothing about it that should look to the local application like the peer closing.

The only way to detect a broken connection in TCP is to attempt to send data across it, or to interpret a read timeout as a lost connection after a suitable interval, which is an application decision.
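A read timeout is easiest to show on a blocking socket. The following is a minimal sketch, assuming a loopback server that accepts the connection and then stays silent; the class and method names are invented for the example. It uses Socket.setSoTimeout, which makes a blocked read throw SocketTimeoutException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutExample {
    /**
     * Reads one byte and reports what happened:
     * "data", "peer closed" or "timed out".
     */
    static String readOnce(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        InputStream in = socket.getInputStream();
        try {
            int b = in.read();
            return b == -1 ? "peer closed" : "data";
        } catch (SocketTimeoutException e) {
            // Nothing arrived in time; whether that means "dead" is up to the application.
            return "timed out";
        }
    }

    /** Demo against a loopback server that accepts but never sends anything. */
    static String demoSilentPeer(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            return readOnce(client, timeoutMillis);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demoSilentPeer(500));
    }
}
```

With non-blocking NIO channels there is no setSoTimeout equivalent, so the same idea has to be built from select(timeout) plus your own record of when each channel last produced data.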

You can also set TCP keep-alive on to enable detection of broken connections, and in some systems you can even control the timeout per socket. Not via the standard Java socket API, however, so there you are stuck with the system default, which should be two hours unless it has been modified.
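Newer JDKs have changed this somewhat: JDK 11 and later ship the JDK-specific class jdk.net.ExtendedSocketOptions, which defines TCP_KEEPIDLE, TCP_KEEPINTERVAL and TCP_KEEPCOUNT on platforms that support them. The sketch below assumes such a JDK and checks supportedOptions() before using the option; the class and method names are made up for the example.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;
import jdk.net.ExtendedSocketOptions;

final class PerSocketKeepAliveExample {
    /**
     * Enables keep-alive and, where supported, starts probing after idleSeconds of silence.
     * Returns the idle time in effect, or -1 if TCP_KEEPIDLE is not supported here.
     */
    static int tuneKeepAlive(SocketChannel channel, int idleSeconds) throws IOException {
        channel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        if (!channel.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            return -1; // fall back to the system-wide default
        }
        channel.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        return channel.getOption(ExtendedSocketOptions.TCP_KEEPIDLE);
    }

    public static void main(String[] args) throws IOException {
        try (SocketChannel channel = SocketChannel.open()) {
            System.out.println("keep-alive idle seconds: " + tuneKeepAlive(channel, 60));
        }
    }
}
```

These options are not part of the portable java.net API, so code that relies on them should keep the fallback branch.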

Your code should call keyIterator.remove() after calling keyIterator.next().
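The reason is that the selector only ever adds keys to the selected-key set and never removes them, so a key that is not removed looks selected again on the next pass. Here is a minimal, self-contained sketch of the pattern on a loopback connection; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class SelectedKeyLoopExample {
    /** Handles each selected key once, removing it from the set as it goes. */
    static int drainSelectedKeys(Selector selector) {
        int handled = 0;
        Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
        while (keyIterator.hasNext()) {
            SelectionKey key = keyIterator.next();
            keyIterator.remove(); // the selector never removes keys from this set itself
            if (!key.isValid()) {
                continue;
            }
            handled++; // a real loop would dispatch to its connect/read/write handlers here
        }
        return handled;
    }

    /** Returns {keys handled, keys left in the selected set} for one readable channel. */
    static int[] demo() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                accepted.register(selector, SelectionKey.OP_READ);
                client.write(ByteBuffer.wrap(new byte[] {1})); // make 'accepted' readable
                selector.select(3000);
                int handled = drainSelectedKeys(selector);
                return new int[] {handled, selector.selectedKeys().size()};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] result = demo();
        System.out.println("handled=" + result[0] + ", remaining=" + result[1]);
    }
}
```

Calling selector.selectedKeys().clear() after the loop has the same effect on the set, as long as it runs on every pass.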

Author by

neevek

Updated on July 10, 2022

Comments

  • neevek
    neevek almost 2 years

    I am kinda upset that this cannot be handled in an elegant way. After trying different solutions (this, this and several others) mentioned in answers to several SO questions, I still could not manage to detect socket disconnection (by unplugging the cable).

    I am using NIO non-blocking sockets. Everything works perfectly except that I find no way of detecting server disconnection.

    I have the following code:

    while (true) {
        handlePendingChanges();
    
        int selectedNum = selector.select(3000);
        if (selectedNum > 0) {
            SelectionKey key = null;
            try {
                Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
                while (keyIterator.hasNext()) {
                    key = keyIterator.next();
                    if (!key.isValid())
                        continue;
    
                    System.out.println("key state: " + key.isReadable() + ", " + key.isWritable());
    
                    if (key.isConnectable()) {
                        finishConnection(key);
                    } else if (key.isReadable()) {
                        onRead(key);
                    } else if (key.isWritable()) {
                        onWrite(key);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
                System.err.println("I am happy that I can catch some errors.");
            } finally {
                selector.selectedKeys().clear();
            }
        }
    }
    

    While the SocketChannels are being read, I unplug the cable, and Selector.select() starts spinning and returning 0. Now I have no chance to read or write the channels, because the main reading and writing code is guarded by if (selectedNum > 0). This is my first point of confusion: according to this answer, when the channel is broken, select() will return and the selection key for the channel will indicate readable/writable. That is apparently not the case here: the keys are not selected, and select() still returns 0.

    Also, from EJP's answer to a similar question:

    If the peer closes the socket:

    • read() returns -1
    • readLine() returns null
    • readXXX() throws EOFException, for any other X.

    Not the case here either. I tried commenting out if (selectedNum > 0) and using selector.keys().iterator() to get all the keys regardless of whether they are selected. Reading from those keys does not return -1 (it returns 0 instead), and writing to them does not throw an EOFException. I only noticed one thing: even though the keys are not selected, key.isReadable() returns true while key.isWritable() returns false (I guess this might be because I didn't register the keys for OP_WRITE).

    My question is: why is the Java socket behaving like this, or is there something I did wrong?

  • neevek
    neevek over 11 years
    Hey, @EJP, I knew you would come to the rescue, thank you. "You can also set TCP keep-alive on to enable detection of broken connections": what role does keep-alive play in broken connection detection? What's the difference between setting and not setting keep-alive? As for keyIterator.remove(), I already use selector.selectedKeys().clear() in the finally block.
  • neevek
    neevek over 11 years
    Your explanation completely clears up my confusion, but there's still one thing that I want to know: sometimes when I plug the cable back in, the connection is resumed, and sometimes it is not. Why is that?
  • user207421
    user207421 over 11 years
    @neveek Missed that. TCP keep-alive sends a packet every now and then, one that requires a response, and if it doesn't arrive (taking retries and timeouts into account) the connection will be deemed broken: you will get a 'connection reset' on the next I/O.
  • neevek
    neevek over 11 years
    I am not implementing a custom protocol, I am using HTTP. So if a packet is sent over the wire every now and then, will that packet be interpreted as part of the HTTP header or body? And if I, as the client, receive the keep-alive packet, how do I handle that?
  • nos
    nos over 11 years
    @neevek Likely the transmitting end times out. The transmitting end will detect that the other end is gone, as it doesn't receive any ACKs, so among other things it'll depend on whether you plug the cable back in before the TCP stack times out the connection.
  • user207421
    user207421 over 11 years
    @neveek The keep-alive packet is an ACK. It isn't seen by the application at all.
  • neevek
    neevek over 11 years
    Awesome~ I learnt something. Both answers are correct, but I have to choose one. Thank you!
  • user207421
    user207421 about 10 years
    I should correct my comment above. If keepalive trips a broken connection you will get ETIMEDOUT, 'connection timed out'. Note the different wording from 'connect timed out', which is a connect-time problem.
  • neevek
    neevek about 10 years
    Got it! But I am wondering how you still remember this comment after 1 year? :)
  • user207421
    user207421 almost 7 years
    One does not 'always need supervisor timers'. HTTP, the most used application protocol on the planet, doesn't have one, for example. Read timeouts and IOExceptions on write are sufficient.
  • nos
    nos almost 7 years
    I consider read timeouts a supervisor timer, which you normally have to explicitly enable on sockets. If you don't, you risk not detecting a stale connection ("you" in this case might be an implementor of an HTTP server or client).