Linux, sockets, non-blocking connect

Solution 1

You should use the following steps for an async connect:

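  • create the socket with socket(..., SOCK_STREAM | SOCK_NONBLOCK, ...)
  • start the connection with connect(fd, ...)
  • if connect() returns 0, the connection is already established; if it returns -1 with errno set to anything other than EINPROGRESS, abort with an error
  • wait until fd is signalled as ready for output (writable), e.g. with select(), poll() or epoll
  • check the status of the socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
  • done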

No loops are needed, unless you want to handle EINTR.

If the client is started before the server, the SO_ERROR check in the last step should report ECONNREFUSED. If this happens, close the socket and start again from the beginning with a new socket.

It is difficult to tell what's wrong with your code without seeing more details. I suspect that you do not abort on errors in your check_socket operation.

Solution 2

There are a few ways to test if a nonblocking connect succeeds.

  1. Call getpeername() first. If it fails with ENOTCONN, the connection failed; then call getsockopt() with SO_ERROR to get the pending error on the socket.
  2. Call read() with a length of 0. If the read fails, the connection failed, and the errno from read() indicates why; read() returns 0 if the connection succeeded.
  3. Call connect() again. If it fails with EISCONN, the socket is already connected and the first connect succeeded.

Ref: UNIX Network Programming V1

Solution 3

D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded. Many of these methods have drawbacks on certain systems, so writing portable code for this is unexpectedly hard. If you want to read about all the possible methods and their drawbacks, check out this document.

For those who just want the tl;dr version, the most portable way is the following:

Once the system signals the socket as writable, first call getpeername() to see whether it connected. If that call succeeds, the socket is connected and you can start using it. If it fails with ENOTCONN, the connection failed. To find out why, try to read one byte from the socket with read(fd, &ch, 1). That call will fail as well, but the error you get is the error connect() would have returned had the socket been blocking.

Author: herolover

Updated on July 14, 2020

Comments

  • herolover
    herolover almost 4 years

    I want to create a non-blocking connect, like this:

    socket.connect(); // returns immediately
    

    For this, I use another thread, an infinite loop and Linux epoll, like this (pseudocode):

    // in another thread
    {
      create_non_block_socket();
      connect();
    
      epoll_create();
      epoll_ctl(); // subscribe socket to all events
      while (true)
      {
        epoll_wait(); // wait a short time (~100 ms)
        check_socket(); // check on EPOLLOUT event
      }
    }
    

    If I run the server and then the client, everything works. If I start the client first, wait a short time and then start the server, the client doesn't connect.

    What am I doing wrong? Maybe it can be done differently?

    • Martin James
      Martin James almost 11 years
      If you are raising another thread to perform the connect, why are you doing it asynchronously? You may as well put the rest of the comms in that thread too.
    • herolover
      herolover almost 11 years
      Well, how do I do it without epoll and non-blocking mode? If I just call connect(), it will block and wait for the connection (am I right?). But then, if I want to join this connecting thread to the main thread, I can't, because the connecting thread will be blocked. Sorry if I am wrong.
    • user207421
      user207421 almost 11 years
      This is not 'async'. This is non-blocking.
  • DreamWarrior
    DreamWarrior almost 9 years
    I know this is an old comment, but I just wanted to note that I had to wait for readability in order to catch ETIMEDOUT. This happened when no reply to the SYN ever arrived. If I only waited for writability, the socket would disappear from netstat (leaving the SYN_SENT state), but I never got a writability notification that would let me call getsockopt and find ETIMEDOUT. I also added a getsockopt call immediately after connect to check for any immediate errors before polling.
  • nosid
    nosid almost 9 years
    @DreamWarrior: That's weird. Take a look at connect(2) and connect(3) and search for poll. Both man pages state that you should wait for an indication that the socket is writable. Can you provide a minimal example that shows the unexpected behavior?
  • DreamWarrior
    DreamWarrior almost 9 years
    The man page states: "It is possible to select(2) or poll(2) for completion by selecting the socket for writing". My guess is that the key word is "completion". Since the connection never completed (it received neither a SYN-ACK nor an RST, which would end the handshake in failure), the socket never became writable.
  • DreamWarrior
    DreamWarrior almost 9 years
    I was testing this by performing a non-blocking connect to port 10000 on 1.1.1.1. However, my code was using the Xt scheduler (via XtAppAddInput w/ XtInputWriteMask) to perform the select/poll, so I'm not sure which it used, I just know the write event never "fired". A read event, added with XtInputReadMask, did fire when the TCP stack timed out waiting for the SYN-ACK. In this case, getsockopt returned ETIMEDOUT. I do wonder if there are other errors that would only be sent to the read event, but I don't know how to provoke them; I can only test ECONNREFUSED and ETIMEDOUT.
  • nosid
    nosid almost 9 years
    @DreamWarrior: I can't reproduce the problem you have described. I have written a minimal test program, and it correctly reports ETIMEDOUT using POLLOUT.
  • DreamWarrior
    DreamWarrior almost 9 years
    Interesting, your test program works as expected. So, the only thing I can figure is that the Xt scheduler that I am forced to use to schedule the I/O into my legacy application is not firing the events properly. Super odd -- I wish I had more time to investigate.
  • TFR
    TFR about 6 years
    The extra getsockopt for SO_ERROR is critical and not well documented (nor shown in any example I had seen). poll() reports the socket as writable even when the connection was refused with ECONNREFUSED and the socket cannot actually be written to.
  • user207421
    user207421 almost 6 years
    This is not an 'async connect'. This is a non-blocking connect. Given that the program does nothing except wait for success or failure, the approach is completely futile. It would be more to the point to do the connect in blocking mode and then switch to non-blocking mode for whatever follows, if anything.
  • Alexandre Fenyo
    Alexandre Fenyo over 5 years
    When getsockopt(fd, SOL_SOCKET, SO_ERROR, ...) returns 0 with 0 in so_error, this does not mean that the socket is connected; it only means that no error has occurred so far. In this case you need to call getpeername(): if it returns 0, the socket is connected; if the socket is not connected, it returns -1 with ENOTCONN in errno. getsockopt(fd, SOL_SOCKET, SO_ERROR, ...) can tell you about a refused connection, but not about a connected socket. You need getpeername() or some other means to be sure the socket is connected.
  • jacobq
    jacobq over 4 years
    I feel dumb asking this, but could you elaborate on how to go about this step: "wait until fd is signalled as ready for output"? Would that be done using select?
  • VL-80
    VL-80 over 2 years
    Please note: the read() man page says: "If count is zero, read() may detect the errors described below. In the absence of any errors, or if read() does not check for errors, a read() with a count of 0 returns zero and has no other effects." So it MAY detect the errors, but is not required to.