Why would connect() give EADDRNOTAVAIL?


Solution 1

Check this link

http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not.html

EDIT: Yes, I meant to add more but had to cut it short because of an emergency.

Did you close the socket before attempting to reconnect? Closing will tell the system that the socketpair (ip/port) is now free.

Here are additional items to look at:

  • If the local port is already connected to the given remote IP and port (i.e., there's already an identical socketpair), you'll receive this error (see bug link below).
  • Binding a socket to an address that isn't assigned to the local machine will produce this error. If the IP addresses of a machine are 127.0.0.1 and 1.2.3.4, and you try to bind to 1.2.3.5, you are going to get this error.
  • EADDRNOTAVAIL: The specified address is unavailable on the remote machine or the address field of the name structure is all zeroes.
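The bind case in the list above can be checked directly. The following is a minimal sketch, assuming an IPv4 TCP socket on a POSIX system; `try_bind_ipv4` is a hypothetical helper name, not something from the original answer. It returns the errno from bind() so the caller can compare it with EADDRNOTAVAIL.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to bind a TCP socket to the given IPv4 address
 * with port 0 (the kernel picks the port). Returns 0 on success, EINVAL if
 * the string is not a valid IPv4 address, or the errno of the failing call. */
int try_bind_ipv4(const char *ip)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    int err = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}
```

On a machine that does not own 1.2.3.5, `try_bind_ipv4("1.2.3.5")` would be expected to return EADDRNOTAVAIL, while `try_bind_ipv4("127.0.0.1")` should normally return 0.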

Link with a bug similar to yours (answer is close to the bottom)

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4294599

It seems that your socket is basically stuck in one of the TCP internal states and that adding a delay for reconnection might solve your problem as they seem to have done in that bug report.
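A reconnect loop along these lines might look like the sketch below. It assumes IPv4 TCP, and the helper name `connect_with_retry` plus its `max_attempts` and `delay_seconds` parameters are illustrative choices rather than anything from the bug report. Each attempt closes the failed socket and creates a fresh one, and only EADDRNOTAVAIL is treated as worth waiting out.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to addr, retrying with a fresh socket.
 * Returns a connected fd, or -1 with errno set from the last attempt
 * (EINVAL if max_attempts is not positive). */
int connect_with_retry(const struct sockaddr_in *addr,
                       int max_attempts, unsigned delay_seconds)
{
    int saved_errno = EINVAL;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;              /* errno was set by socket() */

        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return fd;

        saved_errno = errno;
        close(fd);                  /* release the failed socket before retrying */

        if (saved_errno != EADDRNOTAVAIL)
            break;                  /* only this error is treated as transient here */
        if (attempt + 1 < max_attempts)
            sleep(delay_seconds);   /* give the stack time to free the socketpair */
    }
    errno = saved_errno;
    return -1;
}
```

Whether a delay actually clears the condition depends on why the address was unavailable, so the caller should still be prepared for a final failure.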

Solution 2

This can also happen if an invalid port is given, like 0.
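One way to rule this out is to validate the port before it ever reaches connect(). The sketch below assumes the port arrives as a decimal string (for example from a config file); `parse_connect_port` is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal TCP port for use with connect().
 * Returns 0 and stores the port on success; returns -1 for non-numeric
 * input, values above 65535, and port 0. */
int parse_connect_port(const char *text, uint16_t *port_out)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;                  /* not a clean decimal number */
    if (value <= 0 || value > 65535)
        return -1;                  /* 0 is not a valid port to connect to */
    *port_out = (uint16_t)value;
    return 0;
}
```

Remember to pass the result through htons() before storing it in sin_port.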

Solution 3

If you are unwilling to change the number of temporary ports available (as suggested by David), or you need more connections than the theoretical maximum, there are two other methods to reduce the number of ports in use. However, they are to various degrees violations of the TCP standard, so they should be used with care.

The first is to turn on SO_LINGER with a zero-second timeout, forcing the TCP stack to send a RST packet and flush the connection state. There is one subtlety, however: you should call shutdown on the socket file descriptor before you close, so that you have a chance to send a FIN packet before the RST packet. So the code will look something like:

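// needs <stdio.h>, <sys/socket.h>, <unistd.h>
// Send a FIN first so the peer has a chance to see an orderly shutdown.
shutdown(fd, SHUT_RDWR);
// l_onoff = 1 with l_linger = 0 makes close() send a RST and
// flush the connection state.
struct linger linger;
linger.l_onoff = 1;
linger.l_linger = 0;
if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger)) < 0)
    perror("setsockopt(SO_LINGER)");
close(fd);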

The server should only see a premature connection reset if the FIN packet gets reordered with the RST packet.

See "TCP option SO_LINGER (zero) - when it's required" for more details. (Experimentally, it doesn't seem to matter where you call setsockopt.)

The second is to use SO_REUSEADDR and an explicit bind (even if you're the client), which allows Linux to reuse temporary ports before they are done waiting. Note that you must use bind with INADDR_ANY and port 0, otherwise SO_REUSEADDR is not respected. Your code will look something like:

// needs <stdio.h>, <string.h>, <netinet/in.h>, <sys/socket.h>
int opts = 1;
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opts, sizeof(opts)) < 0)
    perror("setsockopt(SO_REUSEADDR)");

// Bind to the wildcard address and port 0 so the kernel picks the local port.
struct sockaddr_in listen_addr;
memset(&listen_addr, 0, sizeof(listen_addr));
listen_addr.sin_family = AF_INET;
listen_addr.sin_port = htons(0);
listen_addr.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(fd, (struct sockaddr *) &listen_addr, sizeof(listen_addr)) < 0)
    perror("bind");

// saddr is the struct sockaddr_in you're connecting to
if (connect(fd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0)
    perror("connect");

This option is less attractive because you will still saturate the kernel's internal data structures for TCP connections, as shown by netstat -an | grep -e tcp -e udp | wc -l. However, you won't start reusing ports until that happens.
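To see how many temporary ports are actually available (the limit mentioned at the start of this answer), the range can be read from the kernel. This is a rough sketch assuming Linux, where the range is exposed in /proc/sys/net/ipv4/ip_local_port_range as two numbers; the function names are hypothetical and the path is not mentioned in the original answer.

```c
#include <stdio.h>

/* Parse the two numbers found in ip_local_port_range (for example
 * "32768\t60999") and return how many ports the range covers,
 * or -1 if the text cannot be parsed. */
int ephemeral_port_count(const char *text)
{
    int low = 0, high = 0;
    if (sscanf(text, "%d %d", &low, &high) != 2 || low > high)
        return -1;
    return high - low + 1;
}

/* Read the live setting; returns -1 if the file is missing or unreadable. */
int read_ephemeral_port_count(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line ? ephemeral_port_count(line) : -1;
}
```

Comparing this number with the count of sockets sitting in TIME_WAIT gives a quick indication of whether port exhaustion is a plausible cause.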

Solution 4

I ran into this issue and resolved it by enabling TCP timestamps.

Root cause:

  1. After a connection is closed, it stays in the TIME_WAIT state for some time.

  2. During this state, if a new connection is attempted with the same IP and port and SO_REUSEADDR was not set on the socket, bind() will fail with EADDRINUSE.

  3. Even with SO_REUSEADDR set, connect() may still fail with EADDRNOTAVAIL if TCP timestamps are not enabled on both sides.

Solution: enable TCP timestamps on both sides, client and server.

echo 1 > /proc/sys/net/ipv4/tcp_timestamps
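Before changing the setting, it can be useful to check its current value from inside the application. The sketch below is one possible way to do that, assuming Linux and the same /proc path as the command above; `read_sysctl_int` is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a sysctl-style file such
 * as /proc/sys/net/ipv4/tcp_timestamps. Returns the value read, or -1 if
 * the file cannot be opened or does not start with an integer. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, `read_sysctl_int("/proc/sys/net/ipv4/tcp_timestamps")` returning 0 would suggest timestamps are disabled on that host; the same check has to be made on the peer as well.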

Reason to enable tcp_timestamps:

When we enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If we enable tcp_timestamps, it will make sure that those collisions cannot happen. However, we need TCP timestamps to be enabled on both ends. See the definition of tcp_twsk_unique for the gory details.

reference: https://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1

Author: WilliamKF

Updated on July 22, 2022

Comments

  • WilliamKF almost 2 years

    I have in my application a failure that arose which does not seem to be reproducible. I have a TCP socket connection which failed and the application tried to reconnect it. In the second call to connect() attempting to reconnect, I got an error result with errno == EADDRNOTAVAIL which the man page for connect() says means: "The specified address is not available from the local machine."

    Looking at the call to connect(), the second argument appears to be the address to which the error refers, but as I understand it, this argument is the TCP socket address of the remote host, so I am confused about the man page referring to the local machine. Is it that this address of the remote TCP socket host is not available from my local machine? If so, why would this be? The first call to connect() must have succeeded before the connection failed and the application attempted to reconnect and got this error. The arguments to connect() were the same both times.

    Would this error be a transient one which, if I had tried calling connect again might have gone away if I waited long enough? If not, how should I try to recover from this failure?