Why do I get connection refused after 1024 connections?


Solution 1

So, after a little more research, it looks like my server-side listen() has a backlog (queue depth) of 20. I'm thinking that's the reason. Does anyone else think that's the problem?

Regards

Solution 2

If you are connecting faster than your server is calling accept(), the queue of pending connections may be full. The maximum queue length is set by the second argument to listen() in the server, or the value of sysctl net.core.somaxconn (normally 128) if lower.

Solution 3

Maybe you reached your process limit for open file descriptors.

I'm not sure I understand you correctly: do you have both the server side and the client side in the same process? Then you will use twice as many file descriptors, which comes close to what you see with ulimit. If that is not the case, could the problem be on the server side? Maybe the server process runs out of descriptors and can no longer accept any more connections.

The accept(2) man page says you should get one of these errors in that case:

EMFILE
The per-process limit of open file descriptors has been reached.

ENFILE
The system limit on the total number of open files has been reached.

What error code do you get? You can obviously only add connections that were successfully accept()ed to select or poll.

I know you already know how to check ulimit, but others may not:

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 40448
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 40448
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Author: Sujith

Updated on June 11, 2022

Comments

  • Sujith
    Sujith almost 2 years

    I am testing on a local Linux server with both the server and the client on the same machine. After about 1024 connections, the connect() in my code fails with connection refused. At first I thought it was the FD_SETSIZE limit of 1024 for select(), so I changed the server to use poll() instead, but I still can't get past this number. My ulimit -n is set to 2048, and when I monitor lsof on the server it reaches about 1033 (not sure if this is the exact number) and then fails. Any help is much appreciated.

  • Sujith
    Sujith almost 15 years
    Thanks for your quick response. Let me explain in a little more detail: the server and client are two separate processes on the machine. The server is more of a manager which keeps track of all new client processes, so each client process registers itself with the server listening on a port. Once ~1024 clients register, further clients get connection refused. I checked ulimit -a and I have it set to 2048 for the soft limit and 4096 for the hard limit.
  • Nathaniel Sharp
    Nathaniel Sharp almost 15 years
    @Gentoo Do you get an error in the accept call of the server? If so which one?
  • Nathaniel Sharp
    Nathaniel Sharp almost 15 years
    @Gentoo unfortunately you will need to know the server return value from accept. Maybe using strace on the server will shed some light on this.
  • Sujith
    Sujith almost 15 years
    @lothar So I am working on a server provided to me by a different group. In my client code, the following connect() gives me the connection refused error:

        if (connect(sock_fd, (struct sockaddr *)&serv_addr, (int)sizeof(serv_addr)) < 0) {
            LOGSTRM(LEVEL_ERROR) << "connect() on port " << serv_port
                                 << " failed, " << strerror(errno) << ENDSTRM;
            (void) close(sock_fd);
            return(-1);
        }
  • Sujith
    Sujith almost 15 years
    Thanks lothar. I will try doing that.
  • Sujith
    Sujith almost 15 years
    Thanks, I didn't know about this setting. I will check my system when I get to work.
  • Jonathan Leffler
    Jonathan Leffler almost 15 years
    Probably not, in all honesty, though it might be. The queue depth is how many outstanding (incomplete) requests are made. If you are flooding the server with connection requests before the previous ones complete, then maybe; if you are making the requests synchronously, then probably not.
  • Sujith
    Sujith almost 15 years
    So this is an automated workload for 2000 users, and the user connections are not synchronized. That's why I think the queue depth could be the problem. I've asked my server team to increase the depth and am waiting to test.