Need to increase nginx throughput to an upstream unix socket -- linux kernel tuning?
Solution 1
It sounds like the bottleneck is the app behind the socket rather than Nginx itself. We see this a lot with PHP when used over a socket versus a TCP/IP connection; in our case, PHP bottlenecks much earlier than Nginx ever would.
Have you checked the connection tracking and socket backlog limits in sysctl.conf?
net.core.somaxconn
net.core.netdev_max_backlog
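A quick, read-only way to inspect those limits on a running box (the 8192 used in the comments below is illustrative, not a universally correct value):

```shell
#!/bin/sh
# Read the current queue limits straight from /proc (Linux only).
for f in /proc/sys/net/core/somaxconn /proc/sys/net/core/netdev_max_backlog; do
    printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
done
# To raise a value until the next reboot (requires root):
#   sysctl -w net.core.somaxconn=8192
# To persist it, add "net.core.somaxconn = 8192" to /etc/sysctl.conf and run sysctl -p.
```

Note that net.core.somaxconn also silently caps the backlog an application asks for in listen(), so raising the application-side backlog has no effect while this sysctl is lower.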
Solution 2
You might try looking at unix_dgram_qlen (see the proc docs), though this may compound the problem by letting even more requests pile up in the queue. You'll have to check (netstat -x...).
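To watch the unix-socket queues mentioned above, netstat -x or its modern replacement ss both work; a sketch (column layout varies by version, but on a listening socket Recv-Q generally shows connections waiting to be accepted):

```shell
#!/bin/sh
# List unix-domain sockets along with their queue depths.
if command -v ss >/dev/null 2>&1; then
    ss -x -a
elif command -v netstat >/dev/null 2>&1; then
    netstat -x
else
    echo "neither ss (iproute2) nor netstat (net-tools) is installed" >&2
fi
```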
Solution 3
tl;dr
- Make sure the Unicorn backlog is large (use a socket; it's faster than TCP)
listen("/var/www/unicorn.sock", backlog: 1024)
- Optimise NGINX performance settings, for example
worker_connections 10000;
Discussion
We had the same problem: a Rails app served by Unicorn behind an NGINX reverse proxy.
We were getting lines like these in the NGINX error log:
2019/01/29 15:54:37 [error] 3999#3999: *846 connect() to unix:/../unicorn.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: xx.xx.xx.xx, request: "GET / HTTP/1.1"
Reading the other answers, we also figured that maybe Unicorn was to blame, so we increased its backlog, but this did not resolve the problem. Monitoring the server processes made it obvious that Unicorn was not getting any requests to work on, so NGINX appeared to be the bottleneck.
Searching for NGINX settings to tweak in nginx.conf, we came across this performance tuning article, which pointed out several settings that could impact how many parallel requests NGINX can process, especially:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 400000; # important

events {
    worker_connections 10000; # important
    use epoll; # important
    multi_accept on; # important
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    keepalive_requests 100000; # important
    server_names_hash_bucket_size 256;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    gzip on;
    gzip_disable "msie6";
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Solution 4
I solved it by increasing the backlog number in config/unicorn.rb. I used to have a backlog of 64:
listen "/path/tmp/sockets/manager_rails.sock", backlog: 64
and I was getting this error:
2014/11/11 15:24:09 [error] 12113#0: *400 connect() to unix:/path/tmp/sockets/manager_rails.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 192.168.101.39, server: , request: "GET /welcome HTTP/1.0", upstream: "http://unix:/path/tmp/sockets/manager_rails.sock:/welcome", host: "192.168.101.93:3000"
Now I've increased it to 1024 and I don't get the error:
listen "/path/tmp/sockets/manager_rails.sock", backlog: 1024
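One way to confirm the new backlog is in effect: for a listening unix socket, ss typically reports the configured backlog in the Send-Q column and the number of connections currently waiting to be accepted in Recv-Q (the socket name grepped for here is the one from the snippet above):

```shell
#!/bin/sh
# Show listening unix sockets; Send-Q ~ configured backlog, Recv-Q ~ current queue.
if command -v ss >/dev/null 2>&1; then
    ss -x -l | grep manager_rails.sock || echo "socket not found (is Unicorn running?)"
fi
```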
Ben Lee

Updated on September 18, 2022

Comments
-
Ben Lee over 1 year
I am running an nginx server that acts as a proxy to an upstream unix socket, like this:
upstream app_server {
    server unix:/tmp/app.sock fail_timeout=0;
}

server {
    listen ###.###.###.###;
    server_name whatever.server;
    root /web/root;
    try_files $uri @app;
    location @app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app_server;
    }
}
Some app server processes, in turn, pull requests off
/tmp/app.sock
as they become available. The particular app server in use here is Unicorn, but I don't think that's relevant to this question. The issue is that past a certain amount of load, nginx just can't seem to get requests through the socket at a fast enough rate. It doesn't matter how many app server processes I set up.
I'm getting a flood of these messages in the nginx error log:
connect() to unix:/tmp/app.sock failed (11: Resource temporarily unavailable) while connecting to upstream
Many requests result in a 502 status code, and those that don't take a long time to complete. The nginx write queue stat hovers around 1000.
Anyway, I feel like I'm missing something obvious here, because this particular configuration of nginx and app server is pretty common, especially with Unicorn (it's the recommended method, in fact). Are there any linux kernel options that need to be set, or something in nginx? Any ideas about how to increase the throughput to the upstream socket? Is there something I'm clearly doing wrong?
Additional information on the environment:
$ uname -a
Linux servername 2.6.35-32-server #67-Ubuntu SMP Mon Mar 5 21:13:25 UTC 2012 x86_64 GNU/Linux
$ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
$ unicorn -v
unicorn v4.3.1
$ nginx -V
nginx version: nginx/1.2.1
built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
TLS SNI support enabled
Current kernel tweaks:
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.route.flush = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.somaxconn = 8192
net.netfilter.nf_conntrack_max = 524288
Ulimit settings for the nginx user:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
-
Khaled almost 12 years: Did you check the output of ulimit, specifically the number of open files?
-
Ben Lee almost 12 years: @Khaled, ulimit -n says 65535.
-
jmw almost 12 years: Any progress with this?
-
Ben Lee almost 12 years: Thanks for the idea, but this didn't appear to make any difference.
-
Ben Lee almost 12 years: I figured out the problem; see the answer I posted. It actually was the app bottlenecking, not the socket, just as you posited. I had ruled this out earlier due to a mis-diagnosis, but it turns out the problem was throughput to another server. Figured this out just a couple hours ago. I'm going to award you the bounty, since you pretty much nailed the source of the problem despite the mis-diagnosis I put in the question; however, I'm going to give the checkmark to my answer, because it describes the exact circumstances and so might help someone in the future with a similar issue.
-
Ben Lessani over 11 years: Your issue isn't nginx, it is more than capable - but that's not to say you might not have a rogue setting. Sockets are particularly sensitive under high load when the limits aren't configured correctly. Can you try your app with tcp/ip instead?
-
Ben Lee over 11 years: Same problem, with even worse magnitude, using tcp/ip (the write queue climbs even faster). I have nginx / unicorn / kernel all set up exactly the same (as far as I can tell) on a different machine, and that other machine is not exhibiting this problem. (I can switch dns between the two machines to get live load testing, and have dns on a 60-sec ttl.)
-
Ben Lee over 11 years: Throughput between each machine and a db machine is the same now, and latency between the new machine and the db machine is about 30% more than between the old machine and the db machine. But 30% more than a tenth of a millisecond is not the problem.
-
Ben Lee over 11 years: Nope, ulimit settings on both machines are the same (specifically, open files is 65535; everything else looks fine too).
-
Ben Lee over 11 years: I added my ulimit settings to the end of the question.
-
tarkeshwar about 11 years: @BenLee Did you figure this one out? I may be facing a similar problem.
-
Ben Lee about 11 years: @tarkeshwar, no, never figured it out. Eventually I ended up going with different hardware and a somewhat different server stack instead of solving the problem.