How to reduce number of sockets in TIME_WAIT?


Solution 1

One thing you should do to start is to fix net.ipv4.tcp_fin_timeout=1. That is way too low; you probably should not take it much lower than 30.
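
For example, a minimal sketch of putting it back to a saner value (30 here is just the ballpark suggested above, not a magic number):

    # check the current value
    cat /proc/sys/net/ipv4/tcp_fin_timeout
    # raise it at runtime
    sudo sysctl -w net.ipv4.tcp_fin_timeout=30
    # to persist across reboots, add this line to /etc/sysctl.conf:
    #   net.ipv4.tcp_fin_timeout = 30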

This is behind nginx; does that mean nginx is acting as a reverse proxy? If that is the case, then your connections are 2x (one to the client, one to your web servers). Do you know which end these sockets belong to?

Update:
fin_timeout is how long sockets stay in FIN-WAIT-2 (from networking/ip-sysctl.txt in the kernel documentation):

tcp_fin_timeout - INTEGER
        Time to hold socket in state FIN-WAIT-2, if it was closed
        by our side. Peer can be broken and never close its side,
        or even died unexpectedly. Default value is 60sec.
        Usual value used in 2.2 was 180 seconds, you may restore
        it, but remember that if your machine is even underloaded WEB server,
        you risk to overflow memory with kilotons of dead sockets,
        FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1,
        because they eat maximum 1.5K of memory, but they tend
        to live longer. Cf. tcp_max_orphans.
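
A quick way to check whether it is really FIN-WAIT-2 or TIME_WAIT that is piling up on your box (plain netstat, nothing beyond the state names it prints):

    netstat -n | grep -c FIN_WAIT2   # closed by our side, still waiting for the peer's FIN
    netstat -n | grep -c TIME_WAIT   # fully closed, waiting out 2*MSL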

I think you may just have to let Linux keep the TIME_WAIT socket count up against what looks like a 32k cap on them, and this is where Linux recycles them. This 32k is alluded to in this link:

    Also, I find the /proc/sys/net/ipv4/tcp_max_tw_buckets confusing. Although the default is set at 180000, I see a TCP disruption when I have 32K TIME_WAIT sockets on my system, regardless of the max tw buckets.

This link also suggests that the TIME_WAIT state is 60 seconds and cannot be tuned via proc.
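
If you want to see where you stand relative to that bucket limit, this is a read-only check (it changes nothing):

    # the kernel's cap on simultaneous TIME_WAIT sockets (180000 by default, per the quote above);
    # once it is exceeded, new TIME_WAIT sockets are destroyed immediately instead of being held
    cat /proc/sys/net/ipv4/tcp_max_tw_buckets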

Random fun fact:
You can see the remaining timer on each TIME_WAIT socket with netstat -on | grep TIME_WAIT | less

Reuse vs Recycle:
These are kind of interesting; it reads like reuse enables the reuse of TIME_WAIT sockets, and recycle puts it into TURBO mode:

tcp_tw_recycle - BOOLEAN
        Enable fast recycling TIME-WAIT sockets. Default value is 0.
        It should not be changed without advice/request of technical
        experts.

tcp_tw_reuse - BOOLEAN
        Allow to reuse TIME-WAIT sockets for new connections when it is
        safe from protocol viewpoint. Default value is 0.
        It should not be changed without advice/request of technical
        experts.

I wouldn't recommend using net.ipv4.tcp_tw_recycle as it causes problems with NAT clients.

Maybe try not having both of those switched on, and see what effect it has (try one at a time and see how each works on its own). I would use netstat -n | grep TIME_WAIT | wc -l for faster feedback than Munin.
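
Something along these lines, as a rough sketch (the values follow the advice above; apply one change at a time and watch the count):

    # leave recycle off (it misbehaves with clients behind NAT) and try reuse on its own
    sudo sysctl -w net.ipv4.tcp_tw_recycle=0
    sudo sysctl -w net.ipv4.tcp_tw_reuse=1

    # quick feedback loop instead of waiting for Munin to redraw
    watch -n 5 'netstat -n | grep -c TIME_WAIT'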

Solution 2

tcp_tw_reuse is relatively safe, as it only allows TIME_WAIT sockets to be reused for new connections when that is safe from the protocol's viewpoint.

Also, you could run more services listening on different ports behind your load balancer if running out of ports is a problem.
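
For instance, a purely hypothetical sketch (my-fcgi-app and its --port flag are placeholders for whatever your backend actually provides): start several instances on different ports and list them all in the load balancer's upstream configuration.

    # one instance per port; each extra destination port multiplies the number of
    # usable (src ip, src port, dst ip, dst port) tuples between the LB and the backends
    for port in 9000 9001 9002; do
        my-fcgi-app --port "$port" &
    done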


Comments

  • Alexander Gladysh
    Alexander Gladysh almost 2 years

    Ubuntu Server 10.04.1 x86

    I've got a machine with an FCGI HTTP service behind nginx that serves a lot of small HTTP requests to a lot of different clients. (About 230 requests per second in peak hours; average response size with headers is 650 bytes; several million different clients per day.)

    As a result, I have a lot of sockets hanging in TIME_WAIT (the graph was captured with the TCP settings below in effect):

    [graph: TIME_WAIT socket count over time]

    I'd like to reduce the number of sockets.

    What can I do besides this?

    $ cat /proc/sys/net/ipv4/tcp_fin_timeout
    1
    $ cat /proc/sys/net/ipv4/tcp_tw_recycle
    1
    $ cat /proc/sys/net/ipv4/tcp_tw_reuse
    1
    

    Update: some details on the actual service layout on the machine:

    client -----TCP-socket--> nginx (load balancer reverse proxy) 
           -----TCP-socket--> nginx (worker) 
           --domain-socket--> fcgi-software
                              --single-persistent-TCP-socket--> Redis
                              --single-persistent-TCP-socket--> MySQL (other machine)
    
    

    I probably should switch the load-balancer --> worker connection to domain sockets as well, but the issue with TIME_WAIT sockets would remain: I plan to add a second worker on a separate machine soon, and won't be able to use domain sockets in that case.

    • Alexander Gladysh
      Alexander Gladysh over 13 years
      Now it looks like Munin is not lying, but rather that I'm looking at the wrong plot...
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: what value for net.ipv4.tcp_fin_timeout would you recommend?
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: client --TCP-socket--> nginx (load balancer reverse proxy) --TCP-socket--> nginx (worker) --domain-socket--> fcgi-software
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: I probably should replace TCP socket between load balancer and worker on the same machine with domain socket as well.
  • Kyle Brandt
    Kyle Brandt over 13 years
    I would say 30 or maybe 20. Try it and see. You have a lot of load so a lot of TIME_WAIT kind of makes sense.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: sorry for a stupid question (I'm on a cargo-cult level here so far, unfortunately), but what exactly should I expect to see when I change net.ipv4.tcp_fin_timeout from 1 to 20?
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: The graph was captured when the TCP settings, listed in question, were in effect.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: sorry, I'm confused about your Update 2. How to keep net.ipv4.tcp_fin_timeout "at the 32k cap", when it is in seconds?
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: it seems that by increasing net.ipv4.tcp_fin_timeout, I'll increase the memory consumption. What will I gain instead?
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: I'm not sure that there is a cap at 32K sockets in TIME_WAIT. (At least, my graph does not show it — it seems to be just the natural load profile that incidentally gets flat.) Well, I expect higher load tomorrow, will see...
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: TCP disruption? I don't like the sound of this... :-(
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alexander: You will probably hit conntrack issues before that, I bet :-P Just make sure to keep a close eye on /var/log/messages for this sort of thing
  • Kyle Brandt
    Kyle Brandt over 13 years
    I should monitor this information as well! Right now I am at 14k TIME_WAIT on my LB
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    If 32K TIME_WAIT is a hard limit, I feel that getting yet another IP for the machine would not work, and I would have to get a whole second front-end machine... That's a pity, since even the first one is working at no more than 20% of estimated capacity, judging by CPU and memory load...
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: by conntrack you mean something like this? serverfault.com/questions/189796/error-cant-alloc-conntrack/…
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alex: Exactly. I actually just bypassed it on my LB using the NOTRACK iptables target, since my LBs are behind firewalls already.
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alex: More on conntrack here: serverfault.com/questions/202157/… Make sure you are not bumping up against your max.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: Thanks. It seems that I'm not bumping against it (yet). I've got conntrack_max at 65536 and no messages in dmesg since boot 3 days ago.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: Please advise, what line should I set my automonitoring to watch for in /var/log/messages to catch possible conntrack issues? Will "Can't alloc conntrack" do? Or something broader/narrower?
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alex: Conn is probably fine. If you really want to get it right, grab the kernel source code and look for the printk (kernel print) calls and the exact current text for your kernel version ;-)
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alex: How close are you currently? cat /proc/sys/net/netfilter/nf_conntrack_count
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: My nf_conntrack_count is 33789 now.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: even if my conntrack_count is far from the limit, perhaps the extra monitoring wouldn't hurt, to play safe. (A rough monitoring sketch appears after these comments.)
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alex: Are you running netstat as root? Don't think that matters though.. Maybe pipe it to less and see if there is just a grep issue or something.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: No. Just tried running it as root (with sudo, and then from the root shell) — no significant difference. 429—480 sockets.
  • Kyle Brandt
    Kyle Brandt over 13 years
    Oh, here is a nice one-liner: netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c. So @Alex, if Munin doesn't line up, maybe drill into just how it monitors these stats. Maybe the only issue is that Munin is giving you bad data :-)
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: 10% according to netstat, instead of 400% according to Munin. I wonder, is this a Munin bug?
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: Nice one-liner indeed! Numbers are the same though. 5000/500.
  • Kyle Brandt
    Kyle Brandt over 13 years
    @Alex: I use this n2rrd tool to do my graphing. So I write the rrd data files and the graph files myself, as well as, often, the checks. What I can tell you from that is that it is very easy to choose the wrong type (like the GAUGE vs COUNTER issue) or make all sorts of other typos. So there could be a bug there.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: about nf_conntrack_count and the string for monitoring: I guess I'll just monitor current count vs. maximum.
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: Munin looks at /proc/net/ip_conntrack. Currently it has 19K sockets in TIME_WAIT out of total 20K. Netstat says 300/3000. Who's right? o_O
  • Alexander Gladysh
    Alexander Gladysh over 13 years
    @Kyle: Seems that I was looking at the wrong Munin graph. That one shows some iptables state, not actual sockets.
  • msanford
    msanford almost 10 years
    Indeed, net.netfilter.nf_conntrack_tcp_timeout_time_wait does not affect the TIME_WAIT timeout: vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
  • Scott混合理论
    Scott混合理论 over 3 years
    btw, what is the problem with TIME_WAIT? If it is a problem, why not set tcp_max_tw_buckets to a very low value, say 1000?
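
As mentioned in the conntrack discussion above, here is a rough monitoring sketch. The /proc paths are the standard nf_conntrack ones from the comments; the 80% threshold and the exact log strings are assumptions you should adapt to your kernel.

    #!/bin/sh
    # warn when the conntrack table is getting close to full
    count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
    max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
    # 80% threshold is arbitrary; adjust to taste
    if [ "$count" -gt $((max * 80 / 100)) ]; then
        echo "conntrack table at $count/$max entries" >&2
    fi
    # also worth watching the logs for the kernel's own complaints
    # (exact wording varies by kernel version)
    grep -Ec "table full|[Cc]an't alloc conntrack" /var/log/messages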