How can I monitor the length of the accept queue?
Solution 1
To check whether your queue is overflowing, use either netstat or nstat:
[centos ~]$ nstat -az | grep -i listen
TcpExtListenOverflows 3518352 0.0
TcpExtListenDrops 3518388 0.0
TcpExtTCPFastOpenListenOverflow 0 0.0
[centos ~]$ netstat -s | grep -i LISTEN
3518352 times the listen queue of a socket overflowed
3518388 SYNs to LISTEN sockets dropped
Reference: https://perfchron.com/2015/12/26/investigating-linux-network-issues-with-netstat-and-nstat/
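If you want these two counters in a periodic monitoring script rather than reading them by eye, a minimal sketch follows. It assumes a Linux host where /proc/net/netstat exposes the same TcpExt counters that nstat prints; the function names parse_netstat and listen_counters are made up for this illustration.

```python
from pathlib import Path


def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"TcpExtListenOverflows": 123, ...}."""
    counters = {}
    lines = text.strip().splitlines()
    # The file comes in pairs: a header line of counter names, then a line of
    # values, both starting with the same "Prefix:" token (e.g. "TcpExt:").
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[prefix + name] = int(number)
    return counters


def listen_counters(path="/proc/net/netstat"):
    """Return the two listen-queue counters (hypothetical helper, Linux only)."""
    counters = parse_netstat(Path(path).read_text())
    return {
        "overflows": counters.get("TcpExtListenOverflows", 0),
        "drops": counters.get("TcpExtListenDrops", 0),
    }
```

The counters are cumulative since boot, so a monitor would call listen_counters() at intervals and alert on the difference between two samples rather than on the absolute value.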
To monitor the SYN queue (half-open connections), use the ss command and count the sockets in the SYN-RECV state:
$ ss -n state syn-recv sport = :80 | wc -l
119
Reference: https://blog.cloudflare.com/syn-packet-handling-in-the-wild/
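The SYN-RECV count covers connections that have not finished the handshake yet. For the accept queue itself, as far as I know ss reports the number of connections currently waiting to be accepted in the Recv-Q column of a LISTEN socket, and the backlog limit in Send-Q, on reasonably recent Linux kernels. The sketch below rests on that assumption and on the usual column order of ss -lnt; parse_listen_queues and current_listen_queues are hypothetical names, not part of any tool.

```python
import subprocess


def parse_listen_queues(ss_output):
    """Parse `ss -lnt` text into (local_address, queued, limit) tuples."""
    rows = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a listening socket.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Assumed column order: State, Recv-Q, Send-Q, Local Address:Port, ...
        rows.append((fields[3], int(fields[1]), int(fields[2])))
    return rows


def current_listen_queues():
    """Run ss and return the parsed rows (needs iproute2 on a Linux host)."""
    out = subprocess.run(
        ["ss", "-lnt"], capture_output=True, text=True, check=True
    ).stdout
    return parse_listen_queues(out)
```

Sampling current_listen_queues() every few seconds and comparing queued against limit gives a rough picture of how close a listener gets to overflowing between samples.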
Solution 2
Sysdig provides some of this information at the end of each accept syscall, as the queuelen argument. It also shows the maximum length of the queue as queuemax.
7598971 21:05:30.322229280 1 gunicorn (6451) < accept fd=13(<4t>127.0.0.1:45882->127.0.0.1:8003) tuple=127.0.0.1:45882->127.0.0.1:8003 queuepct=0 queuelen=0 queuemax=10
As far as I'm aware, it provides no mechanism to know exactly when or how many times the queue has overflowed, and it would be cumbersome to integrate this with periodic monitoring by collectd or similar.
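If you do capture sysdig output to a file or pipe, the queue fields are easy to pull out of each accept exit line. This is only a sketch that assumes lines shaped like the example above; parse_accept_event and peak_queue_length are illustrative names, not sysdig APIs.

```python
import re

# Matches the three queue-related arguments shown in the sample event.
QUEUE_FIELDS = re.compile(r"\b(queuepct|queuelen|queuemax)=(\d+)")


def parse_accept_event(line):
    """Extract queuepct/queuelen/queuemax from one sysdig accept exit line."""
    return {key: int(value) for key, value in QUEUE_FIELDS.findall(line)}


def peak_queue_length(lines):
    """Return the largest queuelen seen across an iterable of event lines."""
    peak = 0
    for line in lines:
        event = parse_accept_event(line)
        peak = max(peak, event.get("queuelen", 0))
    return peak
```

Because the value is only sampled when accept returns, the peak found this way is a lower bound: the queue can grow and overflow between two accept calls without any event recording it.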
Solution 3
What you are looking for is this entry in the output of the sysctl -a command:
net.ipv4.tcp_max_syn_backlog = 4096
In the above example case, the backlog of SYN state connections is at most 4096. You can increase that based on how much RAM is in your server. I consider 32K worth of backlog to be a good start for tuning of heavily loaded web servers.
Also make sure the following is NOT set to 1:
net.ipv4.tcp_abort_on_overflow = 0
Otherwise the kernel will reset connections as soon as the accept queue overflows, instead of letting the client retry.
You can easily check the values with sysctl -a | grep backlog or sysctl -a | grep overflow.
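The same values can be read from a script without spawning sysctl, since each dotted sysctl name corresponds to a file under /proc/sys. A minimal sketch, assuming a Linux host and single-valued integer settings; sysctl_path and read_sysctl are hypothetical helper names.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    # Assumes no component contains a literal dot (some interface-specific
    # sysctls do, and would need special handling).
    return f"{root}/{name.replace('.', '/')}"


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a single-valued sysctl."""
    return int(Path(sysctl_path(name, root)).read_text().strip())
```

For example, read_sysctl("net.ipv4.tcp_max_syn_backlog") and read_sysctl("net.ipv4.tcp_abort_on_overflow") return the two settings discussed above. Keep in mind these are limits and switches, not the current queue length.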
Additionally, you can find the "dropped" label in the output of the ifconfig -a command. It shows how many packets were dropped on each interface, along with other counters such as errors.
For logging dropped packets there is a [paywalled] article for RHEL 7: https://access.redhat.com/solutions/1191593
For further research you may read http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html
It states, quoting Stevens's book TCP/IP Illustrated:
The queue limit applies to the sum of […] the number of entries on the incomplete connection queue […] and […] the number of entries on the completed connection queue […].
It also states that:
The completed connection queue is almost always empty because when an entry is placed on this queue, the server’s call to accept returns, and the server takes the completed connection off the queue.
The accept queue may therefore appear completely empty, and you will have to tune your web server to accept connections from the "total aggregate" queue faster.
Phil Frost
Updated on September 18, 2022
Comments
-
Phil Frost almost 2 years
I have a hypothesis: sometimes TCP connections arrive faster than my server can accept() them. They queue up until the queue overflows and then there are problems. How can I confirm this is happening?
Can I monitor the length of the accept queue or the number of overflows? Is there a counter exposed somewhere?
-
Satō Katsura over 7 years
You're looking for netstat.
-
Phil Frost over 7 years
As far as I can tell, netstat only shows the send and receive queue lengths, which is not the same as the accept queue.
-
Satō Katsura over 7 years
Right, looking at the sources it seems those flags are for UNIX sockets. For TCP you could just count SYN_RECV though. There is no other queue beyond that. I suppose the kernel can be told somehow to log packets dropped because of too many half-open connections, but it has been some 10+ years since I looked at networking with Linux, so I have no idea how to do that. On a side note: you aren't waiting for accept() to do its job, you're waiting for ACKs to arrive from the connecting hosts to complete the connections.
-
Scott - Слава Україні about 5 years
While there seems to be some useful information here, I’m not sure it answers the question. If I ask, “What’s the most number of people that have ever been in this auditorium at one time?”, and you point to a sign on the wall that gives the maximum capacity, you haven’t answered the question.
-
Phil Frost about 5 years
Indeed, I'm looking for the current length of the queue, not the maximum length of the queue.
-
DevilaN over 4 years
It should be tcp_max_syn_backlog, not tcp_max_SYNC_backlog as in your answer.
-
Aaron C. de Bruyn about 4 years
Yeah... and StackOverflow gives you a retarded error message when you try to change it: "Edits must be at least 6 characters; is there something else to improve in this post?"