keep-alive or not keep-alive
Why not set the keepalive timeout to, say, 15 seconds? I see no reason to keep every connection open for 2 minutes, and I don't think the browser will keep the connection open for 2 minutes anyway; according to this link: http://en.wikipedia.org/wiki/HTTP_persistent_connection#Use_in_web_browsers, a 1-minute timeout seems more realistic.
Comments
-
Julien Vehent almost 2 years
My company is launching a new website with potentially large waves of visitors in very short windows (estimate is around 14k visitors in a 2 minutes window).
So, I'm reviewing our configuration, and my biggest problem right now is our single node HTTP frontend that uses keep-alive. The frontend is running lighttpd 1.4 on CentOS 5.4.
Some assumptions:
- a browser usually opens 6 parallel TCP connections to keep alive
- the browser will keep the connection open until the timeout is reached, even if the tab is closed (observed in FF, might not be true in every browser)
- on the server side, each connection will consume ~150KB of memory in the kernel (I use conntrack and want to keep it; is that estimate correct?)
- all of our servers are hosted on the east coast; the RTT from a server in Las Vegas is around 80ms
- the home page with keep-alive uses ~25 TCP connections and 1500 packets; without keep-alive, this rises to ~210 TCP connections and over 3200 packets
So, 6*14000 = 84,000 TCP connections, and 84,000 * 150KB ~= 12GB of memory. Here is the problem: 1. I don't have that amount of memory available on the frontend. 2. lighttpd 1.4 is not very comfortable managing that many connections; it hurts the hits/s a lot.
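The back-of-the-envelope math above can be checked quickly in a shell. The 150KB-per-connection figure is the questionable input, so this sketch also computes the total with a few-hundred-byte conntrack entry for comparison (both per-connection sizes are assumptions, not measurements):

```shell
# Worst-case estimate from the question: 6 connections per visitor,
# 14k visitors, 150 KB of kernel memory per connection (assumed).
conns=$((6 * 14000))                                    # 84000 connections
echo "connections: $conns"
echo "at 150 KB each: $((conns * 150 / 1024)) MB"       # ~12304 MB, i.e. ~12 GB
# Same connection count with a ~350-byte conntrack entry (assumed lower bound):
echo "at 350 B each: $((conns * 350 / 1024 / 1024)) MB" # ~28 MB
```

The per-connection size is the whole argument: at 150KB the frontend drowns, at a few hundred bytes the conntrack overhead is a rounding error.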
But on the other hand, I'm concerned about the 80ms RTT if I deactivate keep-alive.
I am going to mitigate some of these issues with a CDN and a secondary www record with a secondary lighttpd. But the debate concerns the keep-alive feature: I'd like to turn it off, but I'm worried that the impact on page load time is going to be high (high RTT, and double the number of packets).
Once the initial content retrieval is done, we have a lot of ajax requests for browsing the site that usually fit in a single TCP connection. But I'm not certain that the browser will free the other connections and keep just one open.
I know there have been a number of discussions about keep-alive consuming too much resources. I kind of agree with that, but given the assumptions and the situation (an RTT between 80ms and 100ms for half our users), do you think it's wise to deactivate it?
As a side question: do you know where I can find the information regarding connection size and conntrack size in the kernel? (other than printing sizeof(struct sk_buff))
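For the side question, one place to look is the kernel's slab allocator: the per-entry object size of the conntrack cache is reported in /proc/slabinfo (or interactively via slabtop). A defensive sketch — the slab is named ip_conntrack on older kernels such as CentOS 5 and nf_conntrack on newer ones, and /proc/slabinfo may be root-only:

```shell
# Print the conntrack slab line from /proc/slabinfo, if present and readable.
# The objsize column is the per-entry size in bytes.
grep -E '^(ip|nf)_conntrack ' /proc/slabinfo 2>/dev/null \
  || echo "conntrack slab not present or not readable on this machine"
```

This only accounts for conntrack; socket buffers and lighttpd's own per-connection state come on top of it.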
--- edit: some test results. I configured conntrack to accept 500k connections (given the memory footprint, it shouldn't exceed 200MB) and launched an ab test:
ab -n 20000 -c 20000 -k http://website.com/banner.jpg
From what I saw in tcpdump, ab establishes all connections before doing the GET. So I get an idea of how much memory is consumed by those 20k connections.
slabtop returns
 OBJS  ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
40586   40586  100%     0.30K   3122        13      12488K  ip_conntrack
and top
PR  NI  VIRT  RES   SHR  S  %CPU  %MEM  TIME+    SWAP  CODE  DATA  COMMAND
15   0  862m  786m  780  S  22.9   3.3  1:44.86  76m    172  786m  lighttpd
12MB for ip_conntrack and 786MB for lighttpd are OK for my setup. I can easily manage 4x that.
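The slabtop line is internally consistent and puts the per-entry cost far below 150KB; the quick cross-check below multiplies objects by object size and divides the cache size across the 20k ab connections (the per-connection figure is approximate, since the slab also holds entries for other traffic):

```shell
# Cross-check the slabtop numbers: objects * object size ~= cache size.
echo "computed cache: $(awk 'BEGIN { printf "%.0f", 40586 * 0.30 }') KB"  # ~12176 KB, close to the 12488K reported
# Approximate per-connection slab cost for the 20k ab connections:
echo "per connection: $((12488 * 1024 / 20000)) bytes"                    # ~639 bytes
```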
So keep-alive it is, with an idle timeout set to 5 seconds.
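In lighttpd 1.4 that decision maps to the keep-alive directives in lighttpd.conf; the values below are a sketch of the 5-second setup, not tested recommendations:

```
# lighttpd.conf (lighttpd 1.4)
server.max-keep-alive-idle     = 5    # close an idle keep-alive connection after 5s
server.max-keep-alive-requests = 16   # requests served per connection before forcing a close
```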
-
Admin almost 13 years
Have you tried doing any benchmarking on your infrastructure to see what the impact is?
-
Admin almost 13 years
150KB for each connection seems rather large, is that accurate? There are more aspects to consider for keep-alive's performance than just RAM usage for hot TCP connections. Also - why keep conntrack on?
-
Admin almost 13 years
@Shane Madden, that's one of my concerns: I need a better estimate of the memory footprint. This number is probably wrong, but I remember reading it somewhere.
-
Admin almost 13 years
@Julien From what I can find, the numbers for conntrack are more in the few-hundred-bytes range, not few-hundred-kilobytes.
-
Admin almost 13 years
I got an estimate of around 350 bytes max on x86_64 somewhere else. But that's for conntrack only; what about the rest?
-
Admin almost 13 years
Nowhere near 150KB; but hypothetical questions don't really belong here. Test and find out. I haven't seen mention of anything that needs conntrack, so I'd take a long look at your requirement for that. And I'm not sure why you're so eager to trade the nebulous, possible resource usage of a short keep-alive connection (as discussed below, 5 seconds should be fine) for the clear and obvious performance harm of a full TCP handshake for each of 210 resources on a page.
-
Admin almost 13 years
I'm doing both :) benchmarking with ab to test with and without keep-alive, and asking on a Q&A site to get some community comments. We host multiple domains on the same platform, and I use conntrack with a string matching rule to get the bandwidth usage per domain (www.dom1.com, www.dom2.com, ...). Yes, I know, I'd be better off parsing the access.log; that's on the todo list.
-
Dana the Sane almost 13 years
This link suggests an even lower keep-alive timeout, 2-5s: linuxgazette.net/123/vishnu.html
-
Julien Vehent almost 13 years
In Firefox 6, network.http.keep-alive.timeout is 115; that's almost 2 minutes.
-
Julien Vehent almost 13 years
I thought about setting the keepalive timeout to 5 seconds. But considering the nature of the load (it's really a wave of visitors in a very short period of time), I'm not even sure that will be enough to mitigate the problem.
-
Dana the Sane almost 13 years
That will be enough to encapsulate the page load for each user though. I don't know if it's reasonable to try to keep the connection alive for multiple page loads (in this case anyway).