keep-alive or not keep-alive

Why not set the keepalive timeout to, say, 15 seconds? I don't see a reason to keep every connection open for 2 minutes. And I don't think the browser will keep the connection open for 2 minutes either; according to http://en.wikipedia.org/wiki/HTTP_persistent_connection#Use_in_web_browsers, a 1-minute timeout seems more realistic.


Author: Julien Vehent (Security @ Mozilla, https://jve.linuxwall.info)

Updated on September 18, 2022

Comments

  • Julien Vehent almost 2 years

    My company is launching a new website with potentially large waves of visitors in very short windows (the estimate is around 14k visitors in a 2-minute window).

    So, I'm reviewing our configuration, and my biggest problem right now is our single node HTTP frontend that uses keep-alive. The frontend is running lighttpd 1.4 on CentOS 5.4.

    Some assumptions:

    • a browser usually opens 6 parallel TCP connections with keep-alive
    • the browser will keep the connection open until the timeout is reached, even if the tab is closed (observed in FF, might not be true on every browser)
    • on the server side, each connection will consume ~150K of memory in the kernel (I use conntrack and want to keep it; is that estimate correct?)
    • all of our servers are hosted on the east coast; the RTT from a server in Las Vegas is around 80ms
    • The home page with keep-alive uses ~25 TCP connections and 1500 packets. Without keep-alive, this rises to ~210 TCP connections and over 3200 packets.

    So, 6*14000 = 84,000 TCP connections, and 84,000 * 150KB ~= 12GB of memory. Here is the problem: 1. I don't have that much memory available on the front end. 2. lighttpd 1.4 is not very comfortable managing that many connections; it hurts the hits/s a lot.
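That arithmetic can be sketched out quickly (all inputs are the question's own assumptions, including the 150KB-per-connection figure that the comments later revise downward):

```python
# Back-of-the-envelope memory estimate for the traffic wave.
# All inputs are the question's assumptions, not measured values.
visitors = 14_000       # visitors expected in the 2-minute window
conns_per_browser = 6   # parallel keep-alive connections per browser
kb_per_conn = 150       # assumed kernel memory per connection, in KB

total_conns = visitors * conns_per_browser
total_gb = total_conns * kb_per_conn / 1024 / 1024

print(total_conns)          # 84000
print(round(total_gb, 1))   # 12.0 (GB)
```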

    But on the other hand, I'm concerned about the 80ms RTT if I deactivate keepalive.

    I am going to mitigate some of these issues with a CDN and a secondary www record pointing at a secondary lighttpd. But the debate concerns the keep-alive feature. I'd like to turn it off, but I'm worried that the impact on page load time is going to be high (high RTT, and double the number of packets).
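To put a rough number on that concern, here is a crude model of handshake overhead: each new TCP connection costs at least one RTT for the handshake, and the browser performs up to 6 handshakes in parallel. It ignores slow start, DNS, and server think time, and the connection counts are the estimates above, so treat it as an order-of-magnitude sketch:

```python
import math

# Crude model: one RTT per TCP handshake, 6 handshakes in parallel.
# Connection counts and RTT are the question's estimates, not measurements.
rtt = 0.080      # seconds, worst-case RTT from Las Vegas
parallel = 6     # parallel connections per browser

def handshake_time(connections):
    return math.ceil(connections / parallel) * rtt

with_keepalive = handshake_time(25)      # ~25 connections with keep-alive
without_keepalive = handshake_time(210)  # ~210 connections without

print(round(with_keepalive, 2))     # 0.4  seconds spent on handshakes
print(round(without_keepalive, 2))  # 2.8  seconds spent on handshakes
```

Under these assumptions, disabling keep-alive adds a couple of seconds of pure handshake latency per page load for far-away users.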

    Once the initial content retrieval is done, we have a lot of ajax requests for browsing the site that usually fit in a single TCP connection. But I'm not certain that the browser will free the other connections and keep just one open.

    I know there have been a number of discussions about keep-alive consuming too many resources. I kind of agree with that, but given the assumptions and the situation (an RTT between 80ms and 100ms for half our users), do you think it's wise to deactivate it?

    As a side question: do you know where I can find the information regarding connection size and conntrack size in the kernel? (other than printf sizeof(sk_buff))

    --- edit: some test results. I configured conntrack to accept 500k connections (given the memory footprint, it shouldn't exceed 200MB) and launched an ab test.

    ab -n 20000 -c 20000 -k http://website.com/banner.jpg
    

    From what I saw in tcpdump, ab establishes all connections before doing the GET. So I get an idea of how much memory is consumed by those 20k connections.

    slabtop returns

      OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
     40586  40586 100%    0.30K   3122       13     12488K ip_conntrack
    

    and top

     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP CODE DATA COMMAND
     15   0  862m 786m  780 S 22.9  3.3   1:44.86  76m  172 786m lighttpd
    

    12MB for ip_conntrack and 786MB for lighttpd are OK for my setup. I can easily manage 4x that.
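As a sanity check, the slabtop columns agree with each other and with the few-hundred-bytes-per-entry estimate from the comments (a quick calculation, not an exact slab-accounting model):

```python
# OBJS x OBJ SIZE from the slabtop output above.
objs = 40586          # active ip_conntrack objects
obj_size_kb = 0.30    # reported size per object, in KB (~300 bytes)

approx_kb = objs * obj_size_kb
print(round(approx_kb))   # 12176, close to the reported 12488K cache size
```

The small gap versus the reported cache size is slab allocator overhead (partially filled slabs).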

    So, keep-alive it is, with an idle timeout set to 5 seconds.
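In lighttpd 1.4, that would look something like the following (a sketch: server.max-keep-alive-idle and server.max-keep-alive-requests are lighttpd's keep-alive directives, but check the defaults and semantics for your exact version):

```
# keep keep-alive enabled, but recycle idle connections quickly
server.max-keep-alive-idle     = 5    # seconds a keep-alive connection may sit idle
server.max-keep-alive-requests = 16   # requests served per connection before closing
```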

    • Admin almost 13 years
      Have you tried doing any benchmarking on your infrastructure to see what the impact is?
    • Admin almost 13 years
      150KB for each connection seems a little enormous, is that accurate? There are more aspects to consider for keep-alive's performance than just RAM usage for hot TCP connections. Also - why keep conntrack on?
    • Admin almost 13 years
      @Shane Madden, That's one of my concerns: I need a better estimate of the memory footprint. This number is probably wrong, but I remember reading it somewhere.
    • Admin almost 13 years
      @Julien From what I can find, the numbers for conntrack are more in the few-hundred-bytes range, not few-hundred-kilobytes.
    • Admin almost 13 years
      I got an estimate of around 350 bytes max on x86_64 somewhere else. But that's for conntrack only; what about the rest?
    • Admin almost 13 years
      Nowhere near 150KB; but hypothetical questions don't really belong here. Test and find out. I haven't seen mention of anything that needs conntrack, so I'd take a long look at your requirement for that, and I'm not sure why you're so eager to trade the nebulous, possible resource usage of a short keep-alive connection (as discussed below, 5 seconds should be fine) for the clear and obvious performance harm of a full TCP handshake for each of 210 resources on a page.
    • Admin almost 13 years
      I'm doing both :) benchmarking using ab to test with and without keep alive, and question on a questions site to get some community comments. We host multiple domains on the same platform, and I use conntrack with a string matching rule to get the bandwidth usage per domain (www.dom1.com, www.dom2.com, ...). Yes, I know, I'd be better off parsing the access.log, that's in the todo list.
  • Dana the Sane almost 13 years
    This link suggests an even lower keep-alive timeout of 2-5s: linuxgazette.net/123/vishnu.html
  • Julien Vehent almost 13 years
    In Firefox 6, network.http.keep-alive.timeout is 115; that's almost 2 minutes.
  • Julien Vehent almost 13 years
    I thought about setting the keepalive timeout to 5 seconds. But considering the nature of the load (it's really a wave of visitors in a very short period of time), I'm not even sure that will be enough to mitigate the problem.
  • Dana the Sane almost 13 years
    That will be enough to encapsulate the page load for each user though. I don't know if it's reasonable to try to keep the connection alive for multiple page loads (in this case anyway).