HAProxy tuning - can't support > 50 concurrent users


Solution 1

Please run dmesg and make sure your iptables conntrack table is not full. You may see many messages like this one: "ip_conntrack: table full, dropping packet"

If so, tune the sysctl net.ipv4.netfilter.ip_conntrack_max. The default value is very low; you can raise it to 50000, maybe more, depending on your workload.
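A quick sketch of checking and raising the limit (the exact number is illustrative, not a recommendation; on newer kernels the key may be net.netfilter.nf_conntrack_max instead):

    # look for drops and check the current limit
    dmesg | grep conntrack
    sysctl net.ipv4.netfilter.ip_conntrack_max

    # raise it at runtime
    sysctl -w net.ipv4.netfilter.ip_conntrack_max=50000

    # persist the change across reboots
    echo 'net.ipv4.netfilter.ip_conntrack_max = 50000' >> /etc/sysctl.conf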

Baptiste

Solution 2

Felix is right. You need maxconn set low on your back-end servers, and your global maxconn is way too high. Set it to something like 4000.

It is critical that you understand the difference between global and server maxconn.

Willy Tarreau (HAProxy's author) describes it very clearly here: https://stackoverflow.com/questions/8750518/difference-between-global-maxconn-and-server-maxconn-haproxy

I have been using HAProxy for years, and my default is maxconn 64 on back-end servers.
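A minimal sketch of how the layers fit together (the numbers are illustrative and should be tuned to your servers):

    global
      maxconn 4000    # total connections HAProxy itself will accept

    defaults
      maxconn 2000    # per-frontend ceiling, kept below the global limit

    backend portal-backend
      # excess requests queue inside HAProxy instead of piling onto the web servers
      server web1.example.com web1.example.com:80 maxconn 64 check
      server web2.example.com web2.example.com:80 maxconn 64 check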

HAProxy is very high performance and is certainly capable of overloading a web server if misconfigured. Take a look at the web servers' network connections and error logs to see if they're hitting max connections. I would not be surprised if that is the case.
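For example, something along these lines on each back-end web server (paths and limits are assumptions; adjust for your stack):

    # count established connections to the web server's HTTP port
    ss -tn state established '( sport = :80 )' | wc -l

    # compare against the server's own limit (Apache MaxClients,
    # nginx worker_connections, etc.) and watch its error log
    tail -f /var/log/httpd/error_log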


Comments

  • user1799 over 1 year

    I'm investigating replacing a proprietary, software load balancer with HAProxy. As part of this investigation, I am attempting to test HAProxy under load. While my HAProxy configuration works fine when testing it as a single user, as soon as I put any load on it, the speed of the site starts to drop dramatically, and before long (~ 100 simulated users) our load testing tool starts to report failures.

    It's quite a straightforward configuration; the only notable point is that we're using HAProxy 1.5.4 with OpenSSL and PCRE support compiled in. We also have some ACLs to match on URLs, although that frontend isn't being used in this load test.

    This is running on a CentOS 6.5 machine.

    Our (sanitised) configuration for the frontend/backend combination in the load test, along with global and defaults:

    global 
      daemon
      tune.ssl.default-dh-param 2048
      maxconn 100000
      maxsessrate 100000
      log /dev/log local6
    
    defaults
      mode http
      option forwardfor
      option http-server-close
      timeout client 61s
      timeout server 61s
      timeout connect 13s  
      log global
      option httplog
    
    frontend stats
      bind xxx.xxx.xxx.xxx:80
      default_backend stats-backend
    
    backend stats-backend
      stats enable
      server stats 127.0.0.1:80
    
    frontend portal-frontend
      bind xxx.xxx.xxx.xxx:80
      default_backend portal-backend
    
    frontend portal-frontend-https 
      bind xxx.xxx.xxx.xxx:443 ssl crt /path/to/pem
      default_backend portal-backend
    
    backend portal-backend
      redirect scheme https if !{ ssl_fc }
      appsession session len 140 timeout 4h request-learn
      server web1.example.com web1.example.com:80 check
      server web2.example.com web2.example.com:80 check
    
    [...snip...]
    

    During the load test, we're getting some information from the logs, but not huge amounts. Relevant snippets:

    Sep  4 11:06:12 xxxx haproxy[15609]: xxx.xxx.xxx.xxx:30983 [04/Sep/2014:11:05:42.984] portal-frontend-https~ portal-frontend-https/<NOSRV> -1/-1/-1/-1/28782 408 212 - - cR-- 1840/1840/0/0/0 0/0 "<BADREQ>"
    ...
    Sep  4 11:06:03 xxxx haproxy[15609]: xxx.xxx.xxx.xxx:61502 [04/Sep/2014:11:05:47.810] portal-frontend-https~ portal-frontend-https/<NOSRV> -1/-1/-1/-1/14345 400 187 - - CR-- 1715/1693/0/0/0 0/0 "<BADREQ>"
    ...
    Sep  4 11:06:03 xxxx haproxy[15609]: xxx.xxx.xxx.xxx:43939 [04/Sep/2014:11:05:59.553] portal-frontend portal-backend/<NOSRV> 314/-1/-1/-1/2602 302 181 - - LR-- 1719/22/223/0/3 0/0 "GET /mon/login.php?C=1&LID=15576783&TID=8145&PID=8802 HTTP/1.1"
    

    On the basis of these log entries, we've tried things like adjusting timeout http-request, but without any improvement (the load test runs for longer before our tool reports failures, but the slowdown occurs in a similar way).
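    For reference, a sketch of the kind of change we tried (the value shown is illustrative):

    defaults
      timeout http-request 10s  # bounds how long a client may take to send complete request headers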

    I'm confident HAProxy is capable of doing far better than this, but I really don't know where to turn now to start diagnosing what the problem (or limitation) is.

    • Felix Frank over 9 years

      Also, when debugging load-balancing issues, the first order of business is examining the stats during peak times, which your config apparently makes available via a dedicated IP in frontend stats.
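      For example, a dedicated stats listener is often clearer than piggybacking stats on a backend; a sketch (the port and URI are placeholders):

      listen stats-page
        bind :8404
        stats enable
        stats uri /stats
        stats refresh 10s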