HAProxy tuning - can't support > 50 concurrent users


Solution 1

Please run dmesg and make sure your iptables conntrack table is not full. You may see many messages like this one: "ip_conntrack: table full, dropping packet"

If so, tune the sysctl net.ipv4.netfilter.ip_conntrack_max. The default value is very low; you can raise it to 50000, maybe more, depending on your workload.
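A quick sketch of checking and raising the limit (the exact number is illustrative, not a recommendation; on newer kernels the key may be net.netfilter.nf_conntrack_max instead):

    # look for drops and check the current limit
    dmesg | grep conntrack
    sysctl net.ipv4.netfilter.ip_conntrack_max

    # raise it at runtime
    sysctl -w net.ipv4.netfilter.ip_conntrack_max=50000

    # persist the change across reboots
    echo 'net.ipv4.netfilter.ip_conntrack_max = 50000' >> /etc/sysctl.conf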

Baptiste

Solution 2

Felix is right. You need maxconn set low on your back-end servers, and your global maxconn is way too high. Set it to something like 4000.

It is critical that you understand the difference between global and server maxconn.

Willy Tarreau (HAProxy's author) describes it very clearly here: https://stackoverflow.com/questions/8750518/difference-between-global-maxconn-and-server-maxconn-haproxy

I have been using HAProxy for years, and my default is maxconn 64 on back-end servers.
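A minimal sketch of how the layers fit together (the numbers are illustrative and should be tuned to your servers):

    global
      maxconn 4000    # total connections HAProxy itself will accept

    defaults
      maxconn 2000    # per-frontend ceiling, kept below the global limit

    backend portal-backend
      # excess requests queue inside HAProxy instead of piling onto the web servers
      server web1.example.com web1.example.com:80 maxconn 64 check
      server web2.example.com web2.example.com:80 maxconn 64 check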

HAProxy is very high performance and is certainly capable of overloading a web server if misconfigured. Take a look at the web servers' network connections and error logs to see if they're hitting max connections. I would not be surprised if that is the case.
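For example, something along these lines on each back-end web server (paths and limits are assumptions; adjust for your stack):

    # count established connections to the web server's HTTP port
    ss -tn state established '( sport = :80 )' | wc -l

    # compare against the server's own limit (Apache MaxClients,
    # nginx worker_connections, etc.) and watch its error log
    tail -f /var/log/httpd/error_log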


Comments

  • user1799 over 1 year

    I'm investigating replacing a proprietary, software load balancer with HAProxy. As part of this investigation, I am attempting to test HAProxy under load. While my HAProxy configuration works fine when testing it as a single user, as soon as I put any load on it, the speed of the site starts to drop dramatically, and before long (~ 100 simulated users) our load testing tool starts to report failures.

    It's quite a straightforward configuration; the only notable point is that we're using HAProxy 1.5.4 with OpenSSL and PCRE support compiled in. We also have some ACLs to match on URLs, although that frontend isn't being used in this load test.

    This is running on a CentOS 6.5 machine.

    Our (sanitised) configuration for the frontend/backend combination in the load test, along with global and defaults:

    global 
      daemon
      tune.ssl.default-dh-param 2048
      maxconn 100000
      maxsessrate 100000
      log /dev/log local6
    
    defaults
      mode http
      option forwardfor
      option http-server-close
      timeout client 61s
      timeout server 61s
      timeout connect 13s  
      log global
      option httplog
    
    frontend stats
      bind xxx.xxx.xxx.xxx:80
      default_backend stats-backend
    
    backend stats-backend
      stats enable
      server stats 127.0.0.1:80
    
    frontend portal-frontend
      bind xxx.xxx.xxx.xxx:80
      default_backend portal-backend
    
    frontend portal-frontend-https 
      bind xxx.xxx.xxx.xxx:443 ssl crt /path/to/pem
      default_backend portal-backend
    
    backend portal-backend
      redirect scheme https if !{ ssl_fc }
      appsession session len 140 timeout 4h request-learn
      server web1.example.com web1.example.com:80 check
      server web2.example.com web2.example.com:80 check
    
    [...snip...]
    

    During the load test, we're getting some information from the logs, but not huge amounts. Relevant snippets:

    Sep  4 11:06:12 xxxx haproxy[15609]: xxx.xxx.xxx.xxx:30983 [04/Sep/2014:11:05:42.984] portal-frontend-https~ portal-frontend-https/<NOSRV> -1/-1/-1/-1/28782 408 212 - - cR-- 1840/1840/0/0/0 0/0 "<BADREQ>"
    ...
    Sep  4 11:06:03 xxxx haproxy[15609]: xxx.xxx.xxx.xxx:61502 [04/Sep/2014:11:05:47.810] portal-frontend-https~ portal-frontend-https/<NOSRV> -1/-1/-1/-1/14345 400 187 - - CR-- 1715/1693/0/0/0 0/0 "<BADREQ>"
    ...
    Sep  4 11:06:03 xxxx haproxy[15609]: xxx.xxx.xxx.xxx:43939 [04/Sep/2014:11:05:59.553] portal-frontend portal-backend/<NOSRV> 314/-1/-1/-1/2602 302 181 - - LR-- 1719/22/223/0/3 0/0 "GET /mon/login.php?C=1&LID=15576783&TID=8145&PID=8802 HTTP/1.1"
    

    On the basis of these log entries, we've tried things like adjusting timeout http-request, but without any improvement (the load test runs for longer before our tool reports failures, but the slowdown occurs in a similar way).
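    For reference, a sketch of the kind of change we tried (the value shown is illustrative):

    defaults
      timeout http-request 10s  # bounds how long a client may take to send complete request headers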

    I'm confident HAProxy is capable of doing far better than this, but I really don't know where to turn now to start diagnosing what the problem (or limitation) is.

    • Felix Frank over 9 years

      Also, when debugging load-balancing issues, the first order of business is examining the stats during peak times, which your config apparently makes available via a dedicated IP in frontend stats.
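      For example, a dedicated stats listener is often clearer than piggybacking stats on a backend; a sketch (the port and URI are placeholders):

      listen stats-page
        bind :8404
        stats enable
        stats uri /stats
        stats refresh 10s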