HAproxy roundrobin balancing does not appear to be distributing evenly

7,086

You should definitely observe the load you're expecting. I suspect that some of your servers are regularly failing to respond to health checks and that they're regularly pulled out of the farm then reinserted once their load drops and they can respond again.

Also, how do you observe the unbalance ? The logs only log warnings (basically only downs).

Could you also check the stats page, you'll see how many sessions are sent to each server, how many times they're up/down, and if there has been some queuing or at least if they have reached their maxconn.

Share:
7,086

Related videos on Youtube

andrew
Author by

andrew

I surf, skate, and program.

Updated on September 17, 2022

Comments

  • andrew
    andrew over 1 year

    I know that with loaded servers, roundrobin in HAproxy (1.4.4) does not evenly distribute, but my servers are currently getting NO traffic (test setup), and roundrobin balancing does www1,www1,www1,www1,www1,...www2,www2,www2,...,www1...

    I'm verifying this by having the script being run on each server cat /etc/HOSTNAME (slackware). I need to have it switch back and forth each time to test some session stuff (stored in shared memcached) but am having trouble getting it to switch between my two web servers on each request.

    global
        log     127.0.0.1 local0 warning
        maxconn     4096
        chroot      /usr/share/haproxy
        pidfile     /var/run/haproxy.pid
        uid     99
        gid     99
        daemon
    
    defaults
        balance     roundrobin
        fullconn    100
        maxconn     4096
        mode        http
        option      dontlognull
        option      http-server-close
        option      forwardfor
        option      redispatch
        retries     3
        timeout     connect 5000
        timeout     client  20000
        timeout     server  60000
        timeout     queue   60000
        stats       enable
        stats       uri /haproxy
        stats       auth ***:***
    
    frontend www *:80
        log     global
        acl     is_upload hdr_dom(host) -i uploads.site.com
        acl     is_api hdr_dom(host) -i api.site.com
        acl     is_dev hdr_dom(host) -i dev.site.com
        use_backend uploads.site.com if is_upload
        use_backend api.site.com if is_api
        use_backend dev.site.com if is_dev
        default_backend site.com
    
    backend site.com
        option  httpchk HEAD /alive.php HTTP/1.1\r\nHost:site.com
        server  www1 1.1.1.1:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
        server  www2 1.1.1.2:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
    
    backend api.site.com
        option  httpchk HEAD /alive.php HTTP/1.1\r\nHost:api.site.com
        server  www1 1.1.1.1:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
        server  www2 1.1.1.2:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
    
    backend dev.site.com
        option  httpchk HEAD /alive.php HTTP/1.1\r\nHost:dev.site.com
        server  www1 1.1.1.1:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
        server  www2 1.1.1.2:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
    
    backend uploads.site.com
        option  httpchk HEAD /alive.php HTTP/1.1\r\nHost:uploads.site.com
        server  www1 1.1.1.1:8080 weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
        server  www2 1.1.1.2:8080 backup weight 10 minconn 5 maxconn 25 check inter 2000 rise 2 fall 2
    

    So basically, I have some different back-ends (I've verified the ACLs are working), with the default option "roundrobin" selected. I've tried removing weights, removing the minconn/maxconn/fullconn attributes for all servers (not just the backend I'm testing), tried removing the ACLs, etc. I've been testing on dev.site.com BTW.

    Anyone see a reason why I can't get something like www1,www2,www1,www2,...? Also, this is one of my first questions on here, so please let me know if I left anything needed out of my post.

    Thanks!

  • andrew
    andrew almost 14 years
    Willy, thanks for the response. I think I've narrowed this down. When I was running my tests, the servers in the farm (www1, www2) were always up. I monitored the stats page very closely. I just observed that when I click a link on the page that refreshes it, haproxy does distribute the load correctly (www1,www2,www1,www2...) but when I hit the browser refresh button, it sticks to one server. The only difference is that when I hit refresh: "Cache-Control: max-age=0" is sent. Oddly enough, when the refresh is hit, HAproxy seems to send data to both servers (bytes in/out). Any ideas?
  • Willy Tarreau
    Willy Tarreau almost 14 years
    It's just because you have multiple objects on the page you refresh, and the requests are half of them are fetched from one server, and the other half from the other server. If you enable cookie-based persistence, you'll see that once you go to a server, you stick to it as long as it remains up.
  • andrew
    andrew almost 14 years
    That explains it...can't believe I forgot about elements. I also tried to figure out why it was doing it on pages with no elements...turns out it was a request for favicon.ico =). Thanks a bunch.