Debugging HAProxy


It looks like your connection tracking table is filling up. Removing iptables rules which use connection tracking would solve the problem.

If that is not an option and you have RAM available, you can increase the table size:

cat /proc/sys/net/netfilter/nf_conntrack_max
echo 131072 > /proc/sys/net/netfilter/nf_conntrack_max
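
Before picking a new value, it can help to see how full the table actually is. The following is a minimal sketch, assuming the nf_conntrack module is loaded so that /proc/sys/net/netfilter/nf_conntrack_count exists next to nf_conntrack_max; the helper name conntrack_usage_pct is made up for illustration.

```shell
#!/bin/sh
# conntrack_usage_pct USED LIMIT: print the integer percentage of the table in use.
# (Hypothetical helper name, not part of any standard tool.)
conntrack_usage_pct() {
    used="$1"
    limit="$2"
    echo $(( used * 100 / limit ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

# Only report when both files are readable, i.e. the module is loaded.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    echo "conntrack: $count of $max entries ($(conntrack_usage_pct "$count" "$max")%)"
else
    echo "nf_conntrack does not appear to be loaded" >&2
fi
```

If the percentage sits near 100 under normal load, the "table full, dropping packet" messages are expected and the limit is simply too low for the traffic.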

You should probably increase the hashsize as well:

cat /sys/module/nf_conntrack/parameters/hashsize
echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

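Those numbers are just double the default settings on my desktop; I'm not sure what exactly you would need. You'll also want to make the change persistent: nf_conntrack_max can go in sysctl.conf (as net.netfilter.nf_conntrack_max), while hashsize is a module parameter rather than a sysctl and has to be set as an nf_conntrack module option.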
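
As a rough sketch, the persistent settings could look like the fragment below. The values are the same example numbers used above, and the modprobe.d file name is an assumption (any .conf file in that directory should work). Note that the sysctl key only exists once nf_conntrack is loaded, so depending on boot order it may need to be reapplied with sysctl -p.

```
# /etc/sysctl.conf
net.netfilter.nf_conntrack_max = 131072

# /etc/modprobe.d/nf_conntrack.conf  (file name is an assumption)
options nf_conntrack hashsize=32768
```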

I would be really careful using net.ipv4.tcp_tw_recycle: it can cause serious problems with NAT, because clients sharing one NAT address can have their connections silently dropped.
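
To see whether the option is currently switched on, something like the following sketch can be used. The helper name report_tw_recycle is hypothetical, and the file may be absent altogether, since kernels from 4.12 onward removed the option.

```shell
#!/bin/sh
# report_tw_recycle FILE: print "enabled", "disabled", or "absent"
# depending on the contents of FILE. (Hypothetical helper name.)
report_tw_recycle() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "absent"
        return 0
    fi
    case "$(cat "$file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

# Default to the real proc path; a different file can be passed for testing.
report_tw_recycle "${1:-/proc/sys/net/ipv4/tcp_tw_recycle}"
```

If it reports "enabled", setting net.ipv4.tcp_tw_recycle = 0 in sysctl.conf and running sysctl -p turns it off.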

Author: Eumcoz

Updated on September 18, 2022

Comments

  • Eumcoz almost 2 years

    I have been testing a server cluster locally for quite a while with no problems. I recently set the cluster up for a live test and have started noticing problems, and I believe the HAProxy instance in my cluster may be the cause.

    First, I will go over the structure of the cluster; maybe there is a problem with how I have it set up, or maybe I will need multiple proxies.

    I have two server clusters that HAProxy is balancing; we will call them SC1 and SC2. The main cluster is SC1: anything arriving at HAProxy on port 80 is sent to SC1. SC1 processes the request and sends another request to SC2 through the proxy on port 8080. I wouldn't think this would be a problem, but the logs on my servers often say that SC1 cannot connect to SC2, and I believe this is because HAProxy is being overloaded.

    The reason I think HAProxy is overloaded is that the stats page often takes more than 1 second to respond. Because of this, I took a look at the HAProxy logs and noticed an abnormality that I believe may be linked to my problems. Every minute or so (sometimes more, sometimes less), I get the following messages:

    Oct  8 15:58:52 haproxy rsyslogd-2177: imuxsock begins to drop messages from pid 3922 due to rate-limiting
    Oct  8 15:58:52 haproxy kernel: [66958.500434] net_ratelimit: 2997 callbacks suppressed
    Oct  8 15:58:52 haproxy kernel: [66958.500436] nf_conntrack: table full, dropping packet
    

    I was wondering what the repercussions of this are. Would it just cause dropped packets, or could it cause delays as well? How can I fix this problem? I am running Ubuntu 12.04 LTS Server.

    Here are my sysctl modifications:

    fs.file-max = 1000000
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
    

    Here is my config file:

    global
       log /dev/log   local0 info
       log /dev/log   local0 notice
       maxconn 50000
       user u1
       group g1
       #debug
    
    defaults
       log     global
       mode    http
       option  httplog
       option  dontlognull
       option  forwardfor
       retries 3
       option redispatch
       option http-server-close
       maxconn 50000
       contimeout      10000
       clitimeout      50000
       srvtimeout      50000
       balance roundrobin
    
    listen  sc1 255.255.255.1:80
        maxconn 20000
        server  sc1-1 10.101.13.68:80 maxconn 10000
        server  sc1-2 10.101.13.66:80 maxconn 10000
    listen  sc1-1_Update  255.255.255.1:8181
        maxconn 20000
        server  sc1-1 10.101.13.66:80 maxconn 20000
    listen  sc1-2_Update  255.255.255.1:8282
        maxconn 20000
        server  sc1-2 10.101.13.68:80 maxconn 20000
    listen  sc2 255.255.255.1:8080
        maxconn 30000
        server  sc2-1 10.101.13.74:80 maxconn 10000
        server  sc2-2 10.101.13.78:80 maxconn 10000
        server  sc2-3 10.101.13.82:80 maxconn 10000
    listen  sc2-1_Update 255.255.255.1:8383
        maxconn 30000
        server  sc2-2 10.101.13.78:80 maxconn 15000
        server  sc2-3 10.101.13.82:80 maxconn 15000
    listen  sc2-2_Update 255.255.255.1:8484
        maxconn 30000
        server  sc2-1 10.101.13.74:80 maxconn 15000
        server  sc2-3 10.101.13.82:80 maxconn 15000
    listen  sc2-3_Update 255.255.255.1:8585
        maxconn 30000
        server  sc2-1 10.101.13.74:80 maxconn 15000
        server  sc2-2 10.101.13.78:80 maxconn 15000
    listen  stats :8888
        mode http
        stats enable
        stats hide-version
        stats uri /
        stats auth user:pass
    

    sc1 and sc2 are the main clusters. I use all of the others when I have to update my servers (for example, forwarding port 80 to 8181 on the HAProxy to update server sc1-1).

    Any help with this issue would be greatly appreciated.

    Thank you