Need help troubleshooting intermittant TCP timeout in HAProxy

8,504

There should be no reason for early close of connection, I don't even see how that can happen. Your timeouts are set to 60s so it should be 60s.

Hmmm wait a minute, aren't you running inside a VM with a fast-running clock ? It's a problem in some VMs, where the clock sometimes runs far too fast (more than twice the correct speed) or instead too slow with large jumps once a minute. Haproxy knows how to defend against too long pauses and time jumps that it can detect, but obviously it cannot defend against clocks running too fast without being reported by the system.

If you're in a VM, you can try this :

$  while sleep 1; do date; done

And let this run for one or two minutes. Check by yourself if it's running at the correct speed. It's been a while since I last observed this nasty issue, but it does not mean it won't happen again.

BTW, you should set "option tcplog" in your TCP section and check the logs. You will then see there if from haproxy's point of view, it was a timeout, a client or server abort, and after how long a time.

Share:
8,504

Related videos on Youtube

imaginative
Author by

imaginative

Updated on September 18, 2022

Comments

  • imaginative
    imaginative over 1 year

    I have an application where the client connects to a server via a simple TCP based protocol over TLS/SSL. In development, this has worked great for many months while we were building our application out. Recently as we prepare for launch, I've gone ahead and added HAProxy into the mix in order to facilitate some order of load distribution. Everything works, technically, but the issue is, the client is now seeing completely random time outs. They are not typically consistent, but happen at roughly 60 seconds long. Sometimes it can happen after 25 seconds. The server that haproxy forwards the TCP connection to notices and does a clean disconnect, the problem is you do not want a bunch of simultaneous connections getting disrupted and reconnected over and over without any reason. This has implications on our publish/subscribe infrastructure in addition to other things. The client is smart enough to reconnect right away - however that's not the behavior we desire. The server that's responsible for accepting these TCP connections over SSL does not require a keep alive. I'm going to go ahead and assume that there is some implicit value I'm not seeing in my HAProxy config causing these random timeouts, or something that requires a TCP keep alive. The fact that the timeouts are not always consistent however, makes me wonder otherwise. If it was 60 seconds on the dot each and every time I'd be convinced it's a configuration issue. In this particular case, it is not always 60 seconds. Here's what my configuration looks like right now:

    global
    stats socket /home/haproxy/status user haproxy group haproxy
        log 127.0.0.1   local1 info
    #   log 127.0.0.1   local5 info 
        maxconn 4096
        ulimit-n 8250
            # typically: /home/haproxy
        chroot /home/haproxy
        user haproxy    
        group haproxy
        daemon
        quiet
        pidfile /home/haproxy/haproxy.pid
    
    defaults
        log global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 2000
        contimeout  5000
        clitimeout  60000
        srvtimeout  60000
    
    # Configuration for one application:
    # Example: listen myapp 0.0.0.0:80
    listen www 0.0.0.0:443
            mode tcp
            balance leastconn
        # Example server line (with optional cookie and check included)
        # server    srv3.0 10.253.43.224:8000 srv03.0 check inter 2000 rise 2 fall 3
    # Status port (by default, localhost only...for debugging purposes)
        server ANID3 10.0.1.2:8888 check inter 3000 rise 2 fall 3 maxconn 500
        server ANID1 10.0.1.3:8888 check inter 3000 rise 2 fall 3 maxconn 500
        server ANID2 10.0.1.4:8888 check inter 3000 rise 2 fall 3 maxconn 500
    
    listen health 0.0.0.0:9999
            mode http
            balance roundrobin
            stats uri /haproxy-status
    

    I verified that HAProxy is the issue by having our client bypass it and go directly to a single app server where there is no time outs and everything is nice and dandy. As soon as I route it through one of our two haproxy servers, the random disconnects happen ranging anywhere between 25-60 seconds.

    Thanks for taking a look at this. It is quite frustrating, but I'm sure it is a lack of understanding in what exactly HAProxy expects from my client.

    • Greg Askew
      Greg Askew almost 12 years
      Version of haproxy?
  • kasperd
    kasperd almost 10 years
    You should explain how and why that configuration solves the problem.