HAProxy check says server is down when it is up

Solution 1

The answer is that Glassfish in the latest versions splits the response into multiple packets.

I posted on the haproxy mailing list and had a remarkably quick response.

Krzysztof Oledzki confirmed that haproxy assumes the whole response will be contained within the first packet, as that is the behavior of most known web servers. He built a patch with a quick and dirty fix, which is available in the mailing list archives if you search for Glassfish, and it can be applied to the beta or to the latest stable version, 1.3.22.

I also tried to find out why Glassfish has started to behave this way but without paid support I got nowhere. If anyone can answer that, the bounty is still open.

Solution 2

Run tcpdump and capture the checks and their responses to each server. Compare the results from server1 to the results from server2.

If it works on server1 but not on server2 or server3, then server1 must be returning something different. If they aren't returning something different, then something is wrong with haproxy or your haproxy configuration.
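
As a rough complement to a packet capture, the following Python sketch sends the same request that appears in the access logs in the question (`OPTIONS / HTTP/1.0`) to each server and records the chunks exactly as `recv()` delivers them. The `probe` function and the `host:port` command-line arguments are assumptions made for illustration. Chunk boundaries seen by `recv()` only approximate packet boundaries, so tcpdump or wireshark remains the authoritative view.

```python
import socket
import sys

# The request seen in the Glassfish access logs for the haproxy check.
CHECK_REQUEST = b"OPTIONS / HTTP/1.0\r\n\r\n"


def probe(host, port, timeout=2.0):
    """Send the check request and return the response chunks as received."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(CHECK_REQUEST)
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                break  # server kept the connection open; stop waiting
            if not chunk:
                break  # server closed the connection
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # Usage (script name and addresses are placeholders):
    #   python probe.py <server1-ip>:8080 <server2-ip>:8080
    for target in sys.argv[1:]:
        host, port = target.rsplit(":", 1)
        chunks = probe(host, int(port))
        print(target, "->", len(chunks), "chunk(s)")
        for chunk in chunks:
            print("   ", repr(chunk))
```

If one server returns the status line in a single chunk and another returns it in several, that difference is worth confirming in the capture before looking at the haproxy configuration.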

Author: JamesRyan

Updated on September 17, 2022

Comments

  • JamesRyan
    JamesRyan over 1 year

    I am trying to set up 2 Glassfish servers in a load-balanced configuration using UCARP and HAProxy.

    Server1 has 2 IPs x.x.x.17 and x.x.x.18

    HAProxy is listening on only x.x.x.18 and Glassfish listening on only x.x.x.17 running with the following configuration...

    global
    
    maxconn 4096
    debug
    user haproxy
    group haproxy
    
    defaults
    
    mode http
    retries 3
    option redispatch
    
    listen wms x.x.x.18:8080
    source x.x.x.18
    option httpchk
    balance leastconn
    server Server1 x.x.x.17:8080 check inter 2000 fastinter 500 fall 2 weight 50
    server Server2 x.x.x.19:8080 check inter 2000 fastinter 500 fall 2 weight 50
    

    Server2 with 1 IP x.x.x.19 is running Glassfish

    Even though I can manually wget the page from x.x.x.17:8080 and receive a 200 OK response, HAProxy says Server1 is DOWN and doesn't direct any requests to it. I can't find any reason why.

    Here is an excerpt from the Server1 access log with the checks...

    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:23 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:23 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:23 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:29 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:44:29 +0000" "OPTIONS / HTTP/1.0" 200 0
    

    Here is an excerpt from the Server2 access log with the checks...

    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:25 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:25 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:31 +0000" "OPTIONS / HTTP/1.0" 200 0
    "x.x.x.18" "NULL-AUTH-USER" "14/Jan/2010:14:58:31 +0000" "OPTIONS / HTTP/1.0" 200 0
    

    If I remove the httpchk option then Server1 checks as UP, however this is not a permanent solution because we need it to fail over properly if the response really fails.

    Any ideas?

    (HAProxy is v1.3.22)

    Addn: I just tried adding server3 x.x.x.13 running Glassfish but on Windows and that also says down when it is up and accessible from the proxy machine.

    Addn2: After installing v1.4 of haproxy to get error codes, the error is Layer7 invalid response info: "HTTP/1.1 ". When we retrieve the page manually both the UP and DOWN server return HTTP/1.1 200 OK as the first line.

    So I ran wireshark to see what is going on. On the Glassfish server which works (and all the other web servers I've checked), the response HTTP/1.1 200 OK all comes in the first packet. On the Glassfish servers that don't work, the response comes in 3 packets: HTTP/1.1, then 200, then OK.

    So any idea why HAProxy is not dealing with multiple packets, or how to configure Glassfish not to split the response? (maxKeepAliveRequests=1 already)

    • womble
      womble over 14 years
      So, just to confirm, server2 is being fed requests from haproxy, whilst server1 isn't? Also, what does the haproxy web interface say about all this?
    • JamesRyan
      JamesRyan over 14 years
      yes in the stats page it says server2 status is DOWN and no pages being fed to it
    • Vineet Kasat
      Vineet Kasat over 14 years
      It should not matter how many packets it takes. From HAProxy, it should be a TCP stream, not a packet level.
    • Justin
      Justin over 14 years
      It shouldn't matter, but perhaps this is triggering a bug in HAProxy? I can't say for certain, I'm not too familiar with HAProxy. However, if you have the packet traces that seem to show haproxy doing the wrong thing, sending them to the upstream author might not be a bad next move.
  • JamesRyan
    JamesRyan over 14 years
    well I had to try option httpchk HEAD /test.php HTTP/1.1\r\nHost:\ xxxxxxxx:8080 but it made no difference
  • Lennert
    Lennert over 14 years
    Justin's right here, you gotta compare exactly what it's seeing. My bet is Glassfish is responding slightly differently on localhost than it is to remote clients.
  • Björn
    Björn over 14 years
    I guess another step is to rule out the obvious: are the configs the same on both Glassfish servers? Same versions? Can you post the configurations? How about the packet captures?
  • Willy Tarreau
    Willy Tarreau almost 14 years
    Just for completeness, version 1.4.4 is able to handle multi-packet responses.