Apache load balancer: health check with long timeout


My problem was not about the timeout: the health check started to work as desired once I set a ProxyHCExpr.

I noticed this by watching the requests made to the status page on the backend nodes: only after I defined a ProxyHCExpr did they start arriving regularly, roughly once per second. Note that I don't need to actually reference the expression by adding hcexpr=23 to the BalancerMember lines; an expression merely has to be defined for mod_proxy_hcheck to do its job. I would have expected either that it could be safely omitted, or that apachectl -t would report an error or warning.

In the docs, it says:

hcexpr Name of expression, created via ProxyHCExpr, used to check response headers for health. If not used, 2xx thru 3xx status codes imply success

Based on this, I had wrongly assumed that setting ProxyHCExpr would also be optional. Unfortunately, it was not working as desired until I set it. My configuration file went from:

<Proxy "balancer://tomcat">
    BalancerMember "http://10.0.0.1:8080" hcmethod=HEAD hcinterval=1 hcpasses=9 hcuri=/app/status
    BalancerMember "http://10.0.0.2:8080" hcmethod=HEAD hcinterval=1 hcpasses=9 hcuri=/app/status
</Proxy>

To:

ProxyHCExpr 23 {%{REQUEST_STATUS} =~ /^[23]/}
<Proxy "balancer://tomcat">
    BalancerMember "http://10.0.0.1:8080" hcmethod=HEAD hcinterval=1 hcpasses=9 hcuri=/app/status
    BalancerMember "http://10.0.0.2:8080" hcmethod=HEAD hcinterval=1 hcpasses=9 hcuri=/app/status
</Proxy>

This was the configuration as seen from the balancer-manager:

MaxMembers  StickySession   DisableFailover Timeout FailoverAttempts    Method      Path    Active
2 [2 Used]  (None)          Off             0       1                   bybusyness  /app    Yes

Worker URL          Route   RouteRedir  Factor  Set Status  Elected Busy    Load    To  From    HC Method   HC Interval Passes  Fails   HC uri      HC Expr
http://10.0.0.1:8080                    1.00    0   Init Ok 0       0       0       0   0       HEAD        1000ms      9 (0)   1 (0)   /app/status
http://10.0.0.2:8080                    1.00    0   Init Ok 0       0       0       0   0       HEAD        1000ms      9 (0)   1 (0)   /app/status

Then it became:

MaxMembers  StickySession   DisableFailover Timeout FailoverAttempts    Method      Path    Active
2 [2 Used]  (None)          Off             0       1                   bybusyness  /app    Yes

Worker URL          Route   RouteRedir  Factor  Set Status  Elected Busy    Load    To  From    HC Method   HC Interval Passes  Fails   HC uri      HC Expr
http://10.0.0.1:8080                    1.00    0   Init Ok 0       0       0       0   0       HEAD        1000ms      9 (0)   1 (0)   /app/status 
http://10.0.0.2:8080                    1.00    0   Init Ok 0       0       0       0   0       HEAD        1000ms      9 (0)   1 (0)   /app/status 

Health check cond. expressions:
Expr name   Expression
23          %{REQUEST_STATUS} =~ /^[23]/
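
For completeness, here is a sketch of the same balancer with the expression explicitly referenced through hcexpr=23 on each member. The fix described above did not require this, so treat it as an illustrative variant rather than the configuration I actually ran. The addresses and parameters are copied from the snippets above, and the ProxyPass lines are assumed to map /app to the balancer.

```
# Named expression: a 2xx or 3xx response status counts as healthy.
ProxyHCExpr 23 {%{REQUEST_STATUS} =~ /^[23]/}

<Proxy "balancer://tomcat">
    # hcexpr=23 ties each member's health check to the expression defined above.
    BalancerMember "http://10.0.0.1:8080" hcmethod=HEAD hcinterval=1 hcpasses=9 hcuri=/app/status hcexpr=23
    BalancerMember "http://10.0.0.2:8080" hcmethod=HEAD hcinterval=1 hcpasses=9 hcuri=/app/status hcexpr=23
</Proxy>

# Assumed mapping of the application path to the balancer.
ProxyPass        "/app" "balancer://tomcat"
ProxyPassReverse "/app" "balancer://tomcat"
```

With this variant I would expect the HC Expr column in the balancer-manager to show 23 for each worker. Running apachectl -t before reloading still checks the syntax, although, as noted, it gave no warning when the expression was missing.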


simlev


Updated on September 18, 2022

Comments

simlev over 1 year
I'm using Apache HTTP Server as a reverse proxy for a couple of Tomcat instances. I've set up load balancing as follows:
```
<Proxy "balancer://tomcat-app">
    BalancerMember "http://10.0.0.1:8080" hcmethod=HEAD hcuri=/status
    BalancerMember "http://10.0.0.2:8080" hcmethod=HEAD hcuri=/status
</Proxy>
ProxyPass        "/app" "balancer://tomcat-app"
ProxyPassReverse "/app" "balancer://tomcat-app"
```
The problem is that the Tomcat containers take around 15 minutes each to restart, because the app takes that long to be redeployed. Ideally, during this time the load balancer would detect that one of the backend servers is offline and temporarily send all incoming requests to the other, healthy backend server. Unfortunately, I have another line in my httpd.conf:
```
ProxyTimeout 600
```
This is apparently needed because the app can legitimately take that long to respond to some requests. As a consequence, the load balancer is unable to detect that the app is not "ready" in less than 10 minutes.

Question: Is there a way to set a different timeout for the healthcheck than for the proxied requests?

Note: Any suggestion on how to better approach this scenario will be welcome.
- c3st7n over 5 years
  
  As mentioned in bz.apache.org/bugzilla/show_bug.cgi?id=60948, there is no way to set a timeout for the health check. The user who reported the bug created their own patch to solve the issue, but the bug hasn't even been commented on.
- simlev over 5 years
  
  @c3st7n I can take no for an answer. Is there maybe a different approach that would work around this limitation? Could different load balancer software be a better choice in this case?
- c3st7n over 5 years
  
  Sorry, I don't have any time to look into this, so I'm just thinking off the top of my head: HAProxy can do health check timeouts. Maybe there is another Apache module that does what you need as well; I haven't looked.
- Gordon Liang over 2 years
  
  I'm confused by your question and your own answer. Does ProxyTimeout actually control the timeout of the health check, or not at all?
- simlev over 2 years
  
  @GordonLiang Not at all.
ezra-s about 3 years

Good findings. This was due to a bug, supposedly solved already. It seemed to work with HCTemplate, even if you didn't use it, as well as with the directive you mention. Fixed in 2.4.40. Reported in https://bz.apache.org/bugzilla/show_bug.cgi?id=60757
Gordon Liang over 2 years

With this setting, do you observe the health check timing out if the backend (BE) is slow to start?