Apache problems in performance test with mod_reqtimeout
It is unusual that you are using the "prefork" MPM in what seems to be essentially a reverse proxy; the hybrid "worker" MPM scales better. That's not the problem, though.
DoS mitigation is usually best handled (if not by your ISP) on the front-end device that terminates the client requests; from your description this is the load balancer, not Apache. Any competent load balancer will be HTTP-aware (i.e. "Layer 7"), which will likely result in buffering of HTTP requests. This also applies if the load balancer is terminating HTTPS, but less so if it is simply relaying the HTTPS connections (as it cannot see the HTTP requests to buffer them). Nor does it apply if your load balancer is a simple "Layer 3/4" NAT-type load balancer.
As to the possible cause of the timeouts:
- your SSL instance KeepAliveTimeout is the same as the RequestReadTimeout header read timeout, so it's possible there's a race where the keepalive timeout is reached midway through the incoming client request/headers. If I try to reproduce this, in addition to AH01382 errors I also get AH01991 (SSL input filter read failed) and AH00567 (request failed: error reading the headers). This may not explain all the problems though.
- malformed client requests, not uncommon in the past (e.g. extra CR/LF after POST, incomplete requests when retrying after error). I don't know of anything current though; it depends on your client base and, more importantly, their connectivity.
- there could be a bug similar to this recent one which caused spurious timeouts with "event" MPM.
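To see which of these causes dominates, it can help to count how often each of the error codes mentioned above shows up in the error log. The following is a minimal sketch, not a definitive tool: the function name `tally_codes` and the log path in the comment are assumptions made for illustration, and the code descriptions are the ones quoted in this answer.

```python
import re
from collections import Counter

# Matches Apache 2.4 error-log message IDs such as "AH01382:".
AH_CODE = re.compile(r"\b(AH\d+):")

# Codes named in this answer, with the descriptions quoted there.
CODES_OF_INTEREST = {
    "AH01382": "Request header read timeout",
    "AH01991": "SSL input filter read failed",
    "AH00567": "request failed: error reading the headers",
}


def tally_codes(lines):
    """Count occurrences of each code of interest in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = AH_CODE.search(line)
        if match and match.group(1) in CODES_OF_INTEREST:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage (the log path is an assumption, adjust to your system):
# with open("/var/log/httpd/ssl_error_log") as log:
#     for code, n in tally_codes(log).most_common():
#         print(code, CODES_OF_INTEREST[code], n)
```

If AH01991 and AH00567 appear alongside AH01382 in similar numbers, that would be consistent with the keepalive race described in the first bullet, though it does not prove it.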
To reproduce timeouts:
$ openssl s_client -connect myhost:443
GET / HTTP/1.1
Host: myhost.whatever.com
[server reply goes here]
GET / HTTP/1.1
Host:
You can script this to make it easier. Otherwise you must type/paste the first request and headers within your configured 10s, then type but not complete the second request within the next 10s. You must have at least one full line (the request line) submitted for the second request; then just wait.
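One way to script the manual session above is sketched here in Python, using only the standard library. It sends one complete request, reads the reply, then sends a second request whose header block is never finished and measures how long the server waits before closing. The function names, the 30-second wait and the disabled certificate check are assumptions for a test setup, not part of the original procedure.

```python
import socket
import ssl
import time


def build_requests(host):
    """Return (complete first request, deliberately incomplete second request)."""
    first = f"GET / HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
    # The request line is complete, but the header block is never terminated.
    second = b"GET / HTTP/1.1\r\nHost:"
    return first, second


def reproduce(host, port=443, wait=30.0):
    """Send a full request, then a partial one; return (first reply, seconds until close)."""
    first, second = build_requests(host)
    context = ssl.create_default_context()
    # Assumption: a test server, possibly with a self-signed certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=wait) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(first)
            reply = tls.recv(65536)  # the first chunk of the reply is enough here
            tls.sendall(second)
            start = time.monotonic()
            try:
                while tls.recv(65536):
                    pass  # drain whatever the server still sends before closing
            except (socket.timeout, ssl.SSLError, ConnectionError):
                pass  # no close within `wait`, or the connection was reset
            return reply, time.monotonic() - start


# Hypothetical usage against a test host:
# reply, elapsed = reproduce("myhost.whatever.com")
# print(reply.split(b"\r\n", 1)[0], f"closed after {elapsed:.1f}s")
```

After a run, check the server's error log for the AH01382 entry; the elapsed time should roughly match the timeout that fired.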
Decreasing the KeepAliveTimeout (default is 5 seconds) may help. Note that KeepAliveTimeout is the time to receive a complete request.
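To check whether a given configuration has KeepAliveTimeout equal to the initial RequestReadTimeout header timeout (the situation suspected of causing the race), something like the following sketch could be used. It is a simplification under stated assumptions: the function names are made up for illustration, only plain second values are parsed (no `ms` suffix), and Include directives and per-vhost scoping are ignored.

```python
import re


def parse_timeouts(config_text):
    """Return (KeepAliveTimeout, initial header timeout) in seconds, or None if absent."""
    keepalive = re.search(
        r"^\s*KeepAliveTimeout\s+(\d+)\s*$", config_text, re.M | re.I
    )
    # For "header=10-20,minrate=500" this captures the initial timeout, 10.
    header = re.search(
        r"^\s*RequestReadTimeout\b.*?\bheader=(\d+)", config_text, re.M | re.I
    )
    return (
        int(keepalive.group(1)) if keepalive else None,
        int(header.group(1)) if header else None,
    )


def timeouts_coincide(config_text):
    """True when both values are present and equal."""
    keepalive, header = parse_timeouts(config_text)
    return keepalive is not None and keepalive == header


ssl_conf = """
KeepAlive On
KeepAliveTimeout 10
RequestReadTimeout header=10-20,minrate=500
RequestReadTimeout body=10,minrate=500
"""
print(timeouts_coincide(ssl_conf))
```

With the values from the question's ssl.conf and reqtimeout settings, both come out as 10 seconds, which is the coincidence described above.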
I think the next step may be mod_log_forensic.
Regarding the connections to the back-end via AJP, are you using "ping" on the Apache Balancer configuration? If I understand your system correctly, the Tomcat configuration you have given won't apply to connections from Apache httpd to Tomcat. See the options here.
                        /-> apache httpd + ajp -\             /-> tomcat/jboss
client -> load-balancer <                         > firewall <
                        \-> apache httpd + ajp -/             \-> tomcat/jboss
Michael Niemand
Updated on September 18, 2022
Comments
-
Michael Niemand almost 2 years
We have 2 Apache webservers behind a load balancer which are connected to 2 (JBoss) application servers via mod_ajp.
To those webservers, mobile devices connect via a REST API.
In our performance test we rather quickly ran into a lot of NonHttpResponse: errors which we identified to be coming from mod_reqtimeout:
[Mon Mar 16 14:42:49.324705 2015] [reqtimeout:info] [pid 27914:tid 140628428449536] [client 1.2.3.4:48280] AH01382: Request header read timeout
... which is configured as follows:
<IfModule reqtimeout_module>
    RequestReadTimeout header=10-20,minrate=500
    RequestReadTimeout body=10,minrate=500
</IfModule>
I was able to get rid of those errors by increasing those values to
RequestReadTimeout header=20-60,minrate=100
But this can't be the solution, since with some more simultaneous users the problem occurred again. (There is a requirement to be able to serve 300 concurrent users: 100 worked quite OK, but with 300 we had over 10,000 of those Request header read timeout errors.) I suspect it's the interaction of Apache's KeepAlive, our mod_ajp configuration and mod_reqtimeout that leads mod_reqtimeout to the conclusion that there is a slowloris attack ongoing (too many open connections that do nothing), and I kindly ask for your help in tweaking those parameters.
An additional problem is a firewall between the webservers and the application servers, which I suspect kills open idle connections. I read about deactivating KeepAlive completely to solve this, but as I said, all our clients are mobile devices, so that's probably not an option (?).
Here are the other configs (parts of):
workers.properties:
worker.list=server
worker.maintain=60
worker.server.type=ajp13
worker.server.host=server
worker.server.port=15869
worker.server.socket_keepalive=True
worker.server.connection_pool_timeout=600
worker.server.ping_mode=A
worker.server.connection_ping_interval=60
mod_prefork:
<IfModule prefork.c>
    StartServers 5
    MinSpareServers 5
    MaxSpareServers 10
    #MaxClients 256
    MaxClients 300
    MaxRequestsPerChild 0
</IfModule>
mainserver.conf:
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
ssl.conf (mobile devices connect via ssl):
Timeout 1200
KeepAlive On
MaxKeepAliveRequests 0
KeepAliveTimeout 10
-
Michael Niemand about 9 years
Wow - thank you very much for your detailed answer! I haven't been able to implement your suggestions yet. I'll be back as soon as I do. Thanks again!!