Apache web server intermittent stalls


I've now resolved this so I'm posting the solution in case others experience the same issue.

I neglected to mention that all of our web traffic goes over HTTPS, and that appears to be the cause. During a stall I used strace and pstack to see what one of the idle Apache processes was doing. It was stuck waiting on a mutex for the SSL session cache.
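
For anyone wanting to repeat that check, the following is a rough sketch of how a stuck worker could be inspected. The helper name inspect_worker and the 5-second limit are illustrative assumptions, not part of the original diagnosis, and both tools normally need root.

```shell
# Hypothetical helper: show what one Apache worker process is blocked on.
inspect_worker() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: inspect_worker PID" >&2
        return 2
    fi
    # Attach strace for at most 5 seconds to see the system call it is waiting in.
    timeout 5 strace -p "$pid"
    # Print the user-space stack of the same process.
    pstack "$pid"
}

# Example (12345 is a placeholder PID taken from `ps` output for an httpd worker):
#   inspect_worker 12345
```

A worker that is waiting on a lock would typically sit in a single blocking call in the strace output, with the stack from pstack pointing at the code holding it up.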

Looking at the Apache config, I noticed we had SSLSessionCache enabled with a timeout of 5 minutes. Disabling it fixed the problem.
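
As a sketch of how the relevant directives might be located, assuming a stock CentOS layout (the function name and the paths are assumptions, not taken from the original setup):

```shell
# Hypothetical helper: list active SSLSessionCache / SSLSessionCacheTimeout
# directives (commented-out lines are skipped) in the given files or directories.
find_ssl_cache_directives() {
    grep -rnE '^[[:space:]]*SSLSessionCache(Timeout)?[[:space:]]' "$@"
}

# Example, assuming the usual CentOS httpd config locations:
#   find_ssl_cache_directives /etc/httpd/conf /etc/httpd/conf.d
#
# One way to turn the cache off is to set the directive to:
#   SSLSessionCache none
# and then check and reload the configuration:
#   apachectl configtest && apachectl graceful
```

A 5-minute timeout would normally show up as "SSLSessionCacheTimeout 300" in the output.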

My guess is that the session cache was filling up, then Apache was waiting for older sessions to time out before continuing.


Author: Mark Williams

Updated on September 18, 2022

Comments

  • Mark Williams
    Mark Williams over 1 year

    Our SOAP web server is written in PHP, runs on Apache on CentOS, and makes heavy use of MySQL. There is heavy demand on the server: most requests are very small and involve only two or three MySQL queries, but there are an awful lot of them, potentially a couple of hundred per second at peak times. Data traffic with each request is usually less than 1 KB, often only a few bytes.

    The hardware this is running on is pretty decent, 18 cores with 32 GB of RAM, and it generally copes really well. CPU usage never really goes above 30%, and physical RAM consumption never goes above 50%. However, every so often, the server appears to stall and Apache chokes up. This can last for around a minute before it loosens up again and normal service resumes.

    I've analysed this in quite some depth to see what is going on during the stalls. Apache is maxed out on its connections, pretty much all of which are in the 'reading' state. CPU usage drops to pretty much nothing, memory usage doesn't change, network and disk IO both plummet, so it looks like the system is just completely idle.

    After a lot of Googling, I was led to believe this could be down to timeout settings: network connections not being freed up quickly enough, and Apache running out. That would explain why Apache resumes normal operation after a while: it waits for them all to time out, then carries on. The output of 'netstat -an' supports this, as I do see a lot of connections in TIME_WAIT. However, I've reduced all sorts of timeout settings in the Apache configuration, as well as various network settings in sysctl.conf, and nothing appears to resolve the issue.

    There is nothing at all in Apache's error logs. I've tried using 'ab' to stress test Apache - it appears to cause the intermittent stall to happen slightly sooner, but that's all I can really gauge from it. The max connections for Apache and MySQL are both set to high values - actual concurrent connections never come close except during the stall when the Apache connections max out.

    I'm not really sure what else to try. Any ideas or pointers on things I might be missing here?

    --edit--

    A couple of extra observations. As the stall is occurring, I notice the number of connections in the ESTABLISHED state rise considerably, then the number in CLOSE_WAIT follows a few seconds later.

    Also, when the stall occurs, the number of 'times the listen queue of a socket overflowed' and the 'SYNs to LISTEN sockets ignored' increases quite rapidly. During the intervals between stalls, these numbers do not change at all.

    I'm not sure whether these numbers are a cause or a consequence of the stall. Any further help would be much appreciated.

    • Marcelo
      Marcelo almost 10 years
      A performance drop in the DB engine would cause an increase in connections on the Apache side, as they would last longer. Have you checked the MySQL logs to see if the DB is causing the wait? (A record lock, perhaps?)
    • Mark Williams
      Mark Williams almost 10 years
      Doing 'SHOW PROCESSLIST' in MySQL shows an empty list during the stall. MySQL CPU usage drops from 20% to 0% during the stall. I've also tried pointing 'ab' at an empty PHP file that does no MySQL stuff, and that can cause the issue. Strangely, it copes better pointing it at an empty HTML file, so I wonder if this is something to do with PHP?
    • Mark Williams
      Mark Williams almost 10 years
      If the web server was under attack, would I not notice an increase in network traffic? If I restart Apache right at the start of one of the stalls, it picks up and works as normal - almost feels like periodically restarting Apache could be a fix ;-).
  • Mark Williams
    Mark Williams almost 10 years
    Thanks Ron, my Keep-Alive Timeout is currently set to 2, so I don't think that is the issue. When the stall occurs, it can usually last for up to a minute.
  • Ron Lawhorn
    Ron Lawhorn almost 10 years
    Well, we can assume it's not the concurrent connections hanging...
  • Ron Lawhorn
    Ron Lawhorn almost 10 years
    For the moment, let's assume it's not concurrent connections hanging. Have you tried running an ipstate when your server hangs to see what's happening on your external interface?
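
The connection-state and listen-queue observations described in the question can be gathered with something like the sketch below. The function name count_tcp_states is an assumption for illustration, and the field position assumes Linux 'netstat -an' output, where the state is the sixth column.

```shell
# Hypothetical helper: count TCP connections per state.
# Reads `netstat -an`-style lines on stdin and prints "STATE COUNT" pairs.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { states[$6]++ }
         END { for (s in states) print s, states[s] }' | sort
}

# Example on a live system, run repeatedly while a stall is in progress:
#   netstat -an | count_tcp_states
#
# The listen-queue overflow and ignored-SYN counters come from the
# protocol statistics:
#   netstat -s | grep -i 'listen'
```

Comparing the counts taken before, during, and after a stall shows whether ESTABLISHED and CLOSE_WAIT rise in the order described.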