Ubuntu's garbage collection cron job for PHP sessions takes 25 minutes to run, why?


Solution 1

Removing fuser should help. This job runs a fuser command (which checks whether a file is currently open) for every stale session file found, which can easily take several minutes on a busy system with 14k sessions. This was reported as a Debian bug (Ubuntu is based on Debian).
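To get a sense of the scale, you can count how many stale session files one cleanup pass hands to fuser, one invocation apiece. A diagnostic sketch (the find expression mirrors the stock cron job; the wc -l tail is my addition):

    find /var/lib/php5/ -mindepth 1 -maxdepth 1 -type f \
        -cmin +$(/usr/lib/php5/maxlifetime) | wc -l
    # Each of those files gets its own fuser process, and every fuser
    # invocation searches the whole /proc/ for open file handles.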

Instead of memcached, you can also try tmpfs (an in-memory filesystem) for session files. Like memcached, this invalidates sessions on reboot (which can be worked around by backing up the directory in a shutdown script and restoring it in a startup script), but it is much easier to set up. It will not help with the fuser problem, though.
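A minimal sketch of the tmpfs setup, assuming the Debian session directory and an arbitrarily chosen size limit (tune both to your system):

    # /etc/fstab entry: keep the session directory in RAM
    # (mode 1733 matches the default permissions on /var/lib/php5)
    tmpfs  /var/lib/php5  tmpfs  size=256m,mode=1733  0  0

    # or mount it immediately, without a reboot:
    mount -t tmpfs -o size=256m,mode=1733 tmpfs /var/lib/php5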

Solution 2

Congratulations on having a popular web site and managing to keep it running on a virtual machine for all this time.

If you're really pulling in two million pageviews per day, then you're going to stack up a LOT of PHP sessions in the filesystem, and they're going to take a long time to delete no matter whether you use fuser or rm or a vacuum cleaner.

At this point I'd recommend you look into alternate ways to store your sessions:

  • One option is to store sessions in memcached. This is lightning fast, but if the server crashes or restarts, all your sessions are lost and everyone is logged out. (See the configuration sketch after this list.)
  • You can also store sessions in a database. This would be a bit slower than memcached, but the database would be persistent, and you could clear old sessions with a simple SQL query. To implement this, though, you have to write a custom session handler.
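For the memcached route, the switch is just PHP configuration once the extension is installed. A minimal sketch, assuming the PECL memcached extension and a memcached instance on the default local port (both are assumptions, not details from this answer):

    # drop-in config for a PHP 5-era Debian layout; adjust paths as needed
    cat > /etc/php5/conf.d/memcached-sessions.ini <<'EOF'
    session.save_handler = memcached
    session.save_path = "127.0.0.1:11211"
    EOF

The database option has no equivalent one-liner: you register your own SessionHandlerInterface implementation (or session_set_save_handler() callbacks) that reads and writes a sessions table.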

Solution 3

So, the memcached and database session storage options suggested in other answers here are both good choices for increasing performance, each with its own benefits and drawbacks.

But in performance testing, I found that the huge performance cost of this session maintenance is almost entirely down to the call to fuser in the cron job. Here are the performance graphs after reverting to the Natty/Oneiric cron job, which uses rm instead of fuser to trim old sessions; the switchover happens at 2:30.

[Graph: CPU usage]

[Graph: elapsed IO time]

[Graph: disk operations]

You can see that the periodic performance degradation caused by Ubuntu's PHP session cleaning is almost entirely removed. The spikes in the disk operations graph are now much smaller in magnitude, and about as narrow as this graph can measure: a small, short disruption where previously server performance was significantly degraded for 25 minutes. The extra CPU usage is entirely eliminated; this is now an IO-bound job.

(An unrelated IO job runs at 05:00 and a CPU job at 07:40, each causing its own spikes on these graphs.)

The modified cron job I'm now running is:

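# Purge stale sessions hourly; rm via xargs replaces the per-file fuser check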
09 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && \
   [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 \
   -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 \
   | xargs -n 200 -r -0 rm

Solution 4

I came across this post while doing some research on sessions. While the accepted answer is very good (and the fuser call has since been removed from the gc script), I think it's worth noting a few other considerations should anyone else come across a similar issue.

In the scenario described, the OP was using ext4. ext4 indexes directory entries using an HTree structure, which means there is negligible impact in holding lots of files in a single directory compared with distributing them across multiple directories. This is not true of all filesystems. The default handler in PHP allows you to use multiple levels of sub-directories for session files (but note that you should check that the controlling process recurses into those directories; the cron job above does not).
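A sketch of what that looks like with the default files handler; the N;/path save_path syntax is PHP's own, but the one-level layout, paths, and modes here are assumptions. Note that PHP will not create the subdirectories for you:

    # in php.ini, hash sessions into one level of subdirectories:
    #   session.save_path = "1;/var/lib/php5"
    # pre-create the 16 first-level directories (0/ .. f/, matching the
    # default 4-bits-per-character session ID alphabet):
    for d in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
        mkdir -p "/var/lib/php5/$d"
        chmod 1733 "/var/lib/php5/$d"
    done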

A lot of the cost of the operation (after removing the call to fuser) comes from looking at files which are not yet stale. Using (for example) a single level of subdirectories, and 16 cron jobs each looking in one subdirectory (0/, 1/, ... e/, f/), will smooth out the resulting load bumps (see the sketch below).
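A crontab sketch of that staggering; the minute offsets and the reuse of Ubuntu's maxlifetime helper are my own choices, not the answer's:

    # one small per-subdirectory sweep every few minutes instead of one
    # big hourly one (entries for 2/ through e/ elided):
    00 * * * * root find /var/lib/php5/0/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete
    03 * * * * root find /var/lib/php5/1/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete
    45 * * * * root find /var/lib/php5/f/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete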

Using a custom session handler with a faster substrate will help, but there is a lot to choose from (memcache, redis, MySQL HandlerSocket, ...). Leaving aside the range in quality of the handlers published on the internet, which one you choose depends on the exact requirements of your application, infrastructure, and skills. Don't forget that handlers frequently differ from the default one in their semantics, notably around locking.


Comments

  • thenickdude (almost 2 years ago)

    Ubuntu has a cron job set up which looks for and deletes old PHP sessions:

    # Look for and purge old sessions every 30 minutes
    09,39 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] \
       && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 \
       -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir \
       fuser -s {} 2> /dev/null \; -delete
    

    My problem is that this process is taking a very long time to run, with lots of disk IO. Here's my CPU usage graph:

    [Graph: CPU usage]

    The cleanup runs are the teal spikes. At the beginning of the period, PHP's cleanup jobs were scheduled at the default times of 09 and 39 minutes past the hour. At 15:00 I removed the 39-minute entry from cron, so a cleanup job twice the size now runs half as often (you can see the peaks become twice as wide and half as frequent).

    Here are the corresponding graphs for IO time:

    [Graph: IO time]

    And disk operations:

    [Graph: disk operations]

    At the peak, when there were about 14,000 active sessions, the cleanup can be seen to run for a full 25 minutes, apparently using 100% of one CPU core and what seems to be 100% of the disk IO for the entire period. Why is it so resource intensive? An ls of the session directory /var/lib/php5 takes just a fraction of a second. So why does it take a full 25 minutes to trim old sessions? Is there anything I can do to speed this up?

    The filesystem for this device is currently ext4, running on Ubuntu Precise 12.04 64-bit.

    EDIT: I suspect that the load is due to the unusual fuser step (since I expect a simple rm to be a damn sight faster than the performance I'm seeing). I'm going to remove the use of fuser and see what happens.

    • Michael Hampton (almost 12 years ago)
      Just how much traffic does your web site get to generate that many sessions?
  • thenickdude (almost 12 years ago)
    Memcached is certainly an option, although it would have to be a separate pool from our main memcached instance; otherwise sessions would get randomly evicted by the cache pressure on it. I'm not convinced that deleting 14,000 files should take 25 minutes, though. That sounds way too slow to me. I'm going to wait a couple of hours and see what the performance of a simple rm is like.
  • Michael Hampton (almost 12 years ago)
    Without knowing more about your overall architecture, I hesitate to recommend one over the other.
  • thenickdude (almost 12 years ago)
    It sounds like the bug in fuser was that an earlier version forked but was never reaped on completion, leaving thousands of fuser processes in a zombie state consuming memory, which led to server crashes. I think that has already been fixed in the version of psmisc that I'm using.
  • Tometzky (almost 12 years ago)
    That's another bug. You have a simple problem of starting thousands of fuser processes, which all must search the whole /proc/ for open files.
  • Tometzky (almost 12 years ago)
    -print0 | xargs ... isn't necessary; you could simply leave -delete there. But it will work both ways with comparable speed (see the sketch after these comments).
  • Ji woong Yu (about 10 years ago)
    You can pool Memcached servers for redundancy by setting memcache.session_redundancy=2. See serverfault.com/questions/164350/… . Redis is a good option if you are concerned about persistence, and it is much faster than SQL database stores.
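To make Tometzky's -delete suggestion concrete: the hourly job from Solution 3 with the xargs pipeline replaced by find's built-in -delete would look something like this (a sketch; both forms should behave the same):

    09 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && \
       [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 \
       -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete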