Squid or Other HTTP Caches with SSD Cache Store?

Solution 1

We've been using Varnish on SSD drives for the last 9 months, and it has worked extremely well for us. We previously used a memory-only Squid cache with a CARP layer. It worked, but memory fragmentation was a real problem, requiring frequent restarts. Squid 2.x will also only use one core, which makes it rather inefficient on current hardware.

For our site, which is very cache friendly, we see about 10% CPU usage on an 8-core machine serving 100 Mbit/s of traffic. In our tests we run out of bandwidth before we hit CPU limits with two 1 Gb ports.

I do have some advice for running Varnish with an SSD cache.

  • Random write performance really matters. We tried several vendors' SSD drives before settling on the Intel X25-M. Some posted as little as 0.1 MB/s for 4k random writes; we get 24 MB/s of 4k random writes with the X25-M.

  • RAID 0. The cache in Varnish 2.0 is not persistent, so there is no need to worry about redundancy. This does make restarts hurt, but those are rare. You can do things like loading a new config and purging objects without a restart.

  • mmap mode. The Varnish cache can be mmap'd to a file or use swap space. Using swap has not worked well for us; it tends to use more I/O bandwidth to serve the same amount of traffic. There is a 4-sector readahead in the Linux swap-in code; we wrote a patch to remove it but have not tried that in production.

  • Deadline scheduler. With 2.6.28+ this is SSD-aware and performs well. We tried noop but found that deadline was fairer as I/O bandwidth becomes limited.

  • Disable readahead. Since there is no rotational delay, there is no point in reading extra data just because you might need it; I/O bandwidth is precious on these drives (see the sketch after this list).

  • Run 2.6.28+. mmap'ing a lot of space on Linux gives the memory manager a good workout, but the split-LRU patches help a lot; kswapd CPU usage dropped considerably when we updated.
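
As a rough illustration of the scheduler and readahead points, a minimal tuning sketch along these lines works; the device name below is only an example, and it needs to run as root:

    # Hypothetical sketch: apply the deadline scheduler and disable readahead
    # for an SSD block device via sysfs. The device name is an assumption.
    from pathlib import Path

    DEVICE = "sdb"  # assumption: the SSD (or RAID member) backing the cache file

    def tune_ssd(device: str) -> None:
        queue = Path(f"/sys/block/{device}/queue")
        # Select the deadline I/O scheduler (SSD-aware on 2.6.28+ kernels).
        (queue / "scheduler").write_text("deadline\n")
        # Disable readahead: with no rotational delay there is nothing to gain
        # from speculative reads, and I/O bandwidth is the scarce resource.
        (queue / "read_ahead_kb").write_text("0\n")

    if __name__ == "__main__":
        tune_ssd(DEVICE)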

We've posted our VCL file, as well as several tools we use with Varnish, at link text. The VCL also includes a neat hack implementing a very fast GeoIP-lookup server based on the MaxMind database.

Solution 2

I'm not using SSDs as HTTP caches, but I can make these observations:

Not all SSDs are equal, so you have to be very careful about picking decent ones. FusionIO makes PCIe-backed SSDs which are really high-end performers (with relatively low capacity), but costly. Intel's X25-E SLC SSDs perform really well and are more affordable, but still low capacity. Do your research! I can definitely recommend the X25-E SLC variants, as I'm using them in production systems.

There are other SSDs out there which may give you great sequential read/write speed, but the important thing for something like a cache is random I/O, and a lot of SSDs will give approximately the same random performance as spinning disks. Due to write amplification effects on SSDs, spinning disks will often perform better. Many SSDs have poor-quality controllers (e.g. older JMicron controllers), which can suffer from significantly degraded performance in some situations. AnandTech and other sites do good comparisons with tools like Iometer; check there.

And, of course, SSDs are small. The Intel X25-E, which I would say is the best SATA SSD I've seen, only comes in 32 and 64 GB variants.

For RAID levels, standard RAID performance notes still apply. A small write to a RAID 5 array basically involves reading the data block you're going to modify, reading the parity block, updating the parity, writing the data block, and writing the parity, so it is still going to give worse performance than other RAID levels, even with SSDs. However, with drives like the X25-E having such high random I/O performance, this probably matters less, as it is still going to outperform random I/O on spinning disks for a similarly sized array.
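
To put rough numbers on that write penalty: RAID 5 costs about four I/Os per small random write, RAID 10 two, and RAID 0 one. A back-of-the-envelope sketch, with assumed (not measured) per-drive figures:

    # Back-of-the-envelope effective random-write IOPS for an array of identical drives.
    # The per-drive figure and array size are assumptions for illustration only.
    def raid_write_iops(per_drive_iops: float, drives: int, level: str) -> float:
        penalty = {"raid0": 1, "raid10": 2, "raid5": 4}[level]  # I/Os per logical write
        return per_drive_iops * drives / penalty

    PER_DRIVE_IOPS = 3000  # assumed 4k random-write IOPS for a single SSD
    DRIVES = 6             # assumed array size (even, so RAID 10 is well defined)

    for level in ("raid0", "raid10", "raid5"):
        print(f"{level}: ~{raid_write_iops(PER_DRIVE_IOPS, DRIVES, level):,.0f} writes/s")

Even with the four-I/O penalty, an SSD array like this still comes out far ahead of a similarly sized spinning-disk array for random writes.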

From what I've seen, RAID controller bandwidth is saturated too soon to get the most benefit out of a 7-disk RAID set, at least as far as sequential performance is concerned. You can't get more than about 800 MB/s out of current models of SATA controllers (3ware, Areca, etc.). Having more, smaller arrays across multiple controllers (e.g. several RAID 1s rather than a single RAID 10) will improve this, although the individual performance of each array will suffer.

Regarding an HTTP cache, I think you'd be better served by a decent array of spinning disks and plenty of RAM. Frequently accessed objects will stay in the memory cache, either in Squid's internal cache or in your OS's filesystem cache. Simply giving a machine more RAM can significantly reduce the disk load for that reason. If you're running a large Squid cache you'll probably want lots of disk space, and the high-performing SSDs still only come in relatively low capacities.

Solution 3

I'm not very familiar with SSD drives, but I can talk about the sort of architecture I've used which may help solve some of your problems.

Siblings

In my case I built four servers with 16 GB of RAM each, and set 9 GB as the in-memory cache for Squid to use. I configured them as a set of siblings, so a query to one server would query the others before fetching the data itself. Altogether I had 36 GB of in-memory cache. I would not go over four siblings, as the communication between them starts to bog down.
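
A minimal squid.conf sketch of that sibling arrangement, with placeholder hostnames and Squid's default HTTP/ICP ports (3128/3130); the exact directives may vary with your Squid version:

    # On each of the four servers (hostnames are placeholders)
    cache_mem 9216 MB    # 9 GB of in-memory cache on this box
    cache_peer cache2.example.com sibling 3128 3130 proxy-only
    cache_peer cache3.example.com sibling 3128 3130 proxy-only
    cache_peer cache4.example.com sibling 3128 3130 proxy-only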

VIPs

I configured a VIP for the four servers for clients to talk to. This solved the question of what happens when one server goes down.

Children

I set my web application to query a local Squid server running on 127.0.0.1, then configured the parent of this Squid instance to be the VIP. This allows for very quick failover in the event of the entire VIP going down: if the parents don't respond, the child queries the services directly. It's also handy if you're using a single Squid server and don't have a VIP. Of course, if the local Squid instance on your web server goes down, everything grinds to a halt.
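
A rough sketch of the child's configuration, again with a placeholder VIP address and hedged on the exact directives for your version:

    # squid.conf on the web server's local (child) Squid; 10.0.0.100 stands in for the VIP
    cache_peer 10.0.0.100 parent 3128 0 no-query default
    prefer_direct off    # try the parent (VIP) before going direct
    # With no never_direct rule, the child falls back to fetching directly
    # from the origin servers if the parent stops responding.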

Squid itself

I haven't really looked at 3.0, but 2.x is still single-threaded. At some point you're going to run out of CPU or TCP buffers. I'd spread the cache across two or three smaller boxes if possible. You may also want to plan to partition your Squid farms in the future if you see the system growing.

In any case good luck with your SSD build. I'm interested to hear how it turns out as I'll probably go that route in the future.


Comments

  • Andras Balázs Lajtha
    Andras Balázs Lajtha over 1 year

    I'm contemplating setting up a squid (or possibly varnish) cache on a system with SSD drives.

    The obvious benefit is that these systems have great READ speeds and I expect my hit ratios to be fairly high.

    Let's assume I can put 7 SSDs into a RAID configuration. (There are some cases that will let me pack in many more.)

    Implementation questions:

    • Should I use RAID0? (I expect a drive to fail eventually, so this seems dangerous.)

    • Should I use RAID10? (This halves my disk footprint, which is costly.)

    • Should I use RAID5? (SSDs are known to have "bad" write performance and write limits, and all the extra parity writes may slow this down considerably.)

    • Should I just treat each disk as its own Squid datastore? (How well does Squid handle multiple data stores? And what happens if/when one fails?)

    • Should I ignore datastores and just make the SSDs into large swap partitions and let the Linux VM do its thing? (Seems sloppy.)

    Any advice from folks using SSDs in production environments would be greatly appreciated. (esp if you're using them for HTTP caches)

    • Bob
      Bob almost 15 years
      +1 for an interesting question; I never considered making drives just into a large swap partition.
    • Oskar Duveborn
      Oskar Duveborn almost 15 years
      Yeah, definitely interesting... though I'm heavily inclined not to jump on the SSD bandwagon and simply add more RAM for that money instead.
    • Andras Balázs Lajtha
      Andras Balázs Lajtha almost 15 years
      Sadly, the cache footprint I need won't fit in RAM. I already have RAM-backed squid caches in place for those objects.
  • Pyrolistical
    Pyrolistical almost 15 years
    Even the X25-M is usable.
  • Andras Balázs Lajtha
    Andras Balázs Lajtha almost 15 years
    I've done my homework and know to avoid the JMicrons. I was mostly considering the X25-Ms (Intel MLC) and possibly the newer (non JMicron) OCZ Vertex series.
  • Andras Balázs Lajtha
    Andras Balázs Lajtha almost 15 years
    How well does Squid recover if a single data store falls out? (Obviously I need to test this.) RAID 5 is a compromise if Squid isn't graceful about a datastore failing.
  • Pyrolistical
    Pyrolistical almost 15 years
    Wow, the OCZ Vertex has lower maximum random write than even the X25-M!
  • Andrew Schulman
    Andrew Schulman over 10 years
    Please provide a link to the relevant section of the Squid documentation.
  • Amos Jeffries
    Amos Jeffries over 7 years