Why cache static files with Varnish, why not pass?

Solution 1

There are a few advantages to Varnish. The first one to note is reducing load on the backend server, typically by caching content that is generated dynamically but changes rarely (compared to how frequently it is accessed). Taking your WordPress example, most pages presumably do not change very often, and there are plugins that invalidate the Varnish cache when a page changes (e.g. a new post, an edit, a comment). You can therefore cache indefinitely and invalidate on change, which results in the minimum load on your backend server.
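
For illustration, a minimal sketch of the invalidation side, assuming a WordPress purge plugin that sends PURGE requests and using Varnish 4+ syntax (the ACL address is illustrative):

    # Only trusted hosts may invalidate cached objects.
    acl purgers {
        "127.0.0.1";    # e.g. the WordPress box itself (illustrative)
    }

    sub vcl_recv {
        if (req.method == "PURGE") {
            if (!client.ip ~ purgers) {
                return (synth(405, "Purging not allowed"));
            }
            # Drop the cached object for this URL; the next request
            # will regenerate it from the backend.
            return (purge);
        }
    }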

The linked article notwithstanding, most people would suggest that Varnish performs better than Nginx if set up properly, although (and I really hate to admit it) my own tests seem to concur that Nginx can serve a static file faster than Varnish (luckily, I don't use Varnish for that purpose). I think the problem is that using Varnish adds an extra layer to your setup. Passing through that extra layer to the backend server will always be slower than serving directly from the backend, and this is why allowing Varnish to cache may be faster: you save a step. The other advantage is on the disk I/O front. If you set up Varnish to use malloc storage, you don't hit the disk at all, which leaves it available for other processes (and would usually speed things up).

I think one would need a better benchmark to really gauge the performance. Repeatedly requesting the same single file triggers file system caches, which shifts the focus away from the web servers themselves. A better benchmark would use siege with a few thousand random static files (possibly even taken from your server logs) to simulate realistic traffic. Arguably though, as you mentioned, it has become increasingly common to offload static content to a CDN, which means that Varnish probably won't be serving it to begin with (you mention S3).

In a real-world scenario, you would likely prioritize your memory usage: dynamic content first, as it is the most expensive to generate; then small static content (e.g. JS/CSS); and lastly images. You probably wouldn't cache other media in memory unless you have a really good reason to do so. In this case, with Varnish loading files from memory and Nginx loading them from disk, Varnish will likely outperform Nginx (note that Nginx's caches only apply to proxying and FastCGI, and those are disk-based by default, although it is possible to use Nginx with memcached).
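
A rough sketch of that prioritization in VCL (Varnish 4+ syntax; the extension lists are illustrative, and the decision to pass media rather than cache it is an assumption, not a rule):

    sub vcl_recv {
        # Don't spend cache memory on large media; let the backend serve it.
        if (req.url ~ "\.(png|gif|jpe?g|avi|flv|mpe?g|mp4|mp3)(\?.*)?$") {
            return (pass);
        }
        # Small static assets: strip cookies so Varnish will cache them.
        if (req.url ~ "\.(css|js)(\?.*)?$") {
            unset req.http.Cookie;
            return (hash);
        }
        # Everything else (HTML, etc.) falls through to the default logic.
    }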

(My quick, very rough test - not to be given much credibility - showed Nginx (direct) was the fastest - call it 100%; Varnish (with malloc) was a bit slower (about 150%); and Nginx behind Varnish (with pass) was the slowest (around 250%). That speaks for itself - all or nothing. Adding the extra time (and processing) to communicate with the backend simply suggests that if you are using Varnish and have the RAM to spare, you might as well cache everything you can and serve it from Varnish instead of passing back to Nginx.)

Solution 2

I think you might be missing something.

By definition, dynamic files change. Typically, they change because some sort of database query affects the content of the page being served to the user. Therefore, you do not want to cache dynamic content: if you do, it simply becomes static content, and most likely static content that is wrong.

As a simple example, let's say you have a page with the logged-in user's username at the top. Each time that page is loaded, a database query is run to determine which username belongs to the logged-in user requesting the page, which ensures that the proper name is displayed. If you were to cache this page, the database query would not happen and all users would see the same username at the top of the page - and it likely would not be theirs. You need that query to happen on every page load to ensure that the proper username is displayed to each user, so the page is not cacheable.
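
In VCL terms, the usual way to keep such personalized pages out of the cache is to pass any request that carries a session cookie. A minimal sketch (Varnish 4+ syntax; the cookie names are purely illustrative):

    sub vcl_recv {
        # A session cookie implies per-user content (username, permissions,
        # etc.), so never serve these requests from the cache.
        if (req.http.Cookie ~ "PHPSESSID|wordpress_logged_in") {
            return (pass);
        }
    }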

Extend that logic to something a little more problematic, like user permissions, and you can see why dynamic content should not be cached. If the database is not hit for dynamic content, the CMS has no way to determine whether the user requesting the page has permission to see it.

Static content is, by definition, the same for all users. No database query needs to take place to customize the page for each user, so it makes sense to cache it and eliminate needless database queries. Images are a great example of static content: you want all users to see the same header image, the same login buttons, and so on, so they are excellent candidates for caching.

The code snippet in your question is a very typical Varnish VCL snippet that forces images, CSS and JavaScript to be cached. By default, Varnish will not cache any request that carries a cookie. The logic is that if there is a cookie in the request, the server must need it for some reason, so it is required on the backend and the request must be passed through the cache. In reality, many CMSes (Drupal, WordPress, etc.) attach cookies to almost everything whether or not they are needed, so it is common to write VCL that strips the cookies out of content that is known to be static, which in turn causes Varnish to cache it.
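
A hedged sketch of what that cookie stripping typically looks like, on both the request and the response side (Varnish 4+ syntax; the extension list is illustrative):

    sub vcl_recv {
        # The client's cookies are irrelevant for static assets.
        if (req.url ~ "\.(jpg|jpeg|png|gif|css|js)(\?.*)?$") {
            unset req.http.Cookie;
        }
    }

    sub vcl_backend_response {
        # Don't let the CMS's Set-Cookie header make static assets uncacheable.
        if (bereq.url ~ "\.(jpg|jpeg|png|gif|css|js)(\?.*)?$") {
            unset beresp.http.Set-Cookie;
        }
    }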

Make sense?

Solution 3

Some dynamic content, such as stock quotes, actually changes often (updated every second on a SaaS server from a backend server) but may be queried even more often (by tens of thousands of subscribed clients):

[stock calculation / backend server] ----- [SaaS server] ------ [subscription clients]

In this case, caching the per-second updates from the backend servers on the SaaS server makes it possible to satisfy the queries of the tens of thousands of subscribed users.

Without a cache on the SaaS server, this model simply would not work.
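
A minimal sketch of that idea in VCL (Varnish 4+ syntax; the /quotes path and the one-second TTL are illustrative): even a very short TTL collapses tens of thousands of client requests per second into roughly one backend request per second.

    sub vcl_backend_response {
        # Quotes change every second, so cache them for just one second:
        # subscribers are served from cache while the backend is asked
        # roughly once per second.
        if (bereq.url ~ "^/quotes") {
            set beresp.ttl = 1s;
        }
    }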

Solution 4

Caching static files with Varnish has the benefit of offloading Nginx. Of course, if you have lots of static files to cache, it can waste RAM. However, Varnish has a nice feature: it supports multiple storage backends for its cache.

• For static files: cache to HDD.
• For everything else: cache to RAM.
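
A hedged sketch of how that might look, assuming Varnish 4.1+ where storage backends named on the varnishd command line can be selected per object (the names and flags in the comment are illustrative):

    # Assumes varnishd was started with something like:
    #   -s mem=malloc,256m -s static=file,/var/lib/varnish/static.bin,10g
    sub vcl_backend_response {
        if (bereq.url ~ "\.(jpg|jpeg|png|gif|css|js)(\?.*)?$") {
            # Bulky but cheap-to-regenerate static objects go to the disk store.
            set beresp.storage = storage.static;
        } else {
            # Dynamic (expensive) content stays in RAM.
            set beresp.storage = storage.mem;
        }
    }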

This should give you more insight on how to implement this scenario: http://www.getpagespeed.com/server-setup/varnish-static-files-cache


Comments

  • Bob Whitelock
    Bob Whitelock almost 2 years

    I have a system running Nginx / PHP-FPM / Varnish / WordPress and Amazon S3.

    Now I have looked at a lot of configuration files while setting up the system, and in all of them I found something like this:

        /* If the request is for pictures, javascript, css, etc */
        if (req.url ~ "\.(jpg|jpeg|png|gif|css|js)$") {
            /* Remove the cookie and make the request static */
            unset req.http.cookie;
            return (lookup);
        }
    

    I do not understand why this is done. Most of the examples also run Nginx as a web server. Now the question is, why would you use the Varnish cache to cache these static files?

    It makes much more sense to me to only cache the dynamic files, so that PHP-FPM / MySQL don't get hit as much.

    Am I correct or am I missing something here?

    UPDATE

    I want to add some info to the question based on the answer given.

    If you have a dynamic website where the content actually changes a lot, caching does not make sense. But if you use WordPress for a mostly static website, for example, the content can be cached for long periods of time.

    That said, what matters more to me is static content. I have found a link with some tests and benchmarks on different cache and web server applications.

    http://nbonvin.wordpress.com/2011/03/14/apache-vs-nginx-vs-varnish-vs-gwan/

    Nginx is actually faster at serving static content, so it makes more sense to just let those requests pass. Nginx works great with static files.

    --

    Apart from that, most of the time static content is not even on the web server itself. Most of the time this content is stored on a CDN somewhere, maybe AWS S3, something like that. I think the Varnish cache is the last place where you want your static content stored.

  • Bob Whitelock
    Bob Whitelock over 12 years
    Thanks for the answer, but I am still not sure. I am aware that dynamic content changes on some websites, but on others, like mine, it does not change often. I just use a CMS to make life simpler, so my dynamic pages can be cached for a week. But more importantly, let's forget about dynamic content: I don't understand why you would cache static content if you have Nginx as a backend. If I am correct, Nginx and Varnish are just as fast at serving static content, or am I wrong? A static lookup can be handled just as fast by Nginx as by Varnish. I updated the question a little.
  • Bob Whitelock
    Bob Whitelock over 12 years
    I like the points you made. I just started to dig into this, and I find it odd that most of the online guides just have you cache the static content with Varnish; I bet some people are caching MBs of static content. It is true what you say: if they are small files, and you have the memory to spare, it is OK.
  • Bob Whitelock
    Bob Whitelock over 12 years
    That said, I for one don't have the memory to spare, and I have some template layout files that I do not want to put on a CDN; I just want them in my template directory. I will remove the snippet from my Varnish config that caches them, so the memory I have can be used better. I liked the tip about the three different setups. I think I'll just open the port to Nginx directly and serve the template files from there: Varnish handles HTML, Nginx handles static files, and, if necessary, PHP/MySQL serves some fresh content.
  • cyberx86
    cyberx86 over 12 years
    You'll note that many Varnish setups use many GBs of memory - properly set up, and under real-life scenarios, I don't doubt that it out-performs Nginx. I may suggest, though, that it is the flexibility and options Varnish offers that make it popular - it is specifically designed for caching, after all. With WordPress, my preferred setup is WordPress + W3TC (+ CloudFront) + Varnish + Nginx + PHP-FPM + APC. It actually isn't as fast in some cases as other setups, but it handles load quite well with good performance. Keep in mind that corporate firewalls often block non-standard ports.
  • cyberx86
    cyberx86 over 12 years
    Out of curiosity, why not keep your templates (presumably meaning CSS/JS - PHP, of course, must stay on your server) on your CDN? Also, one of my EC2 instances is set up with the same premise in mind, and includes the following: if (req.url ~ "\.(png|gif|jp(e?)g|avi|flv|mp(e?)g|mp4|mp3)"){return(pass);} in vcl_recv(). Essentially, I don't want to cache media - but I definitely do want to cache HTML (PHP) and even JS/CSS (the theory being that images contribute less to perceived page load time than layout does).
  • Bob Whitelock
    Bob Whitelock over 12 years
    I have seen W3TC, but I don't really like to use plugins. I just create small plugins of my own that take care of specific options for each specific site, so I know what everything does. From a programmer's POV I have looked at some plugins, and some are horribly designed. I created my own minify plugin, direct smushing and uploading of media files to S3 and CloudFront, a small memcached plugin, and some others. I just haven't got to the point of creating the final plugin that takes care of uploading the templates to the CDN.
  • Bob Whitelock
    Bob Whitelock over 12 years
    Now I just set up different Nginx server blocks with different hostnames for each of the template dirs of the sites. This enables parallelized downloading of the files, because they have a different hostname, but they remain in the same project. I just think the files belong there; I can't really give a reason. I will eventually upload them all to S3 and CloudFront, I think, but for now I just want to load them from my server. One other thing is gzip compression - it just seems so messy to have gzipped versions of everything. I think Nginx does the job well.
  • Bob Whitelock
    Bob Whitelock over 12 years
    As for the snippet you gave, I think it is really good; I wish more guides included it. Maybe I'll implement it on some projects. For now, as I said, I work with different Nginx server blocks. I had it set up so that x.domain.com gets passed directly, and in Nginx there is a server block for x.domain.com with a root of wp-admin/themes/mytheme. But after the small benchmark you gave, x.domain.com:8008 is now the server block, and Varnish is skipped altogether. I don't know yet if this is a good approach in terms of browser caching and SEO; I am checking that out now.
  • Shane N
    Shane N over 8 years
    Just curious why you would put static files in an HDD cache - isn't that essentially the same thing as just serving them from disk without a cache?
  • Shane N
    Shane N over 8 years
    Ah ok, that makes sense - thanks @danila-v