Forcing CloudFront to pass-through the latest HTML file from S3

Solution 1

Firstly, the point of CloudFront is to serve cached content. Serving uncached content through CloudFront is slower than serving it directly from S3 in almost all cases (streaming content being a likely exception). Consider what has to happen on a cache miss: the edge location closest to the user must first retrieve the object from the origin server, which adds latency to that request, so the user receives the content more slowly. Only once the content is held at the edge location do subsequent requests become faster.

The best approach to this problem is to change your filenames when you update a page, which forces CloudFront to retrieve the new content. Keep in mind that CloudFront is typically used for media files (including images) and for stylesheets and JavaScript, and not so much for HTML. Essentially, you would serve your HTML from S3 and your images from CloudFront; whenever you change a file, you give it a new name (e.g. file-v1.jpg, file-v2.jpg, etc.). Another common approach is to include a query string with version information.
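The renaming idea above can be automated. The answer suggests hand-numbered names such as file-v1.jpg; the sketch below swaps in a different but related technique, deriving the version from a hash of the file's content, so the name changes exactly when the content does. The function name `versioned_name` and the 8-character digest length are illustrative assumptions, not part of any AWS tooling.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return `path` with a short content hash inserted before the extension.

    Hypothetical helper: the name and digest length are assumptions
    made for this example only.
    """
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    # "img/file.jpg" becomes "img/file.<digest>.jpg"
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


if __name__ == "__main__":
    # Different content yields a different object key, so a cache
    # keyed on the URL has to fetch the new object.
    print(versioned_name("img/file.jpg", b"first revision"))
    print(versioned_name("img/file.jpg", b"second revision"))
```

The HTML that references the file then has to be regenerated with the new name, which is why this approach suits assets better than the HTML pages themselves.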

Also, keep in mind that CloudFront does not serve gzipped content, which may result in a slower response than from a regular server (although, in your case, S3 doesn't identify gzip-capable browsers either).

Finally, if you want to, you can use invalidation to force Cloudfront to discard its existing copy and fetch a new one from the origin server. Note, however, that Cloudfront gives you only 1000 free invalidations per month, after which the cost is $0.005/invalidation.
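To get a feel for what invalidating on every upload would cost, here is a small arithmetic sketch using only the figures quoted above (1000 free invalidations per month, then $0.005 each). These numbers are taken from this answer and may no longer match current pricing; the function name is an assumption for illustration.

```python
from decimal import Decimal

# Figures as quoted in the answer above; treat them as assumptions,
# since pricing may have changed since it was written.
FREE_PER_MONTH = 1000
PRICE_PER_INVALIDATION = Decimal("0.005")


def monthly_invalidation_cost(invalidations: int) -> Decimal:
    """Cost in dollars for a month's invalidations beyond the free allowance."""
    billable = max(0, invalidations - FREE_PER_MONTH)
    return billable * PRICE_PER_INVALIDATION


if __name__ == "__main__":
    for count in (500, 1000, 3000):
        print(count, monthly_invalidation_cost(count))
```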

The shortest time CloudFront will keep content is 1 hour, although the default is 24 hours. I'd therefore set max-age to at least 3600. Consider also an s-maxage directive (for shared, i.e. proxied, content). Amazon recommends this caching tutorial.
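As a rough illustration of how max-age and s-maxage interact, the sketch below reads a Cache-Control value the way a generic shared cache would under the HTTP caching rules: s-maxage takes precedence over max-age, and no-store, no-cache or private mean the response cannot be reused without going back to the origin. This is a simplification, and CloudFront's own minimum and maximum TTL settings may override what the header says; the function name is hypothetical.

```python
from typing import Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Seconds a generic shared cache may reuse a response, or None if unspecified.

    Simplified sketch: real caches also consider Expires, Age,
    heuristics, and their own configuration.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip().strip('"')

    # These directives prevent a shared cache from reusing the
    # response without contacting the origin first.
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0

    # s-maxage applies to shared caches and wins over max-age.
    for key in ("s-maxage", "max-age"):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return None


if __name__ == "__main__":
    print(shared_cache_ttl("max-age=60, s-maxage=3600"))
    print(shared_cache_ttl("no-cache, no-store, max-age=0, must-revalidate"))
```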

There was a recent problem with this, rectified a few days ago

Solution 2

I believe the answers so far, while correct at the time, are now out of date, as CloudFront now supports a minimum TTL of 0, and the OP's original attempt to use max-age=0 should now work.

You should check whether those other Cache-Control directives produce the result you are looking for; you may only need max-age. What you probably want is for CloudFront to check S3 to see whether the HTML file has changed. If it has, CloudFront can fetch and return the new file. If not, it can serve the client from its existing cache (conserving S3 bandwidth and serving the client faster and more locally).
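The check-then-serve behaviour described above is what HTTP calls revalidation: the cache sends the ETag it holds, and the origin answers 304 Not Modified if nothing changed or 200 with the new body if it did. The sketch below simulates that exchange in memory. It is not CloudFront's implementation; `EdgeCache`, `conditional_get` and the SHA-256 based ETag are assumptions made for this example.

```python
import hashlib
from typing import Optional, Tuple


def etag_for(body: bytes) -> str:
    # Stand-in ETag: a quoted SHA-256 digest (an assumption for this sketch;
    # real origins choose their own ETag scheme).
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_get(if_none_match: Optional[str],
                    origin_body: bytes) -> Tuple[int, str, Optional[bytes]]:
    """Simulated origin: 304 with no body if the ETag matches, else 200 with the body."""
    current = etag_for(origin_body)
    if if_none_match == current:
        return 304, current, None
    return 200, current, origin_body


class EdgeCache:
    """Hypothetical single-object cache that revalidates on every request."""

    def __init__(self) -> None:
        self.etag: Optional[str] = None
        self.body: Optional[bytes] = None

    def fetch(self, origin_body: bytes) -> Tuple[int, Optional[bytes]]:
        status, etag, body = conditional_get(self.etag, origin_body)
        if status == 200:
            # Content changed (or first request): store the new copy.
            self.etag, self.body = etag, body
        # On 304 the stored copy is still valid and is served as-is.
        return status, self.body


if __name__ == "__main__":
    cache = EdgeCache()
    print(cache.fetch(b"<html>v1</html>")[0])  # first request
    print(cache.fetch(b"<html>v1</html>")[0])  # unchanged at origin
    print(cache.fetch(b"<html>v2</html>")[0])  # changed at origin
```

In this model the body only crosses the origin link when it has actually changed, which is the bandwidth saving the paragraph above refers to.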

The point of CloudFront is to serve cached content, yes, but this now includes content that changes occasionally and can be served from cache for as long as it has not changed.

P.S. Query strings also work with CloudFront now (if you configure a 'behaviour' for the relevant origin, another new feature); however, some proxies may still fail to cache any files with query strings.

Amazon Developer Guide: Expiration

Author: Gibo

Updated on September 18, 2022

Comments

  • Gibo
    Gibo over 1 year

    Background

    I'm hosting a static site on S3, with CloudFront over the top. The issue I have is with my HTML files.

    According to CloudFront's FAQ:

    Amazon CloudFront uses these cache control headers to determine how frequently it needs to check the origin for an updated version of that file

    What I've done so far

    With this in mind I've set the HTML files in my S3 Bucket to add in the following headers:

    Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    Expires: Fri, 01 Jan 1990 00:00:00 GMT
    

    On my first call to my samplefile.htm, I see the following response headers (I've excluded obvious headers, e.g. Content-Type, to keep to the point):

    Cache-Control:no-cache, no-store, max-age=0, must-revalidate
    Date:Sat, 10 Dec 2011 14:16:51 GMT
    ETag:"a5890ace30a3e84d9118196c161aeec2"
    Expires:Fri, 01 Jan 1990 00:00:00 GMT
    Last-Modified:Sat, 10 Dec 2011 14:16:43 GMT
    Server:AmazonS3
    X-Cache:Miss from cloudfront
    

    As you can see, my Cache-Control header is in there. The problem is, if I update this file and refresh I get the cached content (rather than the latest file), and I can see that CloudFront is serving its cached version by looking at the response headers:

    X-Cache:Hit from cloudfront

    Summary/question

    With the above in mind, how can I achieve automatic retrieval of the latest HTML when using CloudFront?

    As per its FAQ I should be able to do this with Cache-Control headers, but I can't seem to get this working.

    Following the answers below

    In the end I decided to change my www CNAME to point to my S3 bucket directly. Then added a new CNAME called "static", which points to CloudFront.

    This means that HTML is direct from S3, which then has all its CSS/JS/IMG references pointing to static.mydomain.com

    • Sajag Kc
      Sajag Kc about 3 years
      shouldn't 01 Jan 1990 be a Monday?
  • Gibo
    Gibo over 12 years
    The reason for sticking CF over S3 was from Werner Vogels mentioning it himself in his blog post allthingsdistributed.com/2011/02/website_amazon_s3.html. I might consider routing the html direct from s3 as you say. One minor note: adding a query string at the end of files for cache-busting isn't a good idea as it can cause some proxies to never cache.
  • Gibo
    Gibo over 12 years
    This guy seems to use invalidation on every upload which seems overkill jmlacroix.com
  • cyberx86
    cyberx86 over 12 years
    Query strings will not work with Cloudfront - it will not cache the files, but they can be effective if you serve your content directly. HTML from S3 would be your best bet. You definitely don't want to invalidate everything on every upload, but invalidating the files that have changed is not without merit in some cases. The merits of Cloudfront only really become relevant on heavily trafficked sites - for your average site, S3 may even offer better performance (try them both and see - especially for small objects Cloudfront can be slow).
  • Greg Sadetsky
    Greg Sadetsky almost 8 years
    Cloudfront now supports Gzip compression. Announcement here.
  • xvga
    xvga over 6 years
    @cyberx86 limits are different nowadays: The minimum expiration time CloudFront supports is 0 seconds for web distributions and 3600 seconds for RTMP distributions. docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/…