MISS from Cloudfront after HIT from Cloudfront


This is similar to this issue: Why are two requests with different clients from the same computer cache misses on cloudfront?

CloudFront edge servers cache an object separately depending on whether or not the request includes the "Accept-Encoding: gzip" header. Since browsers send this header by default, and your site is likely to be accessed mainly through browsers, I suggest changing your curl call to include this header.
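As a rough sketch of that change, the helper below sends the header a browser would send, downloads and discards the body, and prints only the response headers so the X-Cache line can be compared with what the browser shows. The function names warm_fetch and cache_result are made up for illustration; they are not part of curl or CloudFront.

```shell
# Hypothetical helper: request a URL with the Accept-Encoding header a
# browser would send, discard the body, and print the response headers.
warm_fetch() {
  # -s            silent (no progress meter)
  # -o /dev/null  download the full body but throw it away
  # -D -          write the response headers to stdout
  # -H ...        add the Accept-Encoding request header
  curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' "$1"
}

# Hypothetical helper: read response headers on stdin and print the value
# of the X-Cache header (for example "Hit from cloudfront").
cache_result() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "x-cache" { print $2 }'
}
```

With a placeholder URL, usage would look like: warm_fetch "https://example.com/image.jpg" | cache_result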

I was facing the same problem. After adding the header to my curl call, I started getting a Hit on my first request from the browser (made after the curl call).

Another thing I noticed is that CloudFront requires the full requested object to be downloaded before it will cache it. If you download the file partially by specifying a byte range in curl, the intended object does not get cached; only the downloaded part is cached, as a different object. The same goes for a curl call that is terminated partway through. I also tried wget with the --spider option, but that only issues a HEAD request internally, so the content does not get cached on the edge server.
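A warm-up script that respects these points could look roughly like the sketch below: it performs a complete GET for every URL (no HEAD, no byte range) and reports any transfer that did not finish cleanly. The function name warm_all and the FETCH variable are assumptions for illustration; FETCH only exists so that a stub can stand in for curl when trying the loop without network access.

```shell
# Hypothetical warm-up loop: read one URL per line from stdin and perform a
# complete GET for each, so the edge receives the whole object.
warm_all() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    # No -I (HEAD) and no --range: the full body is downloaded and discarded.
    # --fail makes curl exit non-zero on HTTP errors (status >= 400); an
    # interrupted transfer also gives a non-zero exit status.
    # FETCH is a hypothetical override; by default the real curl is used.
    if "${FETCH:-curl}" -s --fail -o /dev/null -H 'Accept-Encoding: gzip' "$url"; then
      echo "warmed $url"
    else
      echo "failed $url" >&2
    fi
  done
}
```

Assuming a file urls.txt with one image URL per line, it could be run as: warm_all < urls.txt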

Author by

manu4543

An open source enthusiast who gets excited about almost everything related to the web. Drupal and Mozilla are some of those.

Updated on June 04, 2022

Comments

  • manu4543 almost 2 years

    I am switching to Amazon CloudFront to serve images on my website. To reduce load when we finally go live, I thought of warming up the cache by hitting the image URLs. (I am making these requests from India and expect the majority of users to request from the same region, so there is no need to have a copy of each object at all edge locations worldwide.)

    The problem is that the script uses curl to request an image, and when I access the same URL in a browser I get a MISS from CloudFront. So CloudFront is keeping two copies of the object for these two requests.

    My current CloudFront configuration forwards the Content-Type request header to the origin.

    How should I configure CloudFront so that it doesn't care about request headers at all, and once I have made a request (whether with curl or a browser) it serves all future requests for the same resource from the edge and not the origin?

    Request/response headers: I am afraid the CloudFront URL won't be accessible from outside (until we go live), but I am posting the request/response headers, which should give you a fair idea. You can also check the caching headers at the origin: https://origin.ixigo.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg

    Response after two successive requests using the browser
    Remote Address:54.230.156.66:443
    Request URL:https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
    Request Method:GET
    Status Code:200 OK
    Response Headers
    Accept-Ranges:bytes
    Age:23
    Cache-Control:public, max-age=31557600
    Connection:keep-alive
    Content-Length:8708
    Content-Type:image/jpg
    Date:Fri, 27 Nov 2015 09:16:03 GMT
    ETag:"-170562206"
    Last-Modified:Sun, 29 Jun 2014 03:44:59 GMT
    Vary:Accept-Encoding
    Via:1.1 7968275877e438c758292828c0593684.cloudfront.net (CloudFront)
    X-Amz-Cf-Id:fcbGLv8uBOP89qfR52OWa-NlqWkEREJPpZpy9ix0jdq8-a4oTx7lNw==
    X-Backend:image6_40
    X-Cache:Hit from cloudfront
    X-Cache-Hits:0
    X-Device:pc
    X-DeviceType:pc
    X-Powered-By:xyz

    Now the same URL requested using curl gave me a Miss
    manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
    HTTP/1.1 200 OK
    Content-Type: image/jpg
    Content-Length: 8708
    Connection: keep-alive
    Age: 0
    Cache-Control: public, max-age=31557600
    Date: Fri, 27 Nov 2015 09:16:47 GMT
    ETag: "-170562206"
    Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
    X-Backend: image6_40
    X-Cache-Hits: 0
    X-Device: pc
    X-DeviceType: pc
    X-Powered-By: xyz
    Vary: Accept-Encoding
    X-Cache: Miss from cloudfront
    Via: 1.1 4d42171c56a4c8b5c627040e6aa0938d.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: fY0LXhp7NlqB-I8F5-1TIMnA6bONjPD3CEp7dsyVdykP-7N2mbffvw==

    Repeating the same curl request now gives a Hit
    manu-mdc:cache manuc$ curl -I https://youcannotaccess.com/image/upload/t_thumb,f_auto/r7y6ykuajvlumkp4lk2a.jpg
    HTTP/1.1 200 OK
    Content-Type: image/jpg
    Content-Length: 8708
    Connection: keep-alive
    Cache-Control: public, max-age=31557600
    Date: Fri, 27 Nov 2015 09:16:47 GMT
    ETag: "-170562206"
    Last-Modified: Sun, 29 Jun 2014 03:44:59 GMT
    X-Backend: image6_40
    X-Cache-Hits: 0
    X-Device: pc
    X-DeviceType: pc
    X-Powered-By: xyz
    Age: 3
    Vary: Accept-Encoding
    X-Cache: Hit from cloudfront
    Via: 1.1 6877899d48ba844a34ea4378ce336f06.cloudfront.net (CloudFront)
    X-Amz-Cf-Id: qpPhbLX_5t2Xj0XZuZdjWD2w-BI80DUVyL496meQkLfSEn3ikt7hNg==