How to pre-warm CloudFront edge servers' cache?


Solution 1

Here is a post from an Amazon employee from a few years ago regarding pre-warming:

The issue is, if everyone pre-cached objects to an edge location, all of the edge locations would run out of space because they would be flooded with objects that may or may not be accessed. Right now the edge location only caches objects that are being accessed by users and will dump objects that have not been accessed recently if there is no more room on the server. I am not sure how a pre-cached scenario would work if it were a service.

Unless you know that you're going to be getting a steady flow of traffic over an extended period of time, pre-warming simply may not be effective. CloudFront may very well expire your objects to make room for other customers' objects if those are getting more (or more recent) traffic than your own. Once that happens, you've completely lost the advantage of any pre-warming you were able to perform.
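One way to see whether pre-warming has survived is to look at the response headers. CloudFront normally adds X-Cache (for example "Hit from cloudfront" or "Miss from cloudfront") and X-Amz-Cf-Pop, which names the edge location that answered. The bash sketch below, using curl and awk, reads both. The helper names are hypothetical, and it assumes those headers are present in your distribution's responses. A Miss on an object you warmed earlier suggests it was evicted or expired at that edge, or that the request reached a different edge than the one you warmed.

```bash
#!/usr/bin/env bash
# Sketch only: header_value and cache_report are hypothetical helper names.

# header_value NAME: read a raw HTTP header dump on stdin and print the value
# of the first header called NAME (case-insensitive), without the trailing CR.
header_value() {
  awk -v want="$1" '
    BEGIN { want = tolower(want) }
    {
      line = $0
      sub(/\r$/, "", line)
      i = index(line, ":")
      if (i > 0 && tolower(substr(line, 1, i - 1)) == want) {
        value = substr(line, i + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }'
}

# cache_report URL: GET the URL once (body discarded) and print a tab-separated
# line: URL, the edge location that answered (X-Amz-Cf-Pop), and whether the
# object was already cached there (X-Cache, e.g. "Hit from cloudfront").
# Note that on a miss this request itself caches the object again.
cache_report() {
  local url="$1" headers
  headers="$(curl -sS -o /dev/null -D - "$url")" || return 1
  printf '%s\t%s\t%s\n' "$url" \
    "$(printf '%s\n' "$headers" | header_value 'X-Amz-Cf-Pop')" \
    "$(printf '%s\n' "$headers" | header_value 'X-Cache')"
}

# Example (hypothetical URL):
#   cache_report https://d111111abcdef8.cloudfront.net/app.dmg
```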

Solution 2

We distribute our app downloads via CloudFront, and I need to pre-warm two downloads on the website. We use Pingdom for that: its checks request the URL I provided from random locations across the globe, which pre-warms the CloudFront cache for us.

Solution 3

https://github.com/chrismay/warmfront looks like a suitable solution. It resolves the CloudFront distribution's domain name through multiple DNS servers around the world and sends requests to the resolved IP addresses.
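The same idea can be sketched in bash with dig and curl. This is not warmfront's code; the function names and the resolvers you pass in are assumptions. Which edge addresses come back depends on where each resolver (or the client subnet it forwards) is located, so resolvers that actually sit in the regions you want to warm are what make this work; an anycast public resolver queried from a single machine will usually answer for that machine's own region.

```bash
#!/usr/bin/env bash
# Sketch of the multi-resolver idea; not warmfront's actual code. Function
# names are hypothetical, and the resolvers you pass in are your choice.

# ipv4_only: keep only IPv4 addresses from `dig +short` output (which can also
# contain CNAME targets), sorted and de-duplicated.
ipv4_only() {
  grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' | sort -u
}

# edge_ips HOST RESOLVER...: ask each resolver for HOST's A records and print
# the union of the addresses returned.
edge_ips() {
  local host="$1" resolver
  shift
  for resolver in "$@"; do
    dig +short +time=3 +tries=1 "@$resolver" "$host" A
  done | ipv4_only
}

# warm_via_ips HOST PATH IP...: GET https://HOST/PATH once per IP. curl's
# --resolve pins the connection to that IP while SNI and the Host header still
# carry HOST, so the edge serves your distribution.
warm_via_ips() {
  local host="$1" path="$2" ip
  shift 2
  for ip in "$@"; do
    curl -sS -o /dev/null --resolve "$host:443:$ip" \
      -w "$ip %{http_code}\n" "https://$host/$path"
  done
}

# Example (hypothetical host, path and resolver addresses):
#   warm_via_ips d111111abcdef8.cloudfront.net app.dmg \
#     $(edge_ips d111111abcdef8.cloudfront.net 203.0.113.10 198.51.100.20)
```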

Solution 4

I don't think you can do this worldwide, unless you can figure out the IP addresses CloudFront uses for your site around the world. But if your site has a sitemap.xml, you can use wget to get every page in it cached: the command downloads the sitemap and then requests every page listed in it. You can set it up in cron:

yoursite=https://yoursite.com
# Fetch the sitemap, extract every URL on your site, then request each one (plus the home page) and throw the response away.
wget --quiet "$yoursite/sitemap.xml" --output-document - | grep -Eo "$yoursite[^<]+" | \
  wget -O /dev/null --header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36" --header='Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' --header='Accept-Encoding:gzip, deflate, sdch, br' --header='Accept-Language:en-US,en;q=0.8,fr;q=0.6' --header='Cache-Control:no-cache' --header='Pragma:no-cache' --header='Upgrade-Insecure-Requests:1' "$yoursite/" -i -

Feel free to modify the headers. I noticed that unless you specify gzip in Accept-Encoding, CloudFront won't cache the gzip-compressed version. Also, if you want to limit which pages are requested, do it in the sitemap.

Get wget on macOS with Homebrew (brew install wget), or on Linux with yum (yum install wget).
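A variation on the wget approach, sketched in bash: extract the URLs from the sitemap's <loc> elements (so www and non-www entries both match), then request each URL once per Accept-Encoding value, following the observation above that gzip is only cached when you ask for it. The helper names and the list of encodings are assumptions, and the sketch uses curl for the requests.

```bash
#!/usr/bin/env bash
# Sketch with hypothetical helper names. Assumes each <loc>...</loc> element
# fits on one line of the sitemap, as is typical for generated sitemaps.

# sitemap_urls: read sitemap XML on stdin and print the text of every <loc>
# element, one per line, trimming whitespace and decoding &amp;.
sitemap_urls() {
  grep -oE '<loc>[^<]+</loc>' |
    sed -e 's#^<loc>##' -e 's#</loc>$##' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e 's/&amp;/\&/g'
}

# warm_encodings URL: GET the URL once per Accept-Encoding value, since
# CloudFront can keep compressed and uncompressed copies as separate cache
# entries. The encoding list is an assumption; keep only what you serve.
warm_encodings() {
  local url="$1" enc
  for enc in gzip br identity; do
    curl -sS -o /dev/null -H "Accept-Encoding: $enc" \
      -w "%{http_code} $enc %{url_effective}\n" "$url"
  done
}

# Example (hypothetical site):
#   wget -qO- https://yoursite.com/sitemap.xml | sitemap_urls |
#     while IFS= read -r u; do warm_encodings "$u"; done
```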

Solution 5

The only way to pre-warm the CF cache is to request a specific list of files, such as the list of chunks or fragments of the transcoded video. Those requests may come from different ISPs which connect to different edge caches of CF.

Even if we create an EC2 instance, RDP or SSH into it, and wget, curl, or otherwise request the video chunks, that will only populate one or two edge caches, which may or may not be the ones you are using and complaining about.

A request can specify the region of AWS CF, but it cannot choose the individual edge cache that serves it.
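For video, the "specific list of files" usually comes from a playlist. Assuming the transcoded output is HLS (a DASH manifest would need different parsing), the bash sketch below lists the segment URLs from a media playlist and requests each one. The function names are hypothetical, and, as the answer points out, running it from one machine only populates the edge cache(s) that machine reaches.

```bash
#!/usr/bin/env bash
# Sketch assuming the transcoded video is served as HLS; helper names are
# hypothetical. Handles absolute, root-relative and plain relative segment
# URIs only (no "../" paths, no #EXT-X-MAP init segments).

# playlist_segments BASE_URL: read an HLS media playlist (.m3u8) on stdin and
# print one absolute URL per segment. BASE_URL is the playlist's directory.
playlist_segments() {
  local base="${1%/}" origin line
  # scheme://host part of BASE_URL, used for URIs that start with "/".
  origin="$(printf '%s\n' "$base" | sed -E 's#^(https?://[^/]+).*#\1#')"
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"
    case "$line" in
      '' | '#'*) ;;                                   # blank line or tag: skip
      http://* | https://*) printf '%s\n' "$line" ;;  # already absolute
      /*) printf '%s%s\n' "$origin" "$line" ;;        # root-relative
      *) printf '%s/%s\n' "$base" "$line" ;;          # relative to playlist
    esac
  done
}

# warm_urls URL...: GET each URL once, discard the body, print status and URL.
warm_urls() {
  local u
  for u in "$@"; do
    curl -sS -o /dev/null -w '%{http_code} %{url_effective}\n' "$u"
  done
}

# Example (hypothetical URLs):
#   curl -sS https://d111111abcdef8.cloudfront.net/video/index.m3u8 |
#     playlist_segments https://d111111abcdef8.cloudfront.net/video/ |
#     while IFS= read -r u; do warm_urls "$u"; done
```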

Author: Ashish Goel

Updated on July 18, 2022

Comments

  • Ashish Goel, almost 2 years ago

    We have a requirement to pre-warm the cache of all the edge servers for a specific CloudFront distribution. For this, we will need to direct a few calls to each edge server. Is there any way of identifying all edge servers within a region or globally?

    For example:

    When going through the information available here: http://aws.amazon.com/cloudfront/details/

    I found that there are multiple edge servers within Asia. Taking Mumbai (India) and Hong Kong as examples, I am interested in directing a few HTTP calls to these edge servers. How can I achieve this? If I can get the identity of all these servers, that would be great.

    Also, we originally thought of the following approach: have a micro instance in every region and access our assets from each of them.

    But there is a problem with this approach. From the CloudFront documentation, I have come to understand that there is no hierarchical structure between the edge servers: if an edge server does not have the required asset, it asks the origin server for it directly. Now, a single region can have multiple edge servers. If not all of the edge servers within a region have the required assets, that defeats the purpose of this exercise. So my question is: how do we ensure that all the edge servers in a region have the required assets in their cache?

    Please note that the content is static. And if covering all edge servers is not possible, I would like to try even an approach that covers, let's say, 70% of them.