Resize images on the fly in CloudFront and get them in the same URL instantly: AWS CloudFront -> S3 -> Lambda -> CloudFront


Solution 1

Finally I was able to solve it. Although this is not really a structural solution, it does what we need.

First, thanks to Michael's answer, I used path patterns to match all media types. Second, the Cache Behavior page was a bit misleading to me: the Lambda association there is for Lambda@Edge, although none of the tooltips on the cache behavior page say so; all you see is just "Lambda". This feature cannot help us, as we do not want to extend our AWS service scope with Lambda@Edge just because of this particular problem.

Here is the solution approach:
I have defined multiple cache behaviors, one per media type that we support:

[Screenshot: list of CloudFront cache behaviors, one per supported media type]

For each cache behavior I set the Default TTL to 0.

And the most important part: In the Lambda function, I have added a Cache-Control header to the resized images when putting them in S3:

s3_resource.Bucket(BUCKET).put_object(Key=new_key, 
                                      Body=edited_image_obj,
                                      CacheControl='max-age=12312312',
                                      ContentType=content_type)
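
The call above takes new_key and content_type as given. A minimal sketch of how they might be derived, assuming the `<name>-<width>x<height>.<ext>` naming scheme from the question; the helper names `parse_resize_key` and `build_put_kwargs` are hypothetical and not part of the original function:

```python
import mimetypes
import re

# Assumed naming scheme (from the question): <name>-<width>x<height>.<ext>
RESIZE_KEY = re.compile(r"^(?P<stem>.+)-(?P<w>\d+)x(?P<h>\d+)\.(?P<ext>[^./]+)$")


def parse_resize_key(key):
    """Return (original_key, width, height), or None if the key has no size suffix."""
    m = RESIZE_KEY.match(key)
    if m is None:
        return None
    return f"{m['stem']}.{m['ext']}", int(m["w"]), int(m["h"])


def build_put_kwargs(new_key, body, max_age=12312312):
    """Build the keyword arguments for put_object, including the Cache-Control header."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    content_type = mimetypes.guess_type(new_key)[0] or "application/octet-stream"
    return {
        "Key": new_key,
        "Body": body,
        "CacheControl": f"max-age={max_age}",
        "ContentType": content_type,
    }


# Intended use inside the Lambda (not executed here):
#   s3_resource.Bucket(BUCKET).put_object(**build_put_kwargs(new_key, edited_image_obj))
```

Keeping the header construction in one helper means the max-age value is defined in a single place, although objects already stored in S3 keep whatever header they were written with.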

To validate that everything works, I can now see that the new image dimension is served with the cache header in CloudFront:

[Screenshot: CloudFront response headers showing the Cache-Control header]
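
Why both settings are needed can be illustrated with a simplified model of how CloudFront picks a cache lifetime. This is only a sketch: it ignores s-maxage, Expires, no-cache and similar directives, and the function name `effective_ttl` is made up for illustration. With Default TTL 0, a response without Cache-Control (the redirect) is not kept, while the resized object with max-age is cached.

```python
def effective_ttl(origin_max_age, min_ttl=0, default_ttl=0, max_ttl=31536000):
    """Simplified model of the cache lifetime (seconds) CloudFront would apply.

    origin_max_age is the max-age sent by the origin, or None if the origin
    sent no Cache-Control header at all.
    """
    if origin_max_age is None:
        # No header from the origin: the behavior's Default TTL applies.
        return max(min_ttl, default_ttl)
    # Header present: the origin's value is clamped to [min_ttl, max_ttl].
    return max(min_ttl, min(origin_max_age, max_ttl))


# Redirect from the Lambda, no Cache-Control header, Default TTL 0:
redirect_ttl = effective_ttl(None)
# Resized image stored in S3 with CacheControl='max-age=12312312':
image_ttl = effective_ttl(12312312)
```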

Solution 2

You're on the right track... maybe... but there are at least two problems.

The "Lambda Function Association" that you're configuring here is called Lambda@Edge, and it's not yet available. The only users who can access it are those who have applied to be included in the limited preview. The "maximum allowed is 0" error means you are not a preview participant. I have not seen any announcements about when this will be live for all accounts.

But even once it is available, it's not going to help you, here, in the way you seem to expect, because I don't believe an Origin Response trigger allows you to do anything to trigger CloudFront to try a different destination and follow the redirect. If you see documentation that contradicts this assertion, please bring it to my attention.

However... Lambda@Edge will be useful for setting Cache-Control: no-cache on the 307 so CloudFront won't cache it, but the redirect itself will still need to go all the way back to the browser.
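
For comparison, the same header could in principle be attached by the origin itself. The following is a sketch under assumptions, not something the setup in the question is confirmed to do: it assumes the Lambda sits behind API Gateway with proxy integration (so the returned dict becomes the HTTP response) and that the cache behavior's Minimum TTL is 0 so that CloudFront honors the header. The function name `redirect_response` is hypothetical.

```python
def redirect_response(location, status=307):
    """Build a redirect response that asks caches not to store it."""
    return {
        "statusCode": status,
        "headers": {
            "Location": location,
            # Assumption: honored by CloudFront only when Minimum TTL is 0.
            "Cache-Control": "no-cache, no-store",
        },
        "body": "",
    }


response = redirect_response("//xxxxx.cloudfront.net/chucknorris-200x200.jpg")
```

Either way, the redirect still travels back to the browser, as noted above.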

Note also, Lambda@Edge only supports Node, not Python... so maybe this isn't even part of your plan, yet. I can't really tell, from the question.

Read about the Lambda@Edge limited preview.

The second problem:

I am trying to set path pattern -\d+x\d+\..+$

You can't do that. Path patterns are string matches supporting * wildcards. They are not regular expressions. You might get away with /*-*x*.jpg, though, since multiple wildcards appear to be supported.
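
To see how the wildcard pattern differs from the regular expression, here is a rough local approximation using Python's fnmatch. This is only an illustration: fnmatch is not CloudFront's matcher (it also gives special meaning to square brackets), and the example paths are made up.

```python
import re
from fnmatch import fnmatchcase

WILDCARD_PATTERN = "/*-*x*.jpg"           # what a path pattern can express
INTENDED_REGEX = re.compile(r"-\d+x\d+\..+$")  # what the question tried to use


def matches_wildcard(path, pattern=WILDCARD_PATTERN):
    """Approximate a case-sensitive wildcard match where * is any run of characters."""
    return fnmatchcase(path, pattern)


def matches_regex(path):
    """Check the path against the regular expression from the question."""
    return INTENDED_REGEX.search(path) is not None


for path in ("/chucknorris-200x200.jpg", "/chucknorris.jpg", "/my-extra-photo.jpg"):
    print(path, matches_wildcard(path), matches_regex(path))
```

The last path shows the trade-off: the wildcard pattern also catches names that merely contain a hyphen followed later by an "x", so the behavior will apply to more objects than the regular expression would have selected.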


Author: katericata

Updated on July 01, 2022

Comments

  • katericata
    katericata almost 2 years

    TLDR: We have to trick CloudFront 307 redirect caching by creating new cache behavior for responses coming from our Lambda function.

    You will not believe how close we are to achieving this. We are badly stuck on the last step.

    Business case:

    Our application stores images in S3 and serves them with CloudFront in order to avoid geographic slowdowns around the globe. Now we want to be really flexible with the design and be able to request new image dimensions directly in the CloudFront URL! Each new image size will be created on demand and then stored in S3, so the second time it is requested it will be served really quickly, as it will exist in S3 and will also be cached in CloudFront.

    Let's say the user has uploaded the image chucknorris.jpg. Only the original image will be stored in S3 and will be served on our page like this:

    //xxxxx.cloudfront.net/chucknorris.jpg

    We have calculated that we now need to display a thumbnail of 200x200 pixels. Therefore we set the image src in our template to:

    //xxxxx.cloudfront.net/chucknorris-200x200.jpg

    When this new size is requested, AWS has to create it on the fly in the same bucket and under the requested key. This way the image will be loaded directly from the same CloudFront URL.

    I made an ugly drawing with the architecture overview and the workflow on how we are doing this in AWS:

    [Diagram: architecture overview and workflow]

    Here is how Python Lambda ends:

    return {
        'statusCode': '301',
        'headers': {'location': redirect_url},
        'body': ''
    }
    

    The problem:

    If we make the Lambda function redirect to S3, it works like a charm. If we redirect to CloudFront, it goes into a redirect loop because CloudFront caches 307 (as well as 301, 302 and 303). As soon as our Lambda function redirects to CloudFront, CloudFront calls the API Gateway URL instead of fetching the image from S3:

    [Image: illustration of the redirect loop]

    I would like to create a new cache behavior in CloudFront's Behaviors settings tab. This behavior should not cache responses from Lambda or S3 (I don't know what exactly is happening internally there), but should still cache any subsequent requests to this very same resized image. I am trying to set path pattern -\d+x\d+\..+$, add the ARN of the Lambda function under "Lambda Function Association" and set Event Type to Origin Response. In addition, I am setting the "Default TTL" to 0.

    But I cannot save the behavior due to some error:

    [Screenshot: error message shown when saving the cache behavior]

    Are we on the right way, or is the idea of this "Lambda Function Association" totally different?

  • katericata
    katericata about 7 years
    Thank you for all the clarifications. I changed the path pattern as you proposed and I provided multiple behaviors for all media file types. Regarding Lambda@Edge, that's totally unfair of Amazon to define it like this. They never mention the word "Edge" in the tooltip info dialog there. I do not think that Lambda@Edge will suit our needs, especially since it currently only supports Node, and we want to stay with Python as our crop/resize algorithms use PIL.
  • Michael - sqlbot
    Michael - sqlbot about 7 years
    I don't understand what you mean by calling it "unfair." You can use a Lambda function in any language with API Gateway behind CloudFront, but you can only internally "hook" and modify CloudFront request/response behavior with Lambda@Edge... and this (along with the fact that Node.js is the supported language) is quite clearly stated in both the Lambda documentation and the CloudFront documentation. AWS has not been deceptive, here.
  • katericata
    katericata about 7 years
    I agree with you that the official documentation states it clearly. However, what I meant with my statement was that this is not stated on the Cache Behavior page (nor in the exception message). There you do not see the word Edge mentioned anywhere, and I found this a bit misleading. In our case, as we use only Lambda, this was surprising, as Lambda@Edge is a new service in the AWS family and it is currently not in our scope. :)
  • Yuchen
    Yuchen over 6 years
    Why is it so important to add the Cache-Control header to the resized images? It feels to me that setting Default TTL to 0 is already enough. No?
  • Yuchen
    Yuchen over 6 years
    To answer my own stupid question: I read another blog post about this, sketchboard.io/blog/serverless-image-resize-with-amazon-lambda, which explains why the max-age is needed. It will be used by CloudFront to determine how long to cache the object (after the first run, which uses the default of 0). Excellent question & answer here! @katericata.
  • formatkaka
    formatkaka almost 6 years
    I have 2 doubts: 1) If we add more media types, we will have to update the CloudFront config each time. 2) If we decide to change the Cache-Control duration, then we will have to update it for every object in the bucket. That kind of makes this solution non-scalable. Did you have these problems? If yes, then what is the workaround?
  • katericata
    katericata almost 6 years
    You are right on both points. This solution fits our project perfectly, as we do not change the media types that often, actually not at all. Regarding the cache control duration, we do not suffer from it, as we have an invalidation mechanism in place when removing a media object from the application. To be honest, I do not know how to make changing the Cache-Control duration flexible.
  • formatkaka
    formatkaka almost 6 years
    Would this solution be better, in your opinion? aws.amazon.com/blogs/networking-and-content-delivery/…