How do you set a default root object for subdirectories for a statically hosted website on CloudFront?

37,016

Solution 1

UPDATE: It looks like I was incorrect! See JBaczuk's answer, which should be the accepted answer on this thread.

Unfortunately, the answer to both your questions is no.

1. Is it possible to specify a default root object for all subdirectories for a statically hosted website on CloudFront?

No. As stated in the AWS CloudFront docs...

... If you define a default root object, an end-user request for a subdirectory of your distribution does not return the default root object. For example, suppose index.html is your default root object and that CloudFront receives an end-user request for the install directory under your CloudFront distribution:

http://d111111abcdef8.cloudfront.net/install/

CloudFront will not return the default root object even if a copy of index.html appears in the install directory.

...

The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket. (A copy of the index document must appear in every subdirectory.)
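To make the quoted difference concrete, here is a simplified model in JavaScript (not AWS code) of how each service resolves a request path. It is a sketch under stated assumptions: `bucketKeys` stands in for a hypothetical bucket listing, and the S3 branch follows the documented website-endpoint rules (a trailing-slash path gets the index document appended; a slash-less path whose `key/index.html` exists gets a 302 redirect to the trailing-slash form).

```javascript
// Simplified model (not AWS code) of how a request path is resolved.
// Assumption: `bucketKeys` is a hypothetical set of object keys in the bucket.

// CloudFront: the default root object is substituted only for the root URL.
function cloudFrontDefaultRoot(path, defaultRootObject) {
  return path === '/' ? '/' + defaultRootObject : path;
}

// S3 website endpoint: the index document suffix applies in every "directory".
function s3WebsiteResolve(path, indexSuffix, keys) {
  const key = path.replace(/^\//, '');
  if (key === '' || key.endsWith('/')) {
    // Directory-style request: append the index document suffix.
    const indexKey = key + indexSuffix;
    return keys.has(indexKey) ? { status: 200, key: indexKey } : { status: 404 };
  }
  if (keys.has(key)) {
    return { status: 200, key: key };
  }
  // No object at that exact key, but key/index.html exists:
  // S3 redirects the browser to the trailing-slash form.
  if (keys.has(key + '/' + indexSuffix)) {
    return { status: 302, location: '/' + key + '/' };
  }
  return { status: 404 };
}

const bucketKeys = new Set(['index.html', 'install/index.html']);

console.log(cloudFrontDefaultRoot('/install/', 'index.html'));        // → /install/
console.log(s3WebsiteResolve('/install/', 'index.html', bucketKeys)); // → { status: 200, key: 'install/index.html' }
console.log(s3WebsiteResolve('/install', 'index.html', bucketKeys));  // → { status: 302, location: '/install/' }
```

In this model the CloudFront default root object never applies to /install/, which is exactly the gap the question is about.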

2. Is it possible to set up an origin access identity for content served from CloudFront where the origin is an S3 website endpoint and not an S3 bucket?

Not directly. Your options for origins with CloudFront are S3 buckets or your own server.

It's that second option that does open up some interesting possibilities, though. This probably defeats the purpose of what you're trying to do, but you could set up your own server whose sole job is to be a CloudFront origin server.

When a request comes in for http://d111111abcdef8.cloudfront.net/install/, CloudFront will forward this request to your origin server, asking for /install/. You can configure your origin server however you want, including to serve index.html in this case.

Or you could write a little web app that just takes this call and gets it directly from S3 anyway.

But I realize that setting up your own server and worrying about scaling it may defeat the purpose of what you're trying to do in the first place.
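For the "your own server as origin" option, a minimal Node.js sketch of such an origin might look like the following. It serves files from a local directory and appends index.html to directory-style paths. The `SITE_ROOT` environment variable, the `./site` default, the small content-type table and the function names are all hypothetical; a real origin would also need caching headers, logging and proper deployment behind CloudFront as a custom origin.

```javascript
// Minimal custom-origin sketch: serve static files and map directory paths to index.html.
// Assumption (hypothetical): files live under SITE_ROOT, default ./site.
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

const SITE_ROOT = path.resolve(process.env.SITE_ROOT || './site');
const DEFAULT_DOCUMENT = 'index.html';
const CONTENT_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.css': 'text/css',
  '.js': 'text/javascript',
  '.png': 'image/png',
};

// Map a request URL to a file under `root`, appending index.html when the last
// path segment has no file extension. Returns null for malformed or escaping paths.
function resolveFile(url, root) {
  let urlPath;
  try {
    urlPath = decodeURIComponent(url.split('?')[0]);
  } catch (e) {
    return null; // malformed percent-encoding
  }
  const lastSegment = urlPath.split('/').pop();
  if (lastSegment === '') {
    urlPath += DEFAULT_DOCUMENT;        // /install/ -> /install/index.html
  } else if (!lastSegment.includes('.')) {
    urlPath += '/' + DEFAULT_DOCUMENT;  // /install  -> /install/index.html
  }
  const filePath = path.join(root, urlPath);
  // Reject paths such as /../etc/passwd that would leave the site root.
  return filePath.startsWith(root + path.sep) ? filePath : null;
}

function createOriginServer(root) {
  return http.createServer((req, res) => {
    const filePath = resolveFile(req.url, root);
    if (filePath === null) {
      res.writeHead(400);
      res.end('Bad request');
      return;
    }
    fs.readFile(filePath, (err, body) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const type = CONTENT_TYPES[path.extname(filePath)] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(body);
    });
  });
}

// To run it (port 8080 is an arbitrary choice):
// createOriginServer(SITE_ROOT).listen(8080);
```

Keeping the path mapping in a pure function makes the directory-index rule easy to test without starting a server.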

Solution 2

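There IS a way to do this. Instead of selecting your bucket from the origin domain name dropdown (www.example.com.s3.amazonaws.com), point the origin at your bucket's static website endpoint (e.g. www.example.com.s3-website-us-west-2.amazonaws.com). CloudFront then treats the bucket as a custom origin, and S3's index document handling applies to every subdirectory. (This requires S3 static website hosting, so it cannot be combined with an origin access identity.)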


Thanks to this AWS Forum thread.
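If you create the distribution through the CloudFront API or an SDK rather than the console, the website endpoint has to be declared as a custom origin. The sketch below builds the `Origin` entry of a `DistributionConfig` (field names as in the CloudFront API); the `websiteOrigin` helper, its endpoint check and the origin Id are hypothetical. Because S3 website endpoints do not support HTTPS, the origin protocol policy is `http-only`; viewers can still reach CloudFront over HTTPS.

```javascript
// Sketch: Origin entry for a CloudFront DistributionConfig that targets an S3 *website*
// endpoint. Field names follow the CloudFront API; the helper itself is hypothetical.
function websiteOrigin(websiteEndpoint, originId) {
  // Website endpoints contain "s3-website" followed by "-" or "." and the region.
  // The REST endpoint (bucket.s3.amazonaws.com) would not give per-directory index documents.
  if (!/\.s3-website[-.]/.test(websiteEndpoint)) {
    throw new Error('Expected an S3 website endpoint, got: ' + websiteEndpoint);
  }
  return {
    Id: originId,
    DomainName: websiteEndpoint,
    // A website endpoint is a custom origin: there is no S3OriginConfig,
    // so no origin access identity can be attached here.
    CustomOriginConfig: {
      HTTPPort: 80,
      HTTPSPort: 443,
      // S3 website endpoints only serve HTTP.
      OriginProtocolPolicy: 'http-only',
    },
  };
}

const origin = websiteOrigin('www.example.com.s3-website-us-west-2.amazonaws.com', 'S3-website-origin');
console.log(JSON.stringify(origin, null, 2));
```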

Solution 3

Activating S3 static website hosting means you have to open the bucket to the world. In my case, I needed to keep the bucket private and use the origin access identity functionality to restrict access to CloudFront only. As @Juissi suggested, a Lambda@Edge function can rewrite the request URIs:

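'use strict';

/**
 * Rewrites request URIs to the default document. Examples:
 *
 * /blog            -> /blog/index.html
 * /blog/july/      -> /blog/july/index.html
 * /blog/header.png -> /blog/header.png
 *
 * Runs as an origin-request trigger, so the rewrite is internal:
 * the viewer's URL does not change (no HTTP redirect is sent).
 */

const defaultDocument = 'index.html';

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    // The root path "/" is passed through unchanged.
    if (request.uri !== '/') {
        const paths = request.uri.split('/');
        const lastPath = paths[paths.length - 1];
        // Treat the last path segment as a file if it contains a dot (e.g. header.png).
        const isFile = lastPath.split('.').length > 1;

        if (!isFile) {
            // "/blog" needs a slash before the document name; "/blog/july/" already has one.
            if (lastPath !== '') {
                request.uri += '/';
            }

            request.uri += defaultDocument;
        }

        console.log(request.uri);
    }

    callback(null, request);
};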

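After you publish your function, go to your CloudFront distribution in the AWS console. Go to Behaviors, edit the behavior, choose Origin Request under Lambda Function Associations, and paste the ARN of your new function. Note that Lambda@Edge functions must be created in the us-east-1 region, and the ARN must reference a numbered version of the function (not $LATEST).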
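Before attaching the function, it can help to run it locally against a hand-built event. The sketch below factors the answer's rewrite rule into a hypothetical `rewriteUri` helper and feeds the handler a minimal origin-request event; the `fakeEvent` helper and its fields are assumptions, since real Lambda@Edge events carry many more properties.

```javascript
'use strict';

const defaultDocument = 'index.html';

// The answer's rewrite rule, factored into a pure function (hypothetical helper name).
function rewriteUri(uri) {
  if (uri === '/') {
    return uri;
  }
  const lastPath = uri.split('/').pop();
  const isFile = lastPath.split('.').length > 1;
  if (isFile) {
    return uri;
  }
  return (lastPath === '' ? uri : uri + '/') + defaultDocument;
}

// Same callback-style signature as the answer; in the deployed file this would be
// `exports.handler = handler;`.
const handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  request.uri = rewriteUri(request.uri);
  callback(null, request);
};

// Minimal stand-in for an origin-request event (only the fields the handler reads).
function fakeEvent(uri) {
  return { Records: [{ cf: { request: { uri: uri, method: 'GET', headers: {} } } }] };
}

for (const uri of ['/', '/blog', '/blog/july/', '/blog/header.png']) {
  handler(fakeEvent(uri), {}, (err, request) => console.log(uri, '->', request.uri));
}
// Prints:
// / -> /
// /blog -> /blog/index.html
// /blog/july/ -> /blog/july/index.html
// /blog/header.png -> /blog/header.png
```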

Solution 4

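(New feature, May 2021) CloudFront Functions

Create the simple JavaScript function below, then associate it with the Viewer Request event of your distribution's cache behavior (CloudFront Functions can only be triggered on viewer request and viewer response events):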

function handler(event) {
    var request = event.request;
    var uri = request.uri;
    
    // Check whether the URI is missing a file name.
    if (uri.endsWith('/')) {
        request.uri += 'index.html';
    } 
    // Check whether the URI is missing a file extension.
    else if (!uri.includes('.')) {
        request.uri += '/index.html';
    }

    return request;
}

Read the AWS documentation on CloudFront Functions for more info.
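One caveat with the function above: `uri.includes('.')` checks the whole URI, so a directory whose name contains a dot (for example a hypothetical /docs/v1.2/intro) is treated as a file and left unchanged. A variant that inspects only the last path segment is sketched below, together with a local call using a minimal `{ request: { uri } }` event (an assumption; real viewer-request events carry more fields). Like the original, it sticks to `var`, `function` and ES5 string methods; remove the local check before pasting it into CloudFront.

```javascript
function handler(event) {
  var request = event.request;
  var uri = request.uri;
  // Only the text after the last "/" decides whether a file name is present.
  var lastSegment = uri.substring(uri.lastIndexOf('/') + 1);

  if (lastSegment === '') {
    // "/docs/" -> "/docs/index.html"
    request.uri = uri + 'index.html';
  } else if (lastSegment.indexOf('.') === -1) {
    // "/docs/v1.2/intro" -> "/docs/v1.2/intro/index.html"
    request.uri = uri + '/index.html';
  }
  return request;
}

// Local check only; not part of the deployed function.
['/', '/docs/', '/docs/v1.2/intro', '/img/logo.png'].forEach(function (uri) {
  console.log(uri, '->', handler({ request: { uri: uri } }).uri);
});
```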

Solution 5

There is an "official" guide published on the AWS Compute Blog that recommends setting up a Lambda@Edge function triggered by your CloudFront distribution:

Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of Lambda@Edge, you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.

Solution

In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.

When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:

'use strict';
exports.handler = (event, context, callback) => {

    // Extract the request from the CloudFront event that is sent to Lambda@Edge
    var request = event.Records[0].cf.request;

    // Extract the URI from the request
    var olduri = request.uri;

    // Match any '/' that occurs at the end of a URI. Replace it with a default index
    var newuri = olduri.replace(/\/$/, '\/index.html');

    // Log the URI as received by CloudFront and the new URI to be used to fetch from origin
    console.log("Old URI: " + olduri);
    console.log("New URI: " + newuri);

    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;

    // Return to CloudFront
    return callback(null, request);

};

Follow the AWS blog guide to see all the steps required to set this up, including creating the S3 bucket, the CloudFront distribution, and the Lambda@Edge function.
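Note that the blog's regular expression only rewrites URIs that end in '/'; a request for /blog (no trailing slash) reaches the origin unchanged. The small sketch below applies the same `replace` call to a few sample URIs (the `blogRewrite` name is hypothetical) to show the difference. If slash-less URLs matter, a rule that also handles them, like the ones in Solution 3 or 4, may be preferable.

```javascript
// Same replacement as the blog's Lambda@Edge code, isolated for testing.
function blogRewrite(uri) {
  return uri.replace(/\/$/, '/index.html');
}

['/', '/blog/', '/blog', '/blog/post.html'].forEach(function (uri) {
  console.log(uri, '->', blogRewrite(uri));
});
// Prints:
// / -> /index.html
// /blog/ -> /blog/index.html
// /blog -> /blog
// /blog/post.html -> /blog/post.html
```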

Author: wyer33

Updated on October 07, 2021

Comments

  • wyer33
    wyer33 over 2 years

    How do you set a default root object for subdirectories on a statically hosted website on CloudFront? Specifically, I'd like www.example.com/subdir/index.html to be served whenever the user asks for www.example.com/subdir. Note that this is for delivering a static website held in an S3 bucket. In addition, I would like to use an origin access identity to restrict access to the S3 bucket to CloudFront only.

    Now, I am aware that CloudFront works differently from S3, and Amazon states specifically:

    The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket. (A copy of the index document must appear in every subdirectory.) For more information about configuring Amazon S3 buckets as websites and about index documents, see the Hosting Websites on Amazon S3 chapter in the Amazon Simple Storage Service Developer Guide.

    As such, even though CloudFront allows us to specify a default root object, this only works for www.example.com and not for www.example.com/subdir. To get around this difficulty, we can change the origin domain name to point to the website endpoint given by S3. This works great and allows the root objects to be specified uniformly. Unfortunately, this doesn't appear to be compatible with origin access identities. Specifically, the linked documentation states:

    Change to edit mode:

    Web distributions – Click the Origins tab, click the origin that you want to edit, and click Edit. You can only create an origin access identity for origins for which Origin Type is S3 Origin.

    Basically, in order to set the correct default root object, we use the S3 website endpoint and not the website bucket itself. This is not compatible with using an origin access identity. As such, my question boils down to two parts:

    1. Is it possible to specify a default root object for all subdirectories for a statically hosted website on CloudFront?

    2. Is it possible to set up an origin access identity for content served from CloudFront where the origin is an S3 website endpoint and not an S3 bucket?

  • jwerre
    jwerre almost 8 years
    @schickling is right. This should be the accepted answer.
  • fideloper
    fideloper over 7 years
    Does anyone know if this is charged differently with an S3 origin vs. a website origin?
  • webjay
    webjay over 7 years
    The problem with this is that if you have an origin path other than / and the user forgets the trailing / (e.g. /blog instead of /blog/), S3 will redirect to [origin path]/blog/.
  • Manjit Kumar
    Manjit Kumar over 7 years
    Does this work fine if I want to serve my whole website and files over HTTPS only?
  • Anthony Kong
    Anthony Kong almost 7 years
    Does it mean that the S3 bucket has to be enabled as a web server?
  • Ron
    Ron over 6 years
    Do you need to have a /install folder inside this custom S3 bucket, or will the files be served from the root? How do you specifically set up the subfolder path?
  • Ben Elgar
    Ben Elgar about 6 years
    It works fine if you want to serve over HTTPS only—just make sure to set up Viewer Protocol Policy to redirect HTTP to HTTPS.
  • Elad Amsalem
    Elad Amsalem almost 6 years
    BTW, S3 removes the query params for origin paths other than root; if someone has a solution for that, it would be great :)
  • marcanuy
    marcanuy over 5 years
    There is a ready-to-deploy Lambda function similar to that one: serverlessrepo.aws.amazon.com/applications/…
  • Peter Lada
    Peter Lada over 5 years
    @JBaczuk: I think www.example.com.s3-website-us-west-2.amazonaws.com should be www.example.com.s3-website.us-west-2.amazonaws.com, notice the dash to dot change after the s3-website.
  • Jason Kleban
    Jason Kleban over 5 years
    Yup, awesome! Also, do not specify CloudFront's 'Default Root Object'. That will only make the root case work with the intuitive setup, and it will keep this awesome trick from working for all folders.
  • Henrik Aasted Sørensen
    Henrik Aasted Sørensen about 5 years
    More details regarding this approach here: aws.amazon.com/blogs/compute/…
  • Hayden
    Hayden about 5 years
    The one problem I have with this is that to get it to work, you would have two (2) URLs capable of accessing your website on S3: your CloudFront URL and your S3 URL (bucket_name.s3-website-us-east-1.amazonaws.com).
  • VitalyB
    VitalyB about 5 years
    Note: This doesn't work if you want to keep your S3 private and use "Restricted access"
  • jacobfogg
    jacobfogg about 5 years
    S3 converts subdir/ to subdir when you try to upload the HTML. Also, when you try to access example.com/subdir/ it fails, and if you try to access example.com/subdir it downloads the HTML file instead of rendering it.
  • icyitscold
    icyitscold about 5 years
    OP explicitly stated this approach won't work for him: "In order to get around this difficulty, we can change the origin domain name to point to the website endpoint given by S3. This works great and allows the root objects to be specified uniformly. Unfortunately, this doesn't appear to be compatible with origin access identities". AWS themselves seem to be recommending Lambda@Edge for this: aws.amazon.com/blogs/compute/…
  • GuiTeK
    GuiTeK almost 5 years
    If you use Terraform with the AWS provider, the attribute you want to use in the CloudFront settings is: website_endpoint (C.F. terraform.io/docs/providers/aws/d/s3_bucket.html). WARNING: you will also need to update the origin settings in your CloudFront distribution (C.F. terraform.io/docs/providers/aws/r/…): indeed, by default if nothing is specified it is an S3 Origin, but you now need a Custom Origin.
  • rocketspacer
    rocketspacer almost 5 years
    This is not compatible with CloudFront Origin Access Identity. You won't be able to restrict access to your S3 bucket this way.
  • Hey Teacher
    Hey Teacher over 4 years
    This is a workaround. The correct solutions, which protect your S3 bucket, are stackoverflow.com/a/49742870/1123065 or stackoverflow.com/a/52615662/1123065
  • Renato Gama
    Renato Gama over 4 years
    The problem here is that this function needs to be deployed to us-east-1, so if you have a company under strict GDPR regulation that doesn't allow a single bit outside Germany then this is not for you.
  • mruanova
    mruanova over 4 years
    Unfortunately, Lambda@Edge only works in the us-east-1 region. Source: github.com/awslabs/serverless-application-model/issues/635
  • Ian Hunter
    Ian Hunter over 3 years
    you saved me from having to write some goofy lambda function to do something that every other static web server in the world has been doing for years. thank you.
  • Jeremie
    Jeremie over 3 years
    This is actually one of the best approaches I found, as it works on all S3 distributions without specific configuration, and doesn't need Lambda@Edge (which generates an extra redirect and slows down serving the pages while the Lambda is executed). I disagree with @jacobfogg's comment. It works perfectly well when used programmatically. I wrote a small Lambda function triggered by an S3 event. See my reply below.
  • runwuf
    runwuf about 3 years
    Thanks Johan and @Jeremie this approach works! you can do this with awscli as well.
  • runwuf
    runwuf about 3 years
    @rocketspacer there is a solution using aws s3api copy-object which works in S3 bucket with restricted access, please see my answer below.
  • Aidin
    Aidin about 3 years
    For the detailed comparison of this solution vs the others, as well as the challenges here, see my answer below: stackoverflow.com/a/66977399
  • SBKDeveloper
    SBKDeveloper about 3 years
    The Lambda@Edge function is only deployed in us-east-1; it is then replicated and runs at edge locations worldwide, and where it runs depends on the edge location closest to the user.
  • jacobfogg
    jacobfogg almost 3 years
    Ahh, I missed that this only works programmatically. I had tested this solution via the web interface, which had the behavior I described. I'll keep this in my back pocket next time I encounter this kind of issue.
  • Seba Illingworth
    Seba Illingworth over 2 years
    This works perfectly, and is much cheaper than using Lambda. Here's an example of how to set up CF functions inside a Serverless Framework deploy script (just replace the function code with the code/link in the answer above).
  • shearn89
    shearn89 over 2 years
    This worked flawlessly - simply created a Function in the relevant section (left menu) of CF, then associated it with the default behaviour -> Viewer Request of my distribution. Hugo site now working as intended!
  • r123
    r123 over 2 years
    I have tried this solution but no joy. Can you see what I am doing wrong? stackoverflow.com/questions/70717168/…