AWS Fargate service: scale to zero?
Solution 1
I’m not sure how exactly that would work in practice. When there are no healthy ALB targets the ALB returns a 503 error, so your visitors would see an error page instead of your website. That 503 might trigger a Fargate container start, but startup often takes tens of seconds, sometimes over a minute. By the time your container is up your visitor is probably gone.
If you want a truly serverless website with zero idle costs you’ll have to implement it with an API-driven architecture:
- Put your frontend files (HTML, CSS, JS) in S3
- Load your dynamic content through API
- Implement the dynamic functionality in Lambda functions
- Use API Gateway to call the Lambdas
- The DB can be Aurora Serverless or DynamoDB On-Demand
This architecture costs nothing when idle and provides instant response to your visitors.
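As a minimal sketch of the Lambda-behind-API-Gateway piece of that architecture (the route and payload here are illustrative, not from the original answer), a proxy-integration handler could look like:

```python
import json

def lambda_handler(event, context):
    """Minimal API Gateway (Lambda proxy integration) handler.

    The static frontend served from S3 calls this endpoint for its
    dynamic data; the /api/greeting route is a made-up example.
    """
    path = event.get("path", "/")
    if path == "/api/greeting":
        status, body = 200, {"message": "Hello from Lambda"}
    else:
        status, body = 404, {"error": "not found"}
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

Because the handler is a plain function, it only runs (and only bills you) per request, which is what gives this setup its zero idle cost.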
Update: if you still want to scale down the Fargate Service to 0 Tasks you can certainly do it by setting the Service's DesiredCount to 0. That can be done e.g. through aws-cli:
~ $ aws ecs update-service ... --service xyz --desired-count 0
If you want to do this in Dev I suggest you run this UpdateService call either manually, from a cron job, or from a scheduled Lambda function. Either way you can set the task count to 0 at night and back to 1 the next working day. That'll be easier than relying on AutoScaling, which may not be that reliable for very low traffic.
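A scheduled Lambda for that on/off pattern might be sketched as follows; the working-hours window and the cluster/service names are placeholders, and the boto3 call is imported lazily so the scheduling logic itself can run anywhere:

```python
from datetime import datetime, timezone

# Working hours (UTC) during which the service should run; adjust to taste.
WORK_START_HOUR = 8
WORK_END_HOUR = 18

def desired_count_for(now: datetime) -> int:
    """Return 1 during working hours on weekdays, 0 otherwise."""
    if now.weekday() >= 5:  # Saturday or Sunday
        return 0
    return 1 if WORK_START_HOUR <= now.hour < WORK_END_HOUR else 0

def handler(event, context):
    """Entry point for a scheduled (e.g. EventBridge cron) Lambda.

    Cluster and service names below are placeholders for illustration.
    """
    import boto3  # lazy import keeps the pure logic above testable offline
    count = desired_count_for(datetime.now(timezone.utc))
    boto3.client("ecs").update_service(
        cluster="my-cluster",   # placeholder
        service="xyz",          # service name as in the CLI example above
        desiredCount=count,
    )
    return {"desiredCount": count}
```

Scheduling this every 30 minutes or so gives you the "0 at night, 1 by day" behaviour without touching Auto Scaling at all.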
Hope that helps :)
Solution 2
If rewriting your app to fit the above answer is not an option or is too costly, you could look into GCP Cloud Run.
Cloud Run is GCP's serverless-containers offering. You can pack your website in a container and Cloud Run only bills you for CPU usage during requests and boot-up. It even has a generous free tier that lets you run your app at minimal cost.
So you could combine Amazon Aurora with GCP Cloud Run for minimal cost and no need to rewrite your app.
computmaxer
I'm Max, a 23 year old Software Engineer at Workiva. I'm passionate about technology - specifically developing rich web applications in Python, JavaScript, and Dart.
Updated on September 18, 2022
Comments
-
computmaxer over 1 year
I've recently migrated a small web application to AWS using Fargate and Aurora Serverless. The application doesn't get much traffic so my goal is to save cost while no one is using it. Aurora Serverless seems to do this for me on the DB side automatically.
However, I'm struggling to find any resources on how to scale a Fargate service to zero.
There is an ALB in front of it and I know ALB request count can be used in scaling... so ideally when there is an average of 0 requests over a period of say 10 minutes, the service would scale down to zero tasks. Then when a request comes in, it would trigger a scale-up of one task for the service.
-
Tim over 5 years
I suspect you might be better off with Lambda if you really need to do this. Scaling to zero means you have to boot your container / OS / application, which means any request could time out before it's serviced.
-
-
MLu over 5 years
@Tim since the static content is ready on S3 the visitor will only wait for the dynamic bits from the API. That’s a much better customer experience than a 503 error ;)
-
Tim over 5 years
I was just referring to the word instant; the way it's worded can mean zero latency, which is obviously not possible. The serverless pattern you described is a good one, I hinted towards it with "Lambda" in my somewhat lazy comment above :)
-
computmaxer over 5 years
I agree - this is an ideal setup. I am going to look more into moving this to Lambda; definitely seems like the best solution available for a low-use application where we're trying to save on cost. However, for posterity, it'd still be nice to know if it is possible to auto-scale to zero in Fargate. Say for example you have a staging environment that is rarely used, and when it is used, it's just developers. It'd be great to have it auto-scale to zero when no one is using it, and the developers can deal with 503s for a bit while it starts up.
-
MLu over 5 years
@computmaxer Added info about the ECS update-service --desired-count 0 API call. That answers your question.
-
computmaxer over 5 years
Thanks. So it sounds like it's not really possible to do automatically out-of-the-box with Fargate. If this changes in the future via new features/functionality I will update this question with a new answer.
-
Tom almost 5 years
But isn't Cloud Run functionally equivalent to AWS Lambda for the purposes of this question? Being open-source addresses the issues of flexibility and vendor lock-in, but it still requires that the app's compute requirements be compatible with FaaS.
-
Jimmy almost 5 years
@Tom No, Cloud Functions is GCP's equivalent of AWS Lambda. Cloud Run is Serverless Containers. If you have a WordPress site inside a Docker container, you can upload that straight into Cloud Run (in theory) without any modifications.
-
Tom almost 5 years
But Cloud Run is designed for transactions and has time limits to enforce that (10 or 15 minutes, I believe). That's why I say it is equivalent to Lambda for the purposes of this question - you can't just package up an app into a container and put it on Cloud Run unless you make the app conform to that model.
-
Jimmy almost 5 years
@Tom I agree. However it's far easier (in my opinion) to adhere to Cloud Run's restrictions than to refactor your code to fit Lambda/Cloud Functions. Most simple web apps (dockerized) should be plug & play in Cloud Run, while rewriting Joomla or WordPress into Lambda is just impossible.
-
Yahya Uddin over 4 years
Amazon also now supports dockerised Lambda applications. docs.aws.amazon.com/lambda/latest/dg/runtimes-custom.html
-
Julian H over 3 years
Yes, Lambda / Docker is a thing now, but for those in the machine learning space you are still out of luck if you need a GPU (a huge boost for both training and inference).
-
Bobík over 2 years
Dockerized Lambda is not the same - it requires a proprietary API. Cloud Run supports any dockerized web server (no websockets). You can literally use any image with a web app from Docker Hub and it will work (without persistent volumes).
-
Admin almost 2 years
BTW Cloud Run's timeout can be increased to up to 60 minutes. If the aim of a serverless workload is to scale to zero after processing a request, 60 minutes is quite a decent limit. Another benefit is that a Cloud Run workload can serve multiple concurrent requests (80 by default, can be increased to hundreds).