CloudFormation template for creating ECS service stuck in CREATE_IN_PROGRESS
Solution 1
Your AWS::ECS::Service needs to register the full ARN for the TaskDefinition (source: the answer from ChrisB@AWS on the AWS forums). The key thing is to set TaskDefinition to the full ARN, including the revision. If you skip the revision (:123 in the example below), the latest revision is used, but CloudFormation still goes out to lunch in CREATE_IN_PROGRESS for about an hour before failing. Here's one way to do that:

    "MyService": {
      "Type": "AWS::ECS::Service",
      "Properties": {
        "Cluster": { "Ref": "ECSClusterArn" },
        "DesiredCount": 1,
        "LoadBalancers": [
          {
            "ContainerName": "myContainer",
            "ContainerPort": 80,
            "LoadBalancerName": "MyELBName"
          }
        ],
        "Role": { "Ref": "EcsElbServiceRoleArn" },
        "TaskDefinition": {
          "Fn::Join": ["", ["arn:aws:ecs:", { "Ref": "AWS::Region" },
            ":", { "Ref": "AWS::AccountId" },
            ":task-definition/my-task-definition-name:123"]]
        }
      }
    }
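For YAML templates, the same full ARN can probably be built more compactly with Fn::Sub instead of Fn::Join. This is only a sketch of the JSON above translated to YAML; the names (ECSClusterArn, EcsElbServiceRoleArn, MyELBName, my-task-definition-name and revision 123) are carried over from that example and are assumed to exist in your template or account.

```yaml
MyService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref ECSClusterArn          # template parameter, as in the JSON example
    DesiredCount: 1
    LoadBalancers:
      - ContainerName: myContainer
        ContainerPort: 80
        LoadBalancerName: MyELBName
    Role: !Ref EcsElbServiceRoleArn      # template parameter, as in the JSON example
    # Full task definition ARN, including the revision (:123)
    TaskDefinition: !Sub "arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:task-definition/my-task-definition-name:123"
```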
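Here's a nifty way to grab the latest revision of MyTaskDefinition via the aws cli and jq. The list is sorted in ascending order by default, so sort it descending, take the first ARN and keep the part after the last colon:

    aws ecs list-task-definitions --family-prefix MyTaskDefinition --sort DESC | jq --raw-output '.taskDefinitionArns[0] | split(":") | last'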
Solution 2
I found another related scenario that will cause this and thought I'd put it here in case anyone else runs into it. If you define a TaskDefinition with an Image that doesn't actually exist in its ContainerDefinition and then you try to run that TaskDefinition as a Service, you'll run into the same hang issue (or at least something that looks like the same issue).
NOTE: The example YAML chunks below were all in the same CloudFormation template.
So as an example, I created this Repository:

    MyRepository:
      Type: AWS::ECR::Repository

And then I created this Cluster:

    MyCluster:
      Type: AWS::ECS::Cluster

And this TaskDefinition (abridged):

    MyECSTaskDefinition:
      Type: AWS::ECS::TaskDefinition
      Properties:
        # ...
        ContainerDefinitions:
          - # ...
            Image: !Join ["", [!Ref "AWS::AccountId", ".dkr.ecr.", !Ref "AWS::Region", ".amazonaws.com/", !Ref MyRepository, ":1"]]
            # ...

With those defined, I went to create a Service like this:

    MyECSServiceDefinition:
      Type: AWS::ECS::Service
      Properties:
        Cluster: !Ref MyCluster
        DesiredCount: 2
        PlacementStrategies:
          - Type: spread
            Field: attribute:ecs.availability-zone
        TaskDefinition: !Ref MyECSTaskDefinition

Which all seemed sensible to me, but it turns out there are two issues with this as written/deployed that caused it to hang.
- The DesiredCount is set to 2, which means it will actually try to spin up the service and run it, not just define it. If I set DesiredCount to 0, this works just fine.
- The Image defined in MyECSTaskDefinition doesn't exist yet. I made the repository as part of this template, but I didn't actually push anything to it. So when MyECSServiceDefinition tried to spin up the DesiredCount of 2 instances, it hung because the image wasn't actually available in the repository (because the repository literally just got created in the same template).
So, for now, the solution is to create the CloudFormation stack with a DesiredCount of 0 for the Service, upload the appropriate Image to the repository and then update the CloudFormation stack to scale up the service. Alternatively, have a separate template that sets up core infrastructure like the repository, upload builds to that and then have a separate template to run that sets up the Services themselves.
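One way to make the "create with 0, then scale up" step repeatable is to expose the count as a template parameter. The following is a rough sketch under that assumption; ServiceDesiredCount is a hypothetical parameter name, and MyCluster and MyECSTaskDefinition are the resources from the chunks above.

```yaml
Parameters:
  # Hypothetical parameter: leave at 0 for the first deploy, then update
  # the stack with a higher value once the image has been pushed.
  ServiceDesiredCount:
    Type: Number
    Default: 0

Resources:
  MyECSServiceDefinition:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref MyCluster
      DesiredCount: !Ref ServiceDesiredCount
      PlacementStrategies:
        - Type: spread
          Field: attribute:ecs.availability-zone
      TaskDefinition: !Ref MyECSTaskDefinition
```

With this shape, the first stack creation does not wait for any tasks to start, and scaling up becomes a stack update that only changes the parameter value.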
Hope that helps anyone having this issue!
Solution 3
There is no need to build the full ARN for the TaskDefinition by hand if it is defined in the same template: when the logical ID of an AWS::ECS::TaskDefinition resource is passed to the Ref intrinsic function, Ref returns its Amazon Resource Name (ARN), including the revision.
In the following sample, the Ref function returns the ARN of the MyTaskDefinition task definition, such as arn:aws:ecs:us-west-2:123456789012:task-definition/TaskDefinitionFamily:1.

    { "Ref": "MyTaskDefinition" }

Source: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ecs-taskdefinition.html
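Ref only resolves logical IDs within the same template. If the task definition lives in another stack, Fn::ImportValue (suggested in the comments) is one option. The sketch below shows two separate templates in one listing; the export name my-task-definition-arn is made up for illustration, and ECSClusterArn is assumed to be a parameter of the second template.

```yaml
# Stack A (assumed to define MyTaskDefinition under Resources):
# export the ARN that Ref returns.
Outputs:
  MyTaskDefinitionArn:
    Value: !Ref MyTaskDefinition
    Export:
      Name: my-task-definition-arn   # hypothetical export name
---
# Stack B (a separate template): import the exported ARN.
Resources:
  MyService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSClusterArn    # assumed template parameter
      DesiredCount: 1
      TaskDefinition: !ImportValue my-task-definition-arn
```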
Solution 4
I think I had a similar issue. Try looking at the "DesiredCount" property in the Service template. I think CloudFormation will report that the creation/update is still in progress until the number of running tasks for the Service reaches "DesiredCount" in your cluster.
Solution 5
To add another data point, I've seen AWS::ECS::Service get permanently stuck in CREATE_IN_PROGRESS unless the ECR Docker image is both a) available from the ECR repo and b) passing the health check.
I've tried multiple times to boot an AWS::ECS::Service with a valid-image-hash-but-failing-health-check container, then fix the image and do the various "set desired count to zero", "set it back", etc., and nothing AFAICT gets it unstuck.
I eventually have to delete the stack and start over with an image that immediately passes the health check. Then it works fine.
Super flaky.
Anvar
Updated on July 09, 2022
Comments
-
Anvar almost 2 years
I am creating an AWS ECS service using CloudFormation.
Everything seems to complete successfully: I can see the instance being attached to the load balancer, the load balancer declares the instance healthy, and if I hit the load balancer I am successfully taken to my running container.
Looking at the ECS control panel, I can see that the service has stabilised and that everything looks OK. I can also see that the container is stable and is not being terminated/re-created.
However, the CloudFormation stack never completes; it is stuck in CREATE_IN_PROGRESS until about 30-60 minutes later, when it rolls back claiming that the service did not stabilise. Looking at CloudTrail, I can see a number of RegisterInstancesWithLoadBalancer calls initiated by ecs-service-scheduler, all with the same parameters, i.e. the same instance ID and load balancer. I am using standard IAM roles and permissions for ECS, so it should not be a permissions issue.
Anyone had a similar issue?
-
Anvar almost 9 years
There seem to be other people having the same issue: forums.aws.amazon.com/thread.jspa?threadID=190250
-
Anvar over 8 years
The service is reporting as stabilised in the ECS UI, and both the desired count and the running count are set to 1. Hitting the container works as expected as well, and the ELB is reporting the instance correctly. It is as if the notification just is not getting through to CloudFormation.
-
erik258 over 7 years
Works great ... as long as the task definition is in the same stack. Otherwise, Fn::ImportValue is a nice way to do this across stacks. docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/…
-
Jeremie almost 6 years
My command to retrieve the latest revision:
aws ecs list-task-definitions --family-prefix dev-device-settings --sort DESC | jq --raw-output .taskDefinitionArns[0] | tr ':' '\n' | tail -1
-
domdambrogia over 5 years
A much simpler way would be to use the !Ref function to return the ARN of your AWS::ECS::TaskDefinition. Building the ARN like that is overly complicated. Look at the return values on this page: docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/…
-
Gary Holiday about 5 years
Also, if the Task Definition doesn't have the appropriate ExecutionRole permissions, the service will hang in the CREATING state. I had this happen when I tried creating a LogConfiguration.
-
Dennis almost 5 years
Also happens if the image tag doesn't exist in the repository, e.g. because of a typo.
-
Moneer81 almost 5 years
"Hope that helps anyone having this issue!" It indeed did! Thank you so much!
-
GuoJunjun almost 4 years
I have everything in one stack; setting DesiredCount to 0 fixed the ECS::Service sitting in CREATE_IN_PROGRESS for a long time and then failing. Thanks :)
-
Steve Chambers almost 3 years
An alternative, if you just want to have one script that doesn't have to be updated, is to take advantage of how long CloudFormation hangs (it is actually retrying over and over to find the image while it hangs). This gives ample time to manually upload the image to ECR, and CloudFormation will then find it pretty much as soon as it has been uploaded.
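
Gary Holiday's comment above about ExecutionRole permissions and LogConfiguration could look roughly like this in a template. This is a sketch only, not his actual setup: MyExecutionRole and MyLogGroup are hypothetical logical IDs, the container name and memory value are placeholders, and it assumes the AWS-managed AmazonECSTaskExecutionRolePolicy (which covers pulling from ECR and writing to CloudWatch Logs) is sufficient for your case.

```yaml
Resources:
  MyExecutionRole:                 # hypothetical logical ID
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

  MyLogGroup:                      # hypothetical logical ID
    Type: AWS::Logs::LogGroup

  MyECSTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      # The role ECS uses to pull the image and ship logs on the task's behalf
      ExecutionRoleArn: !GetAtt MyExecutionRole.Arn
      ContainerDefinitions:
        - Name: myContainer        # placeholder name
          Image: !Join ["", [!Ref "AWS::AccountId", ".dkr.ecr.", !Ref "AWS::Region", ".amazonaws.com/", !Ref MyRepository, ":1"]]
          Memory: 256              # placeholder value
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref MyLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
```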