AWS Fargate ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed

Solution 1

One possible cause of ResourceInitializationError: unable to pull secrets or registry auth: pull command failed: : signal: killed is that Auto-assign public IP is disabled. After I enabled it (recreating the service from scratch), the task ran properly without issues.

Solution 2

I was facing the same issue. In my case, I was triggering the Fargate container from a Lambda function using the RunTask operation, and I was not passing the following parameter to RunTask:

assignPublicIp: ENABLED

After adding this, the container started without any issues.
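
For reference, here is a minimal sketch of what that RunTask call might look like from a Python Lambda using boto3. The event keys (cluster, taskDefinition, subnets, securityGroups) are hypothetical names chosen for illustration; only the assignPublicIp setting comes from the answer above.

```python
def build_run_task_kwargs(cluster, task_definition, subnets, security_groups):
    """Build keyword arguments for ECS RunTask with a public IP requested."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                # The parameter that was missing in the failing setup.
                "assignPublicIp": "ENABLED",
            }
        },
    }


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be used without it.
    import boto3

    ecs = boto3.client("ecs")
    # The event keys below are assumptions, not part of any AWS contract.
    kwargs = build_run_task_kwargs(
        event["cluster"],
        event["taskDefinition"],
        event["subnets"],
        event["securityGroups"],
    )
    response = ecs.run_task(**kwargs)
    return {"taskArns": [task["taskArn"] for task in response.get("tasks", [])]}
```

Keeping the argument construction in its own function makes it easy to confirm the network settings before anything is sent to AWS.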

Solution 3

Edited answer based on feedback from @nathan and @howard-swope

Checklist:

  • The VPC has "DNS hostnames" and "DNS resolution" enabled
  • The "Task execution role" has access to ECR, e.g. it has the AmazonECSTaskExecutionRolePolicy managed policy attached

If the task is running on a PUBLIC subnet:

  • The subnets have access to the internet, i.e. their route table routes outbound traffic to an internet gateway attached to the VPC.

  • "Assign public IP" is enabled when creating the task.

If the task is running on a PRIVATE subnet:

  • The subnets have access to the internet, i.e. their route table routes outbound traffic to a NAT gateway. The NAT gateway itself resides on a public subnet.
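
The first two checklist items can also be checked from a script. The following is a sketch only, assuming boto3 clients for EC2 and IAM are passed in; the function names are made up for illustration, and the IAM check ignores pagination and inline policies.

```python
def vpc_dns_settings(ec2_client, vpc_id):
    """Return whether DNS resolution and DNS hostnames are enabled on a VPC."""
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )
    return {
        "dns_resolution": support["EnableDnsSupport"]["Value"],
        "dns_hostnames": hostnames["EnableDnsHostnames"]["Value"],
    }


def has_execution_role_policy(iam_client, role_name):
    """Check for the AmazonECSTaskExecutionRolePolicy managed policy on a role.

    Only the first page of attached managed policies is inspected.
    """
    response = iam_client.list_attached_role_policies(RoleName=role_name)
    names = [p["PolicyName"] for p in response.get("AttachedPolicies", [])]
    return "AmazonECSTaskExecutionRolePolicy" in names
```

With real credentials this would be called as vpc_dns_settings(boto3.client("ec2"), vpc_id), where vpc_id is the ID of your own VPC.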

Solution 4

For those unlucky souls, there is one more thing to check.

I already had an internet gateway in my VPC, DNS was enabled for that VPC, all containers were getting public IPs and the execution role already had access to ECR. But even so, I was still getting the same error.

It turns out the problem was the route table. The route table of my VPC didn't include a route directing outbound traffic to the internet gateway, so my subnet had no internet access.

Adding a second route to the table that sends 0.0.0.0/0 traffic to the internet gateway solved the issue.
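
To spot a missing default route without clicking through the console, one could inspect the output of EC2 DescribeRouteTables. This is a rough sketch: the helper name is hypothetical, and it only looks at IPv4 default routes to internet or NAT gateways.

```python
def default_route_targets(describe_route_tables_response):
    """List (kind, id) pairs for active 0.0.0.0/0 routes in the response."""
    targets = []
    for table in describe_route_tables_response.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            if route.get("State") == "blackhole":
                # The target no longer exists, so the route is unusable.
                continue
            gateway_id = route.get("GatewayId", "")
            if gateway_id.startswith("igw-"):
                targets.append(("internet-gateway", gateway_id))
            elif "NatGatewayId" in route:
                targets.append(("nat-gateway", route["NatGatewayId"]))
    return targets
```

The response could come from ec2.describe_route_tables(Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]). Note that a subnet with no explicit association uses the VPC's main route table, which that filter will not return. An empty result from default_route_targets means the subnet has no outbound path of either kind.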

Solution 5

For AWS Batch using Fargate, this error was triggered by the 'Assign public IP' setting being disabled.

This setting is configurable when creating the Job Definition. However, it cannot be changed in the UI after the Job Definition has been created.
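
When the Job Definition is registered through the API rather than the UI, the same setting lives under containerProperties > networkConfiguration. Below is a minimal sketch using boto3; the job name, image, role ARN, and resource sizes are placeholders, not values from the answer.

```python
def build_job_definition_kwargs(name, image, execution_role_arn):
    """Build arguments for Batch RegisterJobDefinition on Fargate with a public IP."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "containerProperties": {
            "image": image,
            "executionRoleArn": execution_role_arn,
            # Placeholder sizes; pick a vCPU/memory pair that Fargate supports.
            "resourceRequirements": [
                {"type": "VCPU", "value": "0.25"},
                {"type": "MEMORY", "value": "512"},
            ],
            # Equivalent of the 'Assign public IP' toggle in the console.
            "networkConfiguration": {"assignPublicIp": "ENABLED"},
        },
    }


def register(name, image, execution_role_arn):
    # boto3 is imported here so the helper above can be used without it.
    import boto3

    batch = boto3.client("batch")
    kwargs = build_job_definition_kwargs(name, image, execution_role_arn)
    return batch.register_job_definition(**kwargs)
```

Registering again under the same name creates a new revision, which is one way to change the setting for an existing Job Definition.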

Author: user2800708

Updated on February 14, 2022

Comments

  • user2800708
    user2800708 about 2 years

    Slightly tearing my hair out with this one... I am trying to run a Docker image on Fargate in a VPC in a Public subnet. When I run this as a Task I get:

    ResourceInitializationError: unable to pull secrets or registry auth: pull
    command failed: : signal: killed
    

    If I run the Task in a Private subnet, through a NAT, it works. It also works if I run it in a Public subnet of the default VPC.

    I have checked through the advice here:

    Aws ecs fargate ResourceInitializationError: unable to pull secrets or registry auth

    In particular, I have security groups set up to allow all traffic. Also Network ACL set up to allow all traffic. I have even been quite liberal with the IAM permissions, in order to try and eliminate that as a possibility:

    The task execution role has:

    {
        "Action": [
            "kms:*",
            "secretsmanager:*",
            "ssm:*",
            "s3:*",
            "ecr:*",
            "ecs:*",
            "ec2:*"
        ],
        "Resource": "*",
        "Effect": "Allow"
    }
    

    With trust relationship to allow ecs-tasks to assume this role:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "ecs-tasks.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    

    The security group is:

    sg-093e79ca793d923ab All traffic All traffic All 0.0.0.0/0
    

    And the Network ACL is:

    Inbound
    Rule number  Type         Protocol  Port range  Source       Allow/Deny
    100          All traffic  All       All         0.0.0.0/0    Allow
    *            All traffic  All       All         0.0.0.0/0    Deny

    Outbound
    Rule number  Type         Protocol  Port range  Destination  Allow/Deny
    100          All traffic  All       All         0.0.0.0/0    Allow
    *            All traffic  All       All         0.0.0.0/0    Deny
    

    I set up flow logs on the subnet, and I can see that traffic is Accept Ok in both directions.

    I do not have any Interface Endpoints set up to reach AWS services without going through the Internet Gateway.

    I also have Public IP address assigned to the Fargate instance upon creation.

    This should work, since the Public subnet should have access to all needed services through the Internet Gateway. It also works in the default VPC or a Private subnet.

    Can anyone suggest what else I should check to debug this?

  • Chez
    Chez almost 3 years
    Hi valdem, where should I enable the Auto-assign public IP?
  • valdem
    valdem almost 3 years
    Hi Chez. I updated the answer, adding the screenshot where you can configure Auto-assign public IP
  • TheRennen
    TheRennen almost 3 years
    But what if you don't want the task to have a public IP?
  • santamanno
    santamanno almost 3 years
    This solves the issue, but if you want Fargate in a private subnet then it still does not reach ECR (in my case not even with DNS on and a PrivateLink endpoint to ECR)
  • santamanno
    santamanno almost 3 years
    Did you have to add VPC endpoints as well for each service the container uses?
  • santamanno
    santamanno almost 3 years
    Sorry, does not work. I have a private subnet with NAT (and tried without) and all the endpoint added to the VPC, still unreachable...
  • Ryan Walls
    Ryan Walls almost 3 years
    This solved my issue for a public subnet. Private subnet is a different beast.
  • FredG
    FredG almost 3 years
    If you have a NAT gateway for egress traffic on a private network (without an internet gateway or public IPs on instances/tasks), it's not even necessary to use VPC endpoints. I'd recommend you launch an EC2 instance in your subnet, SSH to it, and test your connectivity there. AWS network setup can be quite frustrating to get right.
  • morras
    morras almost 3 years
    For private subnets you will likely need to have a NAT gateway. That will also allow you to have tasks without a public IP. Note that NAT gateways are pretty expensive. You are often better off with a public IP and a locked down security group.
  • Irtiza
    Irtiza over 2 years
    @santamanno Yes, you need to create a VPC endpoint for each service.
  • santamanno
    santamanno over 2 years
    Yes, thank you. It must be either on a public subnet, a private subnet with NAT or private VPC endpoints to the required services. In any case, as the OP points out, DNS resolution must be enabled in my experience.
  • Outpox
    Outpox over 2 years
    This is helpful, the main answers did not specify where to enable this parameter and I did not face the "Create Service" interface because I'm creating my job definitions with CDK.
  • Nathan
    Nathan over 2 years
    This is a good checklist for people running a container which will fire tasks in a private subnet with a VPC routing table configured to route outbound traffic via a NAT gateway which resides in a public subnet.
  • Vitaly Karasik DevOps
    Vitaly Karasik DevOps over 2 years
    @valdem - many thanks, you save my day! (BTW, this issue seems really weird - AFAIK, we should be able to run instances w/o public IP into public subnet)
  • Kicsi
    Kicsi over 2 years
    Without a public IP, your instance can't communicate with the internet (or in this case the ECR registry, which is outside of the VPC), because the receiving end does not know where to send the packets back. In the case of a private subnet, the NAT gateway has the public IP (and it can route the packets back to the original instance, because the NAT is inside the VPC).
  • Joshua Marble
    Joshua Marble over 2 years
    This finally made it work!! Thank you!!!!!
  • Beraki
    Beraki about 2 years
    This finally worked!!
  • Howard Swope
    Howard Swope about 2 years
    @Nathan I am not sure this is accurate. If you are speaking of ECS tasks, I don't believe they are fired from a container; it is the other way around. The task pulls and launches the container. And if your containers are running in a private subnet, they should not have public IPs. That is the point of the private subnet, is it not?
  • Nathan
    Nathan about 2 years
    You are correct about the tasks pulling the containers. The ECR is not located in the private subnet and something needs to handle that. For tasks that run in a private subnet, either a NAT gateway handles the packet resolution OR a public IP address needs to be assigned.
  • Koroslak
    Koroslak about 2 years
    @HowardSwope you're correct. My original post assumes that the task is in a PUBLIC subnet. I'll edit my answer. THANKS FOR THE FEEDBACK! :)
  • Moemars
    Moemars about 2 years
    For boto3 it took me a bit to find it. For the JobDefinition it is under ContainerProperties > NetworkConfiguration > AssignPublicIp: ENABLED. docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/…
  • Raj
    Raj almost 2 years
    Thank you. You saved my day!!! While cleaning up, I may have accidentally got this association removed and was wondering what went wrong.
  • isudarsan
    isudarsan almost 2 years
    This is helpful, this can be done while creating a new revision of existing "Job definition".