HealthCheck on ECS task without an ELB

21,120

Solution 1

The answer by raja and edite by Andrew are slightly off for ECS/FARGATE. It's without the brackets and without the quotes:

CMD-SHELL, curl -f http://localhost/ || exit 1

That is the correct format if entering Health check information inside Task Definitions of ECS.

VALID DOCUMENTATION https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck

NOT VALID DOCUMENTATION for ECS/FARGATE https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_HealthCheck.html

Solution 2

The Given command is syntatically invalid.

it should be

[ "CMD-SHELL", "curl -f http://localhost/ || exit 1" ]
  • CMD or CMD-SHELL - to run the command with the container's default shell
  • curl -f http://localhost/ - actual command that need to be executed inside the container to validate the health check.
  • exit 1 - if the curl command fail then it will exit the shell

so you should change your command like below.

[ "CMD-SHELL", "echo hi || exit 1" ]

echo hi is the helath check command in my example you can execute any command instead of "echo hi " whihc should return exit status 0 if that runs successfully in your container.

Solution 3

If you use ecs-cli to deploy your fargate services, I found that you must upgrade to something that supports the healthcheck in the task definition. I also found that using CMD-SHELL is not required -- in fact, it breaks when you add it, wrapping your CMD-SHELL with another CMD-SHELL in the resulting json of the generated task definition (as seen in the aws console).

So what worked for me was upgrading from 1.4.0 to 1.7.0 of ecs-cli and then adding healthcheck in the ecs-params.yml file under the service:

task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: arn:aws:iam::........
  task_execution_role: arn:aws:iam::........
  task_size:
    cpu_limit: 2048
    mem_limit: 4GB
  services:
    foo:
      healthcheck:
        command: ps cax | grep "[p]ython"
        interval: 30s
        timeout: 10s
        retries: 2
      essential: true

Solution 4

For example, if your "Port mappings" is 8000:8000 for "ECS EC2" or 8000 tcp for "ECS Fargate", the "Command" of "HEALTHCHECK" is:

CMD-SHELL, curl -f http://localhost:8000/ || exit 1

Don't forget 8000 after http://localhost: in this case.

Share:
21,120

Related videos on Youtube

chris_fitz
Author by

chris_fitz

Updated on September 18, 2022

Comments

  • chris_fitz
    chris_fitz over 1 year

    We have a Docker container(Spring Boot) that runs in an ECS cluster. We run it without Elastic Load Balancing.

    We want to update the service without downtime, so when the new task is up and healthy, the old task stops. We have been trying to add a health check on the task definition, however it refuses to work. I have tried these basic healthcheck commands.

    [ "CMD-SHELL","exit 0" ]
    [ "CMD-SHELL","exit 1" ]
    

    I would expect the former to result in a task with a HEALTHY health status, and the latter to fail the health checks.. In both cases, the new task starts fine, with an UNKNOWN health status.

    Has this anything to do with us not using an ELB? The documentation is not very good, and my Google searches have not returned anything useful.

  • xs2rashid
    xs2rashid over 5 years
    Just a suggestion if [ "CMD-SHELL", "curl -f localhost || exit 1" ] does not work then localhost ip can be used. [ "CMD-SHELL", "curl -f 127.0.0.1 || exit 1" ]
  • Ariful Haque
    Ariful Haque almost 4 years
    but what if the container don't have port 80 exposed? for example, queue or redis container?
  • nopuck4you
    nopuck4you almost 4 years
    @ArifulHaque Then use a port number in your address localhost:1200 for example. If there is no port exposed then you may need to expose one for health check verification.
  • Durja
    Durja about 3 years
    Would the double quotes cause any issue? Example: ["CMD-SHELL", "curl -s localhost:8080 | jq -e '.scheduler.status=="healthy' || exit 1"]
  • Juliano
    Juliano about 3 years
    This is not a invalid format. If you use the UI to configure the helath-check, then yes, the format is like this: "CMD-SHELL, curl -f localhost || exit 1" But if you use the raw JSON configuration it is [ "CMD-SHELL", "curl -f localhost || exit 1" ]
  • Admin
    Admin almost 2 years
    Why add || exit 1? The only function that can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed.
  • Admin
    Admin almost 2 years
    Why add || exit 1? The only function that can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed.
  • Admin
    Admin almost 2 years
    Why add || exit 1? The only function that can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed.