HealthCheck on ECS task without an ELB
Solution 1
The answer by raja, as edited by Andrew, is slightly off for ECS/Fargate. The command goes without the brackets and without the quotes:
CMD-SHELL, curl -f http://localhost/ || exit 1
That is the correct format when entering health check information in the ECS console's task definition form.
VALID DOCUMENTATION https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck
NOT VALID DOCUMENTATION for ECS/FARGATE https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_HealthCheck.html
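For comparison, the JSON used with the AWS CLI or the RegisterTaskDefinition API takes the bracketed, quoted array form. Below is a minimal Python sketch of a container definition fragment. The container name, image and startPeriod value are placeholders. Interval, timeout and retries are set to 30, 5 and 3, which as far as I know match the defaults in the ECS documentation:

```python
import json

# Placeholder container definition; "web" and the image are hypothetical.
container_definition = {
    "name": "web",
    "image": "example/web:latest",
    "essential": True,
    "healthCheck": {
        # Array form: the first element picks CMD or CMD-SHELL,
        # the second is the command string run by the container's shell.
        "command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"],
        "interval": 30,     # seconds between checks
        "timeout": 5,       # seconds to wait before a check counts as failed
        "retries": 3,       # failed checks before the container is UNHEALTHY
        "startPeriod": 60,  # grace period in seconds (example value)
    },
}

# The console form from this answer is the same two elements, comma-separated.
console_form = ", ".join(container_definition["healthCheck"]["command"])

print(json.dumps({"containerDefinitions": [container_definition]}, indent=2))
print(console_form)
```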
Solution 2
The given command is syntactically invalid.
It should be:
[ "CMD-SHELL", "curl -f http://localhost/ || exit 1" ]
CMD or CMD-SHELL - CMD-SHELL runs the command with the container's default shell; CMD runs the arguments directly, without a shell.
curl -f http://localhost/ - the actual command that is executed inside the container to validate its health.
exit 1 - if the curl command fails, the shell exits with status 1 and the check is counted as failed.
So you should change your command to something like this:
[ "CMD-SHELL", "echo hi || exit 1" ]
In my example, echo hi is the health check command. You can run any command instead of "echo hi", as long as it returns exit status 0 when it runs successfully in your container.
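To make the CMD / CMD-SHELL distinction and the exit-status rule concrete, here is a minimal Python sketch that runs one health check command locally. It assumes /bin/sh as the default shell and only models a single check run. ECS/Docker additionally apply interval, timeout, retries and startPeriod before reporting HEALTHY or UNHEALTHY. The function name check_once is hypothetical.

```python
from __future__ import annotations

import subprocess


def check_once(command: list[str], timeout: float = 5.0) -> bool:
    """Run one health check locally and report whether it passed.

    Rough emulation of the documented CMD / CMD-SHELL semantics only;
    interval, retries and startPeriod are ignored.
    """
    kind, *args = command
    if kind == "CMD-SHELL":
        # CMD-SHELL: the single string is handed to the shell (assumed /bin/sh),
        # so operators such as || and | work.
        argv = ["/bin/sh", "-c", args[0]]
    elif kind == "CMD":
        # CMD: the arguments are executed directly, without a shell.
        argv = args
    else:
        raise ValueError(f"unsupported health check type: {kind!r}")
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A missing executable or a check that hangs counts as a failure.
        return False
    # Exit status 0 means the check passed; anything else is a failure.
    return result.returncode == 0


print(check_once(["CMD-SHELL", "echo hi || exit 1"]))  # True
print(check_once(["CMD-SHELL", "exit 1"]))             # False
print(check_once(["CMD", "echo hi || exit 1"]))        # False
```

The last call fails because CMD passes "echo hi || exit 1" as a program name, and there is no shell to interpret the ||.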
Solution 3
If you use ecs-cli to deploy your Fargate services, I found that you must upgrade to a version that supports healthcheck in the task definition. I also found that using CMD-SHELL is not required. In fact, adding it breaks things: your CMD-SHELL gets wrapped in another CMD-SHELL in the resulting JSON of the generated task definition (as seen in the AWS console).
So what worked for me was upgrading ecs-cli from 1.4.0 to 1.7.0 and then adding healthcheck to the ecs-params.yml file under the service:
task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: arn:aws:iam::........
  task_execution_role: arn:aws:iam::........
  task_size:
    cpu_limit: 2048
    mem_limit: 4GB
  services:
    foo:
      healthcheck:
        command: ps cax | grep "[p]ython"
        interval: 30s
        timeout: 10s
        retries: 2
      essential: true
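Here is a rough Python sketch of how the ecs-params.yml healthcheck above lines up with the task definition's healthCheck fields, which store interval and timeout as integer seconds. The string-to-CMD-SHELL wrapping is an assumption. It follows the Docker Compose rule that a string test is equivalent to CMD-SHELL followed by that string, which would explain the double wrapping described above. It is not taken from ecs-cli's source, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import re

_UNITS = {"h": 3600, "m": 60, "s": 1}


def duration_to_seconds(value: str) -> int:
    """Convert a duration such as '30s' or '1m30s' to whole seconds (h/m/s only)."""
    parts = re.findall(r"(\d+)([hms])", value)
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def to_task_healthcheck(params: dict) -> dict:
    """Map an ecs-params-style healthcheck block to task definition fields."""
    command = params["command"]
    if isinstance(command, str):
        # Assumed compose-style rule: a plain string is wrapped as CMD-SHELL.
        # Writing "CMD-SHELL ..." yourself would then be wrapped a second time.
        command = ["CMD-SHELL", command]
    return {
        "command": command,
        "interval": duration_to_seconds(params["interval"]),
        "timeout": duration_to_seconds(params["timeout"]),
        "retries": int(params["retries"]),
    }


# Values copied from the ecs-params.yml above.
foo_healthcheck = {
    "command": 'ps cax | grep "[p]ython"',
    "interval": "30s",
    "timeout": "10s",
    "retries": 2,
}
print(json.dumps(to_task_healthcheck(foo_healthcheck), indent=2))
```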
Solution 4
For example, if your "Port mappings" are 8000:8000 for "ECS EC2" or 8000 tcp for "ECS Fargate", the "Command" of "HEALTHCHECK" is:
CMD-SHELL, curl -f http://localhost:8000/ || exit 1
Don't forget the 8000 after http://localhost: in this case.
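If the image doesn't include curl, you can write an equivalent check with Python's standard library, assuming python3 is available in the container. Below is a minimal sketch. http_check is a hypothetical helper, and its behaviour only approximates curl -f: it follows redirects, which curl does not do without -L.

```python
import http.client
import urllib.request


def http_check(url: str, timeout: float = 5.0) -> int:
    """Return 0 if url answers successfully, else 1 (roughly `curl -f url || exit 1`)."""
    try:
        # urlopen raises HTTPError for 4xx/5xx responses; unlike plain curl
        # it also follows redirects.
        with urllib.request.urlopen(url, timeout=timeout):
            return 0
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return 1
```

To use it, you could copy the script into the image, add import sys and a final line such as sys.exit(http_check("http://localhost:8000/")), and set the Command to something like CMD-SHELL, python3 /healthcheck.py || exit 1. Here /healthcheck.py is a hypothetical path.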
chris_fitz
Updated on September 18, 2022
Comments
-
chris_fitz over 1 year
We have a Docker container (Spring Boot) that runs in an ECS cluster. We run it without Elastic Load Balancing.
We want to update the service without downtime, so that when the new task is up and healthy, the old task stops. We have been trying to add a health check to the task definition, but it refuses to work. I have tried these basic health check commands:
[ "CMD-SHELL","exit 0" ] [ "CMD-SHELL","exit 1" ]
I would expect the former to result in a task with a HEALTHY health status, and the latter to fail the health checks. In both cases, the new task starts fine, with an UNKNOWN health status.
Does this have anything to do with us not using an ELB? The documentation is not very good, and my Google searches have not returned anything useful.
-
xs2rashid over 5 years
-
Ariful Haque almost 4 years
But what if the container doesn't have port 80 exposed, for example a queue or Redis container?
-
nopuck4you almost 4 years
@ArifulHaque Then use a port number in your address, localhost:1200 for example. If there is no port exposed, you may need to expose one for health check verification.
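For a container without an HTTP endpoint, such as the queue or Redis case asked about above, one option is a plain TCP connect check. Below is a minimal Python sketch, assuming python3 exists in the image. The helper name is hypothetical. Because the health check command runs inside the container, it connects to localhost.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 0 if a TCP connection to host:port succeeds, else 1."""
    try:
        # Only proves something is accepting connections on the port;
        # it does not speak the service's protocol.
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:
        return 1
```

For example, you could call tcp_check("localhost", 6379) for Redis on its default port. A successful connect only shows that the port is accepting connections. If the image ships a protocol-aware client such as redis-cli ping, that gives a stronger signal.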
-
Durja about 3 years
Would the double quotes cause any issue? Example: ["CMD-SHELL", "curl -s localhost:8080 | jq -e '.scheduler.status=="healthy' || exit 1"]
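In the JSON array form, the unescaped inner double quotes end the string early, so they have to be written as \". Below is a minimal Python sketch that lets the json module do the escaping. The jq filter is the one from the comment with its closing double quote added, which is an assumption about what was intended.

```python
import json

# The command from the comment, with the jq string literal closed.
shell_command = """curl -s localhost:8080 | jq -e '.scheduler.status=="healthy"' || exit 1"""

health_check_command = ["CMD-SHELL", shell_command]

# json.dumps escapes the inner double quotes as \" so the array stays valid JSON.
encoded = json.dumps(health_check_command)
print(encoded)
```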
-
Juliano about 3 years
-
Admin almost 2 years
Why add || exit 1? The only function it can have is to translate higher non-zero exit codes (say, code 20) to 1, which shouldn't be needed.
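One possible reason for keeping the fallback: Docker's HEALTHCHECK documentation defines exit status 0 as healthy and 1 as unhealthy, and marks 2 as reserved. A failing curl can return other codes, for example 7 when it cannot connect or 22 for an HTTP error with -f, and || exit 1 collapses those into 1. Below is a minimal Python sketch that uses (exit 7) in a subshell to stand in for a failing command. It assumes /bin/sh is available, and the helper name is hypothetical.

```python
import subprocess


def shell_status(command: str) -> int:
    """Run command with /bin/sh -c and return its exit status."""
    return subprocess.run(["/bin/sh", "-c", command]).returncode


# "(exit 7)" runs in a subshell and simulates curl failing to connect (code 7).
raw = shell_status("(exit 7)")
# With the fallback, any non-zero status from the left-hand side becomes 1.
normalized = shell_status("(exit 7) || exit 1")
# A successful command is unaffected: || only runs on failure.
ok = shell_status("true || exit 1")

print(raw, normalized, ok)  # 7 1 0
```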