Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though it would?
Solution 1
The cause is not the resource requests (confusingly, the error message led me to assume this) but the pod affinity rule I had defined:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: NotIn
        values:
        - example-api
    topologyKey: kubernetes.io/hostname
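A required podAffinity term asks for a node that already runs a pod matching the selector, and a freshly added node would typically have no such pod, which may be why the autoscaler concluded the pod "wouldn't fit". The answer does not say what the rule was changed to. If the intent was to keep this pod off nodes that run example-api pods (an assumption, not something stated here), one possible way to express that is an anti-affinity term, which an empty node satisfies trivially. A minimal sketch under that assumption:

```yaml
# Hypothetical alternative, NOT the author's confirmed fix.
# Assumes the goal is "never share a node with example-api pods".
# Placed under spec.affinity of the pod template.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: appType
          operator: In        # match the pods to stay away from
          values:
          - example-api
      topologyKey: kubernetes.io/hostname   # "same node" granularity
```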
Solution 2
If you are using K8s from a cloud provider such as GKE or EKS, it may be worth taking a look at the cloud provider's resource quota.
Even though everything looked reasonable, K8s gave the same error, "pod didn't trigger scale-up". That was because the CPU quota was exhausted. (K8s has nothing to do with that limit, so the error is not clear from the K8s side.)
Chris Stryczynski
Updated on July 19, 2022
Comments
-
Chris Stryczynski almost 2 years
I don't understand why I'm receiving this error. A new node should definitely be able to accommodate the pod, as I'm only requesting 768Mi of memory and 450m of CPU, and the instance group that would be autoscaled is of type n1-highcpu-2 (2 vCPU, 1.8GB).
How could I diagnose this further?
kubectl describe pod:
Name:           initial-projectinitialabcrad-697b74b449-848bl
Namespace:      production
Node:           <none>
Labels:         app=initial-projectinitialabcrad
                appType=abcrad-api
                pod-template-hash=2536306005
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/initial-projectinitialabcrad-697b74b449
Containers:
  app:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     250m
      memory:  512Mi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  nginx:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  128Mi
    Readiness:  http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.11
    Port:       3306/TCP
    Host Port:  0/TCP
    Command:
      /cloud_sql_proxy
      -instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  128Mi
    Mounts:
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  default-token-srv8k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-srv8k
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  4m (x29706 over 3d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  Warning  FailedScheduling   4m (x18965 over 3d)  default-scheduler   0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.
-
kentor over 4 years
That should still trigger the autoscaler, or am I wrong? If the cluster autoscaler added a new node, the pod would fit on it.
-
juan garcia over 4 years
So basically you needed to change requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution?
-
Chris Stryczynski over 4 years
No, I think it was because a new node wouldn't be "compatible" with the pod due to the podAffinity defined.
-
NIrav Modi almost 3 years
@chris How did you solve it? Can you explain in a little more detail?
-
Chris Stryczynski almost 3 years
@NIravModi if something is not clear, you should state exactly what isn't clear to you.
-
JuanXarg over 2 years
Hi @ChrisStryczynski. The problem is that you stated what the problem was, not what the actual solution was. The answer would be much better if you explained what the original problem was and what you actually did to fix it. Thanks!
-
Chris Stryczynski over 2 years
This is a solution to my specific question. If you need a more thorough explanation of this, it'd probably be best to ask a new question as opposed to me rewriting everything to be more detailed and broad.