Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though it would?

kubernetes google-cloud-platform google-kubernetes-engine

15,338

Solution 1

It's not the hardware requests (confusingly the error message made me assume this) but it's due to my pod affinity rule defined:

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: NotIn
        values:
        - example-api
    topologyKey: kubernetes.io/hostname

Solution 2

If you are using K8s from a cloud provider like GKE/EKS, maybe it is worth taking a look at the cloud provider resource quota!

Even everything looks reasonable, K8s gave the same error "pod didn't trigger scale-up"! And that was because the CPU quota was exhausted! (K8s has nothing to do with that limitation, so the error is not clear from K8s side).

15,338

Author by

Chris Stryczynski

Software dev(op). Independent consultant available for hire! Checkout my GitChapter project on github!

Updated on July 19, 2022

Comments

Chris Stryczynski almost 2 years

I don't understand why I'm receiving this error. A new node should definitely be able to accommodate the pod. As I'm only requesting 768Mi of memory and 450m of CPU, and the instance group that would be autoscaled is of type n1-highcpu-2 - 2 vCPU, 1.8GB.

How could I diagnose this further?

kubectl describe pod:

Name:           initial-projectinitialabcrad-697b74b449-848bl
Namespace:      production
Node:           <none>
Labels:         app=initial-projectinitialabcrad
                appType=abcrad-api
                pod-template-hash=2536306005
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/initial-projectinitialabcrad-697b74b449
Containers:
  app:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     250m
      memory:  512Mi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  nginx:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Readiness:    http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.11
    Port:       3306/TCP
    Host Port:  0/TCP
    Command:
      /cloud_sql_proxy
      -instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Mounts:
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  default-token-srv8k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-srv8k
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  4m (x29706 over 3d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  Warning  FailedScheduling   4m (x18965 over 3d)  default-scheduler   0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.

kentor over 4 years

That should still triger the autoscale or am I wrong? If the cluster autoscaler would add a new node, the pod would fit on it.
juan garcia over 4 years

So basically you needed to change the requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution?
Chris Stryczynski over 4 years

No, I think it was due to the fact that a new node wouldn't be "compatible" with the pod due to the podAfffinity defined.
NIrav Modi almost 3 years

@chris How do you have solved it? Can you explain little bit in detail?
Chris Stryczynski almost 3 years

@NIravModi if something is not clear, you should state exactly what isn't clear to you.
JuanXarg over 2 years

Hi @ChrisStryczynski. The problem is that you stated what the problem was, not what was the actual solution. The answer would be much better if you explain what the original problem was and what you actually did to fix it. Thanks!
Chris Stryczynski over 2 years

This is a solution to my specific question. If you need a more thorough explanation of this, it'd probably be best to ask a new question as opposed to me rewriting everything to be more detailed and broad.