Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though it would?


Solution 1

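The hardware requests were not the cause, although the error message led me to assume they were. The cause was my pod affinity rule: a newly added node would not be compatible with it, so the autoscaler concluded the pod would not fit. The rule was: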

podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: NotIn
        values:
        - example-api
    topologyKey: kubernetes.io/hostname
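
One reading of why this rule blocks scale-up, offered as an assumption rather than something confirmed in this thread: a required podAffinity term only admits nodes that already run a pod matching the selector (in the pod's own namespace by default), and a freshly provisioned node runs no such pod, so the autoscaler's simulation rejects it. If the intent was to keep this pod off nodes that already run appType=example-api pods, that is usually written as an anti-affinity rule instead. A minimal sketch under that assumption; the enclosing affinity key and the switch to podAntiAffinity with operator In are illustrative and not taken from the original answer:

```yaml
# Hypothetical rewrite, assuming the goal is "do not share a node with
# appType=example-api pods". This is NOT the author's confirmed fix.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: appType
          operator: In          # anti-affinity + In replaces affinity + NotIn
          values:
          - example-api
      # One topology domain per node, as in the original rule.
      topologyKey: kubernetes.io/hostname
```

An empty node satisfies an anti-affinity term trivially, so the autoscaler can treat a new node as a valid target. Whether this matches the original intent depends on which pods were meant to be kept apart.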

Solution 2

If you are running Kubernetes on a managed cloud service such as GKE or EKS, it is worth checking the cloud provider's resource quotas.

In my case everything looked reasonable, yet Kubernetes reported the same "pod didn't trigger scale-up" error. The cause was an exhausted CPU quota on the provider side. Kubernetes does not control that limit, so the error it reports does not make the cause clear.

Author: Chris Stryczynski

Updated on July 19, 2022

Comments

  • Chris Stryczynski almost 2 years

    I don't understand why I'm receiving this error. A new node should definitely be able to accommodate the pod: I'm only requesting 768Mi of memory and 450m of CPU, and the instance group that would be autoscaled is of type n1-highcpu-2 (2 vCPU, 1.8GB).

    How could I diagnose this further?

    kubectl describe pod:

    Name:           initial-projectinitialabcrad-697b74b449-848bl
    Namespace:      production
    Node:           <none>
    Labels:         app=initial-projectinitialabcrad
                    appType=abcrad-api
                    pod-template-hash=2536306005
    Annotations:    <none>
    Status:         Pending
    IP:             
    Controlled By:  ReplicaSet/initial-projectinitialabcrad-697b74b449
    Containers:
      app:
        Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
        Port:       <none>
        Host Port:  <none>
        Limits:
          cpu:     1
          memory:  1Gi
        Requests:
          cpu:     250m
          memory:  512Mi
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
      nginx:
        Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
        Port:       80/TCP
        Host Port:  0/TCP
        Limits:
          cpu:     1
          memory:  1Gi
        Requests:
          cpu:        100m
          memory:     128Mi
        Readiness:    http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
      cloudsql-proxy:
        Image:      gcr.io/cloudsql-docker/gce-proxy:1.11
        Port:       3306/TCP
        Host Port:  0/TCP
        Command:
          /cloud_sql_proxy
          -instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
          -credential_file=/secrets/cloudsql/credentials.json
        Limits:
          cpu:     1
          memory:  1Gi
        Requests:
          cpu:        100m
          memory:     128Mi
        Mounts:
          /secrets/cloudsql from cloudsql-instance-credentials (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
    Conditions:
      Type           Status
      PodScheduled   False 
    Volumes:
      cloudsql-instance-credentials:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  cloudsql-instance-credentials
        Optional:    false
      default-token-srv8k:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-srv8k
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason             Age                  From                Message
      ----     ------             ----                 ----                -------
      Normal   NotTriggerScaleUp  4m (x29706 over 3d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
      Warning  FailedScheduling   4m (x18965 over 3d)  default-scheduler   0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.
    
  • kentor over 4 years
    That should still trigger the autoscaler, or am I wrong? If the cluster autoscaler added a new node, the pod would fit on it.
  • juan garcia over 4 years
    So basically you needed to change the requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution?
  • Chris Stryczynski over 4 years
    No, I think it was because a new node wouldn't be "compatible" with the pod, given the podAffinity defined.
  • NIrav Modi almost 3 years
    @chris How did you solve it? Can you explain in a little more detail?
  • Chris Stryczynski almost 3 years
    @NIravModi if something is not clear, you should state exactly what isn't clear to you.
  • JuanXarg over 2 years
    Hi @ChrisStryczynski. The problem is that you stated what the problem was, not what the actual solution was. The answer would be much better if you explained what the original problem was and what you actually did to fix it. Thanks!
  • Chris Stryczynski over 2 years
    This is a solution to my specific question. If you need a more thorough explanation of this, it'd probably be best to ask a new question as opposed to me rewriting everything to be more detailed and broad.