Prometheus alert CPUThrottlingHigh raised but monitoring does not show it

12,735

Solution 1

CPUThrottlingHigh is an alert created by the kubernetes-mixin project. There is an open issue (#108) discussing this alert; I suggest reading all the comments on it to better understand the problem.

In short, the problem is: when working with low CPU limits, spiky workloads can have low average CPU usage and still be throttled.
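To make "spiky workloads" concrete, here is a minimal sketch with hypothetical numbers: a 100m limit translates to a CFS quota of roughly 10 ms of CPU time per 100 ms scheduling period, so a burst needing more than 10 ms in a single period is throttled even though average usage stays near zero.

```python
# Sketch with hypothetical numbers: a 100m CPU limit means a CFS quota
# of about 10 ms of CPU time per 100 ms scheduling period.
quota_ms = 10.0
period_ms = 100.0

# A spiky workload: idle for 99 periods, then needs 50 ms in one period.
demands_ms = [0.0] * 99 + [50.0]

# A period counts as throttled when demand exceeds the quota.
throttled_periods = sum(1 for d in demands_ms if d > quota_ms)
total_periods = len(demands_ms)

# Average usage in cores: actual CPU time used, divided by wall time.
avg_usage_cores = sum(min(d, quota_ms) for d in demands_ms) / (total_periods * period_ms)

print(f"average usage: {avg_usage_cores:.4f} cores")   # 0.0010 — far below the 0.1 limit
print(f"throttled in {throttled_periods} of {total_periods} periods")
```

The average (0.001 cores) is 1% of the limit, yet the container was still throttled in the burst period — which is exactly what the alert's per-period counters pick up and `kubectl top` averages away.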

Also, take a look at this issue (#67577) from the Kubernetes project, which addresses a kernel bug in CFS quotas that may cause unnecessary CPU throttling. The discussion is still open, and the Kubernetes project is even considering disabling CFS quotas for pods in the Guaranteed QoS class (see #70585 for reference).

Consider the following options:

  • Increase (or even remove) your container CPU limits
  • Disable Kubernetes CFS quotas entirely (kubelet's flag --cpu-cfs-quota=false)
  • Use a kernel version that contains this fix (torvalds/linux 512ac99)
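For the first option, a sketch of what the container spec change might look like, based on the resources shown in the question: the CPU request is kept so the scheduler can still place the pod, but the CPU limit is dropped so CFS quotas no longer apply to the container.

```yaml
# Sketch: keep requests for scheduling, remove only the cpu limit.
resources:
  requests:
    cpu: 25m
    memory: 64Mi
  limits:
    memory: 128Mi   # memory limit kept; cpu limit removed
```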

Solution 2

To piggy-back off of @eduardo-baitello's answer, a third option is to increase the CPUThrottlingPercent config here
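If you consume kubernetes-mixin via jsonnet, the threshold can be overridden in the mixin's `_config` object. A sketch, assuming the `cpuThrottlingPercent` key from the mixin's config (check your mixin version's config.libsonnet for the exact key name):

```jsonnet
// Sketch: raise the threshold so the alert only fires when more
// than 75% of CFS periods were throttled (mixin default is 25).
(import 'kubernetes-mixin/mixin.libsonnet') + {
  _config+:: {
    cpuThrottlingPercent: 75,
  },
}
```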



Author: jobou

Updated on September 18, 2022

Comments

  • jobou
    jobou over 1 year

    I have installed Prometheus to monitor my installation and it is frequently raising alerts about CPU throttling.

    The Prometheus alert rule that raises this alert is:

    alert: CPUThrottlingHigh
    expr: 100
      * sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_throttled_periods_total{container_name!=""}[5m]))
      / sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_periods_total[5m]))
      > 25
    for: 15m
    
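    The expression above computes, per container, the percentage of CFS scheduling periods in which the container was throttled over a 5-minute window. The same arithmetic in plain Python, with hypothetical counter increases:

```python
# Hypothetical 5-minute increases of the two cgroup counters:
throttled_periods = 40   # increase(container_cpu_cfs_throttled_periods_total[5m])
total_periods = 100      # increase(container_cpu_cfs_periods_total[5m])

throttling_percent = 100 * throttled_periods / total_periods
print(throttling_percent)          # 40.0
fires = throttling_percent > 25    # the alert fires once this holds for 15m
print(fires)                       # True
```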

    If I look at one of the pods identified by this alert, it does not seem to have any reason to throttle:

    $ kubectl top pod -n monitoring my-pod
    NAME            CPU(cores)   MEMORY(bytes)   
    my-pod          0m           6Mi
    

    This pod has one container with these resources set up:

    Limits:
      cpu:     100m
      memory:  128Mi
    Requests:
      cpu:     25m
      memory:  64Mi
    

    And the node that is hosting this pod is not under any heavy CPU use:

    $ kubectl -n monitoring top node aks-agentpool-node-1
    NAME                       CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
    aks-agentpool-node-1       853m         21%       11668Mi         101%
    

    In Grafana, if I look at the chart for this pod, its CPU usage never goes above 0.000022.

    Why is it throttling?

    • sheldonhull
      sheldonhull about 5 years
      Have you validated that you need to multiply by 100? Is it a decimal format for the percentage or an integer value?
    • jobou
      jobou about 5 years
      I am not sure I understand. The expression 100m is the same as 0.1 CPU.
    • sheldonhull
      sheldonhull about 5 years
      If that's the syntax for Prometheus then disregard. Was just checking.
  • jobou
    jobou about 5 years
    Thanks, it is nice to know that the issue is not on my installation. I don't want to increase or remove limits for now. And I am using AKS (a managed implementation of Kubernetes in Azure cloud) so I cannot disable CFS quotas. As per the comments in the issue, I am going to try to update the kernel version to see what happens.
  • Vadim Yangunaev
    Vadim Yangunaev over 4 years
    @jobou, have you updated the kernel version? Was it helpful?
  • jobou
    jobou over 4 years
    No sorry, I have not worked on this issue yet so I don't have any real feedback to give.
  • Jon Carlson
    Jon Carlson over 3 years
    We are still seeing this exact same thing in Dec 2020, and we are running on GKE 1.16.15.4300. Our requests and limits are quite a bit higher than the ones in the question. Removing CPU limits is not really an option for us.
  • BPD
    BPD over 3 years
    Mind posting this as a comment?