Prometheus alert CPUThrottlingHigh raised but monitoring does not show it
Solution 1
The CPUThrottlingHigh alert is created by the kubernetes-mixin project. There is an open issue (#108) discussing this alert; I suggest reading all the comments on it to better understand the problem.
In short, the problem is: when working with low CPU limits, spiky workloads can have low average CPU usage and still be throttled.
Also, take a look at issue #67577 from the Kubernetes project, which addresses a kernel bug in CFS quotas that may cause unnecessary CPU throttling. The discussion is still open, and the Kubernetes project is even considering disabling CFS quotas for pods in the Guaranteed QoS class (see #70585 for reference).
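To make the spiky-workload point concrete, here is a small arithmetic sketch (the burst size and window length are made-up numbers, not from the question): with a 100m CPU limit, the CFS quota is 10ms of CPU time per 100ms scheduling period, so a single 30ms burst gets throttled even though average usage over a 5s window looks negligible.

```python
# Sketch: why a spiky workload is throttled despite a near-zero average.
# Assumptions: 100m CPU limit -> 10ms quota per default 100ms CFS period.
period_ms = 100.0   # cpu.cfs_period_us = 100000 (kernel default)
quota_ms = 10.0     # 100m limit -> 10ms of CPU time per period

# One 30ms burst of work in a single period, then 49 idle periods (~5s window).
bursts_ms = [30.0] + [0.0] * 49

throttled_periods = sum(1 for b in bursts_ms if b > quota_ms)
active_periods = sum(1 for b in bursts_ms if b > 0)
avg_cpu_cores = sum(bursts_ms) / (len(bursts_ms) * period_ms)

print(avg_cpu_cores)                              # 0.006 cores on average -- looks idle
print(100 * throttled_periods / active_periods)   # 100.0 -- every active period was throttled
```

This is exactly the ratio the CPUThrottlingHigh expression computes: throttled CFS periods divided by total CFS periods, which can be high even when average CPU usage is tiny.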
Consider the following options:
- Increase (or even remove) your container CPU limits
- Disable Kubernetes CFS quotas entirely (kubelet flag --cpu-cfs-quota=false)
- Use a kernel version that contains this fix (torvalds/linux 512ac99)
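As an illustration of the first option, a minimal pod spec sketch (the values are hypothetical, not taken from the question):

```yaml
# Hypothetical container resources fragment: raise the CPU limit well above
# the request, or omit limits.cpu entirely so no CFS quota is applied.
resources:
  requests:
    cpu: 25m
    memory: 64Mi
  limits:
    cpu: 500m      # raised; or delete this line to remove the CPU limit
    memory: 128Mi
```

Note that removing only the CPU limit keeps the memory limit in place, so the pod drops out of the Guaranteed QoS class but is still protected against memory overuse.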
Solution 2
To piggyback off of @eduardo-baitello's answer: a third option is to increase the CPUThrottlingPercent config here
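For reference, kubernetes-mixin is configured through jsonnet; a minimal sketch, assuming the tunable is still named cpuThrottlingPercent (check config.libsonnet in your mixin version before relying on this):

```jsonnet
// Hypothetical mixin override: raise the throttling threshold from the
// default 25% so CPUThrottlingHigh only fires on heavier throttling.
(import 'kubernetes-mixin/mixin.libsonnet') + {
  _config+:: {
    cpuThrottlingPercent: 70,
  },
}
```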
jobou
Updated on September 18, 2022
Comments
-
jobou over 1 year: I have installed Prometheus to monitor my installation, and it is frequently raising alerts about CPU throttling.
The Prometheus alert rule that raises this alert is:
alert: CPUThrottlingHigh
expr: 100 * sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_throttled_periods_total{container_name!=""}[5m]))
  / sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_periods_total[5m])) > 25
for: 15m
If I look at one of the pods identified by this alert, it does not seem to have any reason to throttle:
$ kubectl top pod -n monitoring my-pod
NAME     CPU(cores)   MEMORY(bytes)
my-pod   0m           6Mi
This pod has one container with these resources set up:
Limits:
  cpu:     100m
  memory:  128Mi
Requests:
  cpu:     25m
  memory:  64Mi
And the node that is hosting this pod is not under any heavy CPU use:
$ kubectl -n monitoring top node aks-agentpool-node-1
NAME                   CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-agentpool-node-1   853m         21%    11668Mi         101%
In Grafana, if I look at the chart for this pod, it never goes above 0.000022 of CPU usage. Why is it throttling?
-
sheldonhull about 5 years: Have you validated that you need to multiply by 100? Is it a decimal format for the percentage or an integer value?
-
jobou about 5 years: I am not sure I understand. The expression 100m is the same as 0.1 CPU.
-
sheldonhull about 5 years: If that's the syntax for Prometheus then disregard. Was just checking.
-
jobou about 5 years: Thanks, it is nice to know that the issue is not in my installation. I don't want to increase or remove limits for now, and I am using AKS (a managed Kubernetes offering in the Azure cloud), so I cannot disable CFS quotas. As per the comments in the issue, I am going to try updating the kernel version to see what happens.
-
Vadim Yangunaev over 4 years: @jobou, have you updated the kernel version? Was it helpful?
-
jobou over 4 years: No, sorry, I have not worked on this issue yet, so I don't have any real feedback to give.
-
Jon Carlson over 3 years: We are still seeing this exact same thing in Dec 2020, and we are running on GKE 1.16.15.4300. Our requests and limits are quite a bit higher than the ones in the question. Removing CPU limits is not really an option for us.
-
BPD over 3 years: Mind posting this as a comment?