Smoothing Grafana graph

I should preface my answer by saying that I'm a Prometheus newbie getting used to PromQL myself.

I should remind myself that 'rate' is something that should only be applied to counters, and not gauges (your example is okay with that). As an example, I'm going to use the number of times a host has called connect() on a TCP socket.

node_netstat_Tcp_ActiveOpens

Graphing that, it is always increasing, so very likely a counter, even though the node-exporter datasource says the following:

# HELP node_netstat_Tcp_ActiveOpens Statistic TcpActiveOpens.
# TYPE node_netstat_Tcp_ActiveOpens untyped
node_netstat_Tcp_ActiveOpens 20684

I only have one instance that I'm monitoring, but let's add a filter in there, just so we don't end up querying too many time series:

node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}

Let's add a 'rate' onto that:

rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"})

Oh, but that failed, saying 'expected type range vector in call to function "rate", got instant vector'. 'rate' needs a range vector as its input, but a bare selector like the one above is an instant vector. We need to specify a window over which to compute the rate; Prometheus calls this a range, written as a duration in square brackets after the selector.

rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}[5m])

Okay, that works, and we can play with the smoothing by changing 5m to 20m etc...
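
Another knob, as a rough sketch: if your Prometheus supports subqueries (I believe that means version 2.7 or later), you can average the already-computed rate over a longer window with avg_over_time. The 30m range and 1m resolution here are arbitrary values I picked for illustration, not recommendations.

```
avg_over_time(
  rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}[5m])[30m:1m]
)
```

The same wrapper should work around a larger expression, for example a sum of several rates, so the smoothing applies to the combined result rather than to each metric separately.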

But spikes can be useful, and 'rate' is just one way of working with the data.

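I think it's useful to look at a certain percentile of the data, which helps to remove outliers but still gives some liveliness to the displayed data. Here I've specified the 90th percentile. Note that 'quantile' aggregates across series at each point in time, so it only makes a visible difference when the selector matches more than one series.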

quantile(0.9, rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}[5m]))

You'll want to try this on more realistic data to find a nice quantile (0.9) and window (5m).
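
For a percentile taken over time on a single series, rather than across series, a possible sketch uses quantile_over_time with a subquery. This again assumes subquery support, and the 1h range and 1m resolution are placeholder values to tune against your own data.

```
quantile_over_time(
  0.9,
  rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}[5m])[1h:1m]
)
```

At each point on the graph this returns the 90th percentile of the 5m rate as seen over the preceding hour, so a larger range gives a flatter line.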

Let's just assume that this data has a daily pattern, and we want to compare the data to the data from 1 day earlier:

  quantile(0.9, rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}[5m]))
- quantile(0.9, rate(node_netstat_Tcp_ActiveOpens{instance="localhost:9100"}[5m] offset 1d))

Hope that gives you some idea; it's sure useful practice for me at least :)

Cheers, Cameron


Author: Ror

Updated on September 18, 2022

Comments

  • Ror
    Ror over 1 year

    I'm setting up monitoring on my company infrastructure. I installed Grafana with the Prometheus datasource. I built my dashboards and have rather "spiky" graphs. I was wondering if there was any way to smooth them out a bit so they would be more readable.

    Thanks !

  • Ror
    Ror over 4 years
    Thanks for the detailed explanation! In my case, my graph is composed of multiple metrics and I'm already using rate on each one. What I need is a function to smooth the entire graph and not just each metric.