How to monitor network interface utilization correctly with Telegraf, InfluxDB, & Grafana?


Immediately after composing the question, I realized what the problem was. (cue head on desk)

Docker provides a virtual Ethernet adapter to the container, which only sees its own traffic. The solution is to either:

  • run Telegraf outside of Docker, or
  • run the container with the --net=host flag
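
For the second option, a host-networking invocation might look like the following (the image tag and config-mount path are illustrative, not taken from the original setup):

```shell
# Run Telegraf with the host's network stack so that [[inputs.net]]
# sees the real eth0 counters rather than the container's veth pair.
docker run -d \
  --name telegraf \
  --net=host \
  -v /etc/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
  telegraf
```

With --net=host the container shares the host's network namespace, so `interfaces = ["eth0"]` refers to the host's physical interface.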


Author: Nathan Osman

Updated on September 18, 2022

Comments

  • Nathan Osman
    Nathan Osman over 1 year

    I have Telegraf installed on a server and it contains the following network configuration:

    [[inputs.net]]
      interfaces = ["eth0"]
    

    This feeds the following metrics into InfluxDB:

    bytes_recv, bytes_sent, drop_in, drop_out, err_in, err_out, icmp_inaddrmaskreps, icmp_inaddrmasks, icmp_incsumerrors, icmp_indestunreachs, icmp_inechoreps, icmp_inechos, icmp_inerrors, icmp_inmsgs, icmp_inparmprobs, icmp_inredirects, icmp_insrcquenchs, icmp_intimeexcds, icmp_intimestampreps, icmp_intimestamps, icmp_outaddrmaskreps, icmp_outaddrmasks, icmp_outdestunreachs, icmp_outechoreps, icmp_outechos, icmp_outerrors, icmp_outmsgs, icmp_outparmprobs, icmp_outredirects, icmp_outsrcquenchs, icmp_outtimeexcds, icmp_outtimestampreps, icmp_outtimestamps, ip_defaultttl, ip_forwarding, ip_forwdatagrams, ip_fragcreates, ip_fragfails, ip_fragoks, ip_inaddrerrors, ip_indelivers, ip_indiscards, ip_inhdrerrors, ip_inreceives, ip_inunknownprotos, ip_outdiscards, ip_outnoroutes, ip_outrequests, ip_reasmfails, ip_reasmoks, ip_reasmreqds, ip_reasmtimeout, packets_recv, packets_sent, tcp_activeopens, tcp_attemptfails, tcp_currestab, tcp_estabresets, tcp_incsumerrors, tcp_inerrs, tcp_insegs, tcp_maxconn, tcp_outrsts, tcp_outsegs, tcp_passiveopens, tcp_retranssegs, tcp_rtoalgorithm, tcp_rtomax, tcp_rtomin, udp_ignoredmulti, udp_incsumerrors, udp_indatagrams, udp_inerrors, udp_noports, udp_outdatagrams, udp_rcvbuferrors, udp_sndbuferrors, udplite_ignoredmulti, udplite_incsumerrors, udplite_indatagrams, udplite_inerrors, udplite_noports, udplite_outdatagrams, udplite_rcvbuferrors, udplite_sndbuferrors

    I then created a panel in Grafana with the following query:

    SELECT
      derivative(sum("bytes_sent"), 1s) AS "up",
      derivative(sum("bytes_recv"), 1s) AS "down"
    FROM
      "autogen"."net"
    WHERE
      "interface" = 'eth0' AND
        $timeFilter
    GROUP BY
      time($__interval)
      fill(null)
    

    (The derivative() is necessary since bytes_recv and bytes_sent are cumulative counters.)
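
    The counter-to-rate conversion that derivative(..., 1s) performs can be sketched as follows (the sample values here are hypothetical, purely to illustrate the arithmetic):

    ```python
    # Hypothetical cumulative readings: (unix_time, bytes_recv)
    samples = [
        (0, 1_000_000),
        (10, 1_500_000),
        (20, 2_700_000),
    ]

    # Per-second rate over each interval: delta-bytes / delta-seconds
    rates = [
        (t2, (b2 - b1) / (t2 - t1))
        for (t1, b1), (t2, b2) in zip(samples, samples[1:])
    ]
    print(rates)  # → [(10, 50000.0), (20, 120000.0)]
    ```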

    My concern is that the data is not accurate. As a test, I downloaded some very large files (1GB) and confirmed (ifconfig eth0) that RX bytes was increasing by the expected amount as data was received. However, the graph looks like this:

    (screenshot: the Grafana network-traffic panel for the period in question)

    There is no change whatsoever to the metrics being recorded. What am I doing wrong?

    Details

    • Host is running Ubuntu Server 16.04
    • Telegraf, InfluxDB, and Grafana are running in Docker