How to monitor network interface utilization correctly with Telegraf, InfluxDB, & Grafana?


Immediately after composing the question, I realized what the problem was. (cue head on desk)

Docker provides a virtual Ethernet adapter to the container, which only sees its own traffic. The solution is to either:

  • run Telegraf outside of Docker, or
  • run the container with the --net=host flag
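
For the second option, a host-networking invocation might look like the following (the image tag and config-mount path are illustrative, not taken from the original setup):

```shell
# Run Telegraf with the host's network stack so that [[inputs.net]]
# sees the real eth0 counters rather than the container's veth pair.
docker run -d \
  --name telegraf \
  --net=host \
  -v /etc/telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
  telegraf
```

With --net=host the container shares the host's network namespace, so `interfaces = ["eth0"]` refers to the host's physical interface.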


Author: Nathan Osman

Updated on September 18, 2022

Comments

  • Nathan Osman
    Nathan Osman over 1 year

    I have Telegraf installed on a server and it contains the following network configuration:

    [[inputs.net]]
      interfaces = ["eth0"]
    

    This feeds the following metrics into InfluxDB:

    bytes_recv, bytes_sent, drop_in, drop_out, err_in, err_out, icmp_inaddrmaskreps, icmp_inaddrmasks, icmp_incsumerrors, icmp_indestunreachs, icmp_inechoreps, icmp_inechos, icmp_inerrors, icmp_inmsgs, icmp_inparmprobs, icmp_inredirects, icmp_insrcquenchs, icmp_intimeexcds, icmp_intimestampreps, icmp_intimestamps, icmp_outaddrmaskreps, icmp_outaddrmasks, icmp_outdestunreachs, icmp_outechoreps, icmp_outechos, icmp_outerrors, icmp_outmsgs, icmp_outparmprobs, icmp_outredirects, icmp_outsrcquenchs, icmp_outtimeexcds, icmp_outtimestampreps, icmp_outtimestamps, ip_defaultttl, ip_forwarding, ip_forwdatagrams, ip_fragcreates, ip_fragfails, ip_fragoks, ip_inaddrerrors, ip_indelivers, ip_indiscards, ip_inhdrerrors, ip_inreceives, ip_inunknownprotos, ip_outdiscards, ip_outnoroutes, ip_outrequests, ip_reasmfails, ip_reasmoks, ip_reasmreqds, ip_reasmtimeout, packets_recv, packets_sent, tcp_activeopens, tcp_attemptfails, tcp_currestab, tcp_estabresets, tcp_incsumerrors, tcp_inerrs, tcp_insegs, tcp_maxconn, tcp_outrsts, tcp_outsegs, tcp_passiveopens, tcp_retranssegs, tcp_rtoalgorithm, tcp_rtomax, tcp_rtomin, udp_ignoredmulti, udp_incsumerrors, udp_indatagrams, udp_inerrors, udp_noports, udp_outdatagrams, udp_rcvbuferrors, udp_sndbuferrors, udplite_ignoredmulti, udplite_incsumerrors, udplite_indatagrams, udplite_inerrors, udplite_noports, udplite_outdatagrams, udplite_rcvbuferrors, udplite_sndbuferrors

    I then created a panel in Grafana with the following query:

    SELECT
      derivative(sum("bytes_sent"), 1s) AS "up",
      derivative(sum("bytes_recv"), 1s) AS "down"
    FROM
      "autogen"."net"
    WHERE
      "interface" = 'eth0' AND
        $timeFilter
    GROUP BY
      time($__interval)
      fill(null)
    

    (The derivative() is necessary since bytes_recv and bytes_sent are cumulative counters.)
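
    The counter-to-rate conversion that derivative(..., 1s) performs can be sketched as follows (the sample values here are hypothetical, purely to illustrate the arithmetic):

    ```python
    # Hypothetical cumulative readings: (unix_time, bytes_recv)
    samples = [
        (0, 1_000_000),
        (10, 1_500_000),
        (20, 2_700_000),
    ]

    # Per-second rate over each interval: delta-bytes / delta-seconds
    rates = [
        (t2, (b2 - b1) / (t2 - t1))
        for (t1, b1), (t2, b2) in zip(samples, samples[1:])
    ]
    print(rates)  # → [(10, 50000.0), (20, 120000.0)]
    ```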

    My concern is that the data is not accurate. As a test, I downloaded some very large files (1GB) and confirmed (ifconfig eth0) that RX bytes was increasing by the expected amount as data was received. However, the graph looks like this:

    (screenshot: the Grafana network-traffic panel for the period in question)

    There is no change whatsoever to the metrics being recorded. What am I doing wrong?

    Details

    • Host is running Ubuntu Server 16.04
    • Telegraf, InfluxDB, and Grafana are running in Docker