Does anyone use check_mk for Nagios? Anything I should be aware of before considering it?

Solution 1

Disclaimer: I used to work on that project because I felt it was extremely powerful (and I still think so).

I have used it since around 2009 and, except for legacy setups, have never touched a "normal" (one might say legacy) Nagios setup again. It would feel like a waste of time.

The largest setup I know of has ~1200 monitoring servers (note: monitoring servers, not monitored servers). That one has also been published, but the original question predates it.

It is now used in quite a few places that weren't happy with plain Nagios compared to larger-scale NMS products like OpenView, and that changed their minds.

The key difference is not scalability (which 37signals seem to enjoy quite a lot), nor the autodetection of monitorable things on a remote system, which makes setup a no-brainer and even alerts you if something new has been added but is not being monitored.

No, the really big thing in the long run is the configuration, which is strictly rule-based (and written out as Python). A few hundred lines of Check_MK config are enough to let it generate 200K lines of boring old Nagios syntax that you'll never look at again.

  • It also has a web-based config editor. With inheritance. And validation.
  • The GUI is, among other things, optimized for WAN links. It's actually a full web framework, which is why there are also dashboards and a log classification engine that can take in syslog or SNMP for Nagios processing with flexible rulesets.
  • All the checks are written to high quality standards and it shows in time saved for the user.
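To make the rule-based idea above more concrete, here is a minimal sketch of how a handful of rules can expand into many Nagios-style service definitions. This is not Check_MK's actual configuration format or API: the names `HOSTS`, `RULES`, `matching_services` and `render_nagios_config`, as well as the hosts, tags and check commands, are all made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical inventory: host name -> set of tags (not real Check_MK syntax).
HOSTS = {
    "web01": {"linux", "prod"},
    "web02": {"linux", "prod"},
    "db01": {"linux", "prod", "database"},
    "test01": {"linux", "test"},
}

# Hypothetical rules: (required tags, host name pattern, service name, command).
RULES = [
    ({"linux"}, "*", "CPU load", "check_load"),
    ({"prod"}, "web*", "HTTP", "check_http"),
    ({"database"}, "*", "MySQL", "check_mysql"),
]


def matching_services(host, tags):
    """Yield (description, command) for every rule that applies to the host."""
    for required, pattern, description, command in RULES:
        # A rule applies if the host carries all required tags
        # and its name matches the shell-style pattern.
        if required <= tags and fnmatch(host, pattern):
            yield description, command


def render_nagios_config():
    """Expand all rules into Nagios-style 'define service' blocks."""
    blocks = []
    for host in sorted(HOSTS):
        for description, command in matching_services(host, HOSTS[host]):
            blocks.append(
                "define service {\n"
                f"    host_name            {host}\n"
                f"    service_description  {description}\n"
                f"    check_command        {command}\n"
                "}\n"
            )
    return "\n".join(blocks)


config = render_nagios_config()

if __name__ == "__main__":
    print(config)
```

Three rules and four hosts already produce seven service blocks here; adding a host with the right tags adds its services without touching the rules, which is roughly why the generated output grows so much faster than the rule set.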

There are no ponies, though.

  • People often get confused about the interaction between Check_MK and Nagios, which is not trivial but actually nicely separated: It writes config, Nagios runs with that config and calls Check_MK to monitor systems.
  • If someone is not using the graphical config editor "WATO" they're assumed to be on an expert level in Nagios.
  • There's no GUI ops manual! (But there is inline help that can be enabled on the fly.)
  • Perfectly working IPv6 support patches have been floating around for years and have gone nowhere yet.

There are many more pros and cons to bring up, but I think I have already shown both sides quite well. Personally, I like the efficiency of Check_MK setups and am really annoyed when I have to work with old-school Nagios setups. Even if they use nice template frameworks or are managed from Puppet, to me they still feel stone-aged and helpless by comparison.

Disclaimer: see above ;)

Solution 2

Does anyone use it? Yes.

37signals (a software company) just posted an overview of how they monitor their systems using Nagios, and the major benefits they saw when they started using check_mk. http://37signals.com/svn/posts/3178-nagios-monitoring-performance

Author: WinkyWolly

Updated on September 17, 2022

Comments

  • WinkyWolly
    WinkyWolly almost 2 years

    http://mathias-kettner.de/check_mk.html

    I've been testing it out on a couple of development machines and it seems pretty nifty. I cannot however find much information on deployments of it. Does anyone run this actively? Did anyone rule this out as an option for some reason?

    • Wouter de Bie
      Wouter de Bie almost 14 years
      Thanks for the link! I'll definitely try this out. Seems great for local checks and a replacement for NRPE.
    • dsummersl
      dsummersl almost 12 years
      I haven't used this, but IMO it fits into that fuzzy devops landscape. In chef/puppet you'd use ohai/facter to do what it sounds like this mk plugin does: you'd export a Nagios configuration that wires in an ohai/facter status. This perhaps looks less roundabout. Thanks for the link, I'm definitely gonna look into it myself!
  • Michael Hampton
    Michael Hampton about 11 years
    Right, I liked check_mk but couldn't use it due to the lack of IPv6 support. I'm in a 100% dual stack environment.
  • Florian Heigl
    Florian Heigl about 11 years
    My brainchild is to use NAT64 on the monitoring box or its gateway and do the monitoring 100% via v6; the Icinga core is very v6-ready, for example. Oh well. Soon enough the v4-only people will start seeing the issues their slacking brings them :)