Nagios NRPE check_procs reporting incorrect number

Solution 1

I had a similar problem. check_procs internally calls /bin/ps axwo 'stat uid pid ppid vsz rss pcpu comm args', lists the processes and then counts the matching ones. If you have configured Nagios to run as a different user, it uses sudo to execute the command, and that is where the problem lies. If you type sudo ps -AF | grep sudo, some distributions return only "grep sudo", while others return both "sudo ps -AF" and "grep sudo". Because check_procs counts every matching process, you get different results on different machines. Unfortunately, I do not yet have a way to force check_procs to skip sudo processes.
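To make the counting effect concrete, here is a simplified model (not check_procs' actual source) that reads ps-style rows in the column order quoted above and counts rows whose args contain a pattern as a plain substring. The sample listing, including the sudo wrapper row and its PIDs, is hypothetical. The optional second argument skips rows with a given command name, as one possible way of leaving sudo out of the count.

```shell
#!/bin/sh
# Simplified model (not the plugin's real code): count ps-style rows whose
# args contain PATTERN as a plain substring. Input columns are assumed to be
# "stat uid pid ppid vsz rss pcpu comm args".

# count_arg_matches PATTERN [SKIP_COMM]   (hypothetical helper)
#   PATTERN    substring searched for in the args column (field 9 onward)
#   SKIP_COMM  optional command name (field 8) whose rows are ignored
count_arg_matches() {
    PAT="$1" SKIP="${2:-}" awk '
        ENVIRON["SKIP"] != "" && $8 == ENVIRON["SKIP"] { next }
        {
            # Rebuild the args column from field 9 to the end of the row.
            args = $9
            for (i = 10; i <= NF; i++) args = args " " $i
            if (index(args, ENVIRON["PAT"]) > 0) n++
        }
        END { print n + 0 }
    '
}

# Hypothetical listing: two sophie daemons plus a sudo wrapper whose
# args repeat the plugin command line (values are illustrative).
SAMPLE='Ss 0 12737 1 1000 500 0.0 sophie /usr/sbin/sophie -D
S 0 12738 12737 1000 500 0.0 sophie /usr/sbin/sophie -D
S 0 19600 19599 2000 800 0.0 sudo sudo /usr/lib/nagios/plugins/check_procs -a sophie -m PROCS -w 1:1 -c 1:1'

printf '%s\n' "$SAMPLE" | count_arg_matches sophie        # prints 3
printf '%s\n' "$SAMPLE" | count_arg_matches sophie sudo   # prints 2
```

The wrapper row is counted only because its args happen to contain the pattern, which is the 3-versus-2 discrepancy seen in the question.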

Solution 2

It sounds like the version of check_procs in Lucid is buggy (or at least has changed its behaviour). The fact that it always finds exactly one extra process suggests that it does a substring match on any part of the process name and arguments, and so also matches one of the processes involved in running the plugin. It is worth reporting a bug to Ubuntu about it.
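The one-extra hypothesis can be illustrated with a toy listing in which no "whatever" daemon runs at all, yet a wrapper process (assumed here to be a shell; the real parent depends on how NRPE and sudo are set up) carries the plugin's full command line in its args. A substring match on args counts the wrapper, while an exact match on the command name does not. The listing and the function names are illustrative only.

```shell
#!/bin/sh
# Toy model: substring match on args vs exact match on the command name.
# Rows are "comm args"; the listing is hypothetical.

# No "whatever" daemon is running, but a wrapper (assumed here to be a shell)
# carries the plugin command line, including the pattern, in its args.
LISTING='sshd /usr/sbin/sshd -D
sh sh -c /usr/lib/nagios/plugins/check_procs -a whatever -m PROCS -w 1:1 -c 1:1
cron /usr/sbin/cron'

# Count rows whose args (field 2 onward) contain PATTERN as a substring.
match_args() {
    PAT="$1" awk '{
        args = $2
        for (i = 3; i <= NF; i++) args = args " " $i
        if (index(args, ENVIRON["PAT"]) > 0) n++
    } END { print n + 0 }'
}

# Count rows whose command name (field 1) equals PATTERN exactly.
match_comm() {
    PAT="$1" awk '$1 == ENVIRON["PAT"] { n++ } END { print n + 0 }'
}

printf '%s\n' "$LISTING" | match_args whatever   # prints 1 (the wrapper)
printf '%s\n' "$LISTING" | match_comm whatever   # prints 0
```

The plugin's --help output lists a -C option for exact command-name matches (without path), which corresponds to the second function here; it is worth checking against the installed version whether that option is usable for the daemons being monitored.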

Author: Nick Sturgess

Updated on September 18, 2022

Comments

  • Nick Sturgess

    I have an Etch Nagios server (I will call it NagiosServer) that successfully monitors lots of things, including another Etch server (MonitorEtch). Using check_nrpe and check_procs, I'm able to check the process list for running daemons such as '/usr/sbin/squid'.

    Check MonitorEtch from NagiosServer (correct results):

    /usr/lib/nagios/plugins/check_nrpe -H MonitorEtch -c check_process -a /usr/sbin/squid 1:1 1:1
    PROCS OK: 1 process with args '/usr/sbin/squid'
    
    /usr/lib/nagios/plugins/check_nrpe -H MonitorEtch -c check_process -a whatever 1:1 1:1
    PROCS CRITICAL: 0 processes with args 'whatever'
    

    I've built a new server I want to monitor, running Lucid (MonitorLucid). No matter which process I try to monitor from NagiosServer, I get one more result than I should.

    Check MonitorLucid from NagiosServer (incorrect results):

    /usr/lib/nagios/plugins/check_nrpe -H MonitorLucid -c check_process -a whatever 1:1 1:1
    PROCS OK: 1 process with args 'whatever'
    
    ps ax|grep sophie
    12737 ?        Ss     0:00 /usr/sbin/sophie -D
    12738 ?        S      0:03 /usr/sbin/sophie -D
    19591 pts/0    S+     0:00 grep --color=auto sophie
    
    /usr/lib/nagios/plugins/check_nrpe -H 192.168.19.252 -c check_process -a sophie 1:1 1:1
    PROCS CRITICAL: 3 processes with args 'sophie'
    

    Contents of /etc/nagios/nrpe_local.cfg on MonitorLucid/MonitorEtch:

    allowed_hosts=NagiosServer,127.0.0.1
    dont_blame_nrpe=1
    command[check_process]=/usr/lib/nagios/plugins/check_procs -a $ARG1$ -m PROCS -w $ARG2$ -c $ARG3$
    

    Yet if I perform the check locally, it returns the correct results!

    Check MonitorLucid from MonitorLucid (correct results):

    /usr/lib/nagios/plugins/check_procs -a whatever -m PROCS -w 1:1 -c 1:1
    PROCS CRITICAL: 0 processes with args 'whatever'
    
    /usr/lib/nagios/plugins/check_procs -a sophie -m PROCS -w 1:1 -c 1:1
    PROCS CRITICAL: 2 processes with args 'sophie'
    

    I'm out of ideas at this point, short of checking for 1 more instance than I should.
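With dont_blame_nrpe=1, NRPE substitutes the arguments passed via check_nrpe -a into the $ARGn$ placeholders of the command definition in nrpe_local.cfg. The sketch below is a rough model of that substitution (expand_args is a hypothetical helper; real NRPE also screens the arguments before substituting them, which this ignores). It shows that the resulting command line itself contains the pattern, so any parent process that keeps that command line in its args, such as a shell or the sudo wrapper described in Solution 1, would also match -a sophie.

```shell
#!/bin/sh
# Rough model of $ARGn$ substitution in an NRPE command definition.
# Assumption: each $ARGn$ is replaced literally by the n-th argument;
# real NRPE also screens arguments first, which is ignored here.

# expand_args TEMPLATE [ARG...]   (hypothetical helper)
expand_args() {
    tpl=$1
    shift
    n=1
    for arg in "$@"; do
        # Replace every occurrence of $ARGn$ with the argument, via awk so
        # that characters special to sed or the shell need no escaping.
        tpl=$(TPL=$tpl KEY="\$ARG$n\$" VAL=$arg awk 'BEGIN {
            s = ENVIRON["TPL"]; k = ENVIRON["KEY"]; out = ""
            while ((p = index(s, k)) > 0) {
                out = out substr(s, 1, p - 1) ENVIRON["VAL"]
                s = substr(s, p + length(k))
            }
            printf "%s", out s
        }')
        n=$((n + 1))
    done
    printf '%s\n' "$tpl"
}

expand_args '/usr/lib/nagios/plugins/check_procs -a $ARG1$ -m PROCS -w $ARG2$ -c $ARG3$' sophie 1:1 1:1
# prints: /usr/lib/nagios/plugins/check_procs -a sophie -m PROCS -w 1:1 -c 1:1
```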