Nagios: NRPE: Unable to read output, Can't find the reason, can you?
Solution 1
Nice detailed write-up Itai! Have you tried reducing the complexity of the config to see if it works?
For starters, I would change the line in nrpe.cfg from
command[check_kvm]=sudo /usr/lib64/nagios/plugins/check_kvm
to
command[check_kvm]=/usr/lib64/nagios/plugins/check_kvm
and temporarily change the /usr/lib64/nagios/plugins/check_kvm script to be something really simple like:
#!/bin/sh
echo Hi
exit 0
If that works, then you can start ratcheting up the complexity. Perhaps, instead of giving the nagios user sudo access to the script, it really needs access to the virsh command, in which case you can leave out the sudo part of the nrpe.cfg command line.
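A quick way to see why a plugin that works in a shell can still produce "NRPE: Unable to read output" is to mimic, roughly, what NRPE does with the command. As far as I know, NRPE 2.x relays only the plugin's stdout and exit code and substitutes that message when stdout is empty; stderr, where a sudo or virsh error would go, is not passed back. The helper below, nrpe_like, is a hypothetical name and a simplification, not NRPE's code, and returning UNKNOWN (3) for empty output is an assumption.

```shell
#!/bin/sh
# nrpe_like CMD [ARGS...]: hypothetical helper approximating how NRPE relays a
# plugin result. stdout is kept, stderr is thrown away, and empty stdout is
# replaced by NRPE's error text. A simplification, not NRPE's actual code.
nrpe_like() {
  _nl_rc=0
  # Capture stdout only; anything the command writes to stderr is discarded.
  _nl_out=$("$@" 2>/dev/null) || _nl_rc=$?
  if [ -z "$_nl_out" ]; then
    echo "NRPE: Unable to read output"
    return 3  # assumption: treat missing output as UNKNOWN
  fi
  printf '%s\n' "$_nl_out"
  return "$_nl_rc"
}
```

For example, running nrpe_like sudo /usr/lib64/nagios/plugins/check_kvm as the nagios user should print what check_nrpe shows; if it prints the NRPE message, run the same command without nrpe_like to see what went to stderr.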
Solution 2
I had the same issue and managed to solve it by killing the nagios process (on the monitored machine):
ps -ef | grep nagios
kill -9 [NagiosProcessNumber]
/etc/init.d/nagios-nrpe-server start
All went fine after that.
Solution 3
I saw a problem on a Gentoo server that resembles yours at http://forums.gentoo.org/viewtopic-t-806014-start-0.html; there is a nice method there to debug the issue.
The user in that post had a problem with check_disk and got exactly the same error message as yours.
He was told to execute the following command:
ssh remote_ip /usr/lib/nagios/plugins/check_disk -w 10 -c 5 -p "/" 2>&1
The 2>&1 redirects stderr to stdout, so error messages are printed too and might reveal the exact error.
So in your case, replace remote_ip with the IP address of the server you can't run check_nrpe against, and replace the check_disk command with the full command that check_kvm is supposed to execute. Since you run it without any parameters, you can simply execute:
ssh <remote_ip> /usr/lib64/nagios/plugins/check_kvm 2>&1
Hopefully that will reveal some information about the problem.
Good luck!
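The effect of the 2>&1 redirection can be seen without any Nagios setup. In the sketch below, fake_plugin is a hypothetical plugin that fails and reports the reason only on stderr: captured on its own, its output is empty (the situation in which, as I understand it, NRPE reports "Unable to read output"), while with 2>&1 the error text comes through. The helper names are illustrative only.

```shell
#!/bin/sh
# fake_plugin: hypothetical stand-in for a plugin that fails and explains why
# only on stderr, the way a sudo or virsh error message would.
fake_plugin() {
  echo "example error written to stderr" >&2
  return 3
}

# stdout_only CMD: what a caller that reads stdout sees (stderr is dropped here).
stdout_only() {
  "$@" 2>/dev/null
}

# with_stderr CMD: the same command with stderr merged into stdout by 2>&1.
with_stderr() {
  "$@" 2>&1
}
```

On a real host the equivalent is the ssh ... 2>&1 command above, or with_stderr sudo /usr/lib64/nagios/plugins/check_kvm run locally as the nagios user.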
Itai Ganot
Architect and Lecturer in the field of DevOps Engineering. LinkedIn: https://www.linkedin.com/in/itaiganot Personal Website: http://geek-kb.com
Updated on September 18, 2022
Comments
-
Itai Ganot almost 2 years
I have a Nagios server and a monitored server. On the monitored server:
[root@Monitored ~]# netstat -an |grep :5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
[root@Monitored ~]# locate check_kvm
/usr/lib64/nagios/plugins/check_kvm
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm -H localhost
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
NRPE: Unable to read output
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
[root@Monitored ~]# ps -ef |grep nrpe
nagios 21178 1 0 16:11 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
[root@Monitored ~]#
On the Nagios server:
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm
NRPE: Unable to read output
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159
NRPE v2.14
[root@Nagios ~]#
When I check another server on the network with the same command, it works:
[root@Nagios ~]# /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm
hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running
[root@Nagios ~]#
Running the check locally as the nagios user:
[root@Monitored ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
-bash-4.1$
Running the check remotely from the Nagios server as the nagios user:
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159 -c check_kvm
NRPE: Unable to read output
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.159
NRPE v2.14
-bash-4.1$
Running the same check_kvm against a different server on the network as the nagios user:
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H 1.1.1.80 -c check_kvm
hosts:4 OK:4 WARN:0 CRIT:0 - karmisoft:running ab2c4:running kidumim1:running travel2gether1:running
-bash-4.1$
Permissions:
-rwxr-xr-x. 1 root root 4684 2013-10-14 17:14 nrpe.cfg (aka /etc/nagios/nrpe.cfg)
drwxrwxr-x. 3 nagios nagios 4096 2013-10-15 03:38 plugins (aka /usr/lib64/nagios/plugins)
/etc/sudoers:
[root@Monitored ~]# grep -i requiretty /etc/sudoers
#Defaults requiretty
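The grep above shows requiretty commented out. That matters because with an active Defaults requiretty line, sudo refuses to run from NRPE (which has no tty) with "sorry, you must have a tty to run sudo". A rough check is sketched below under the hypothetical name requiretty_active; it is a text match, not a sudoers parser, so per-user forms such as Defaults:nagios !requiretty and files pulled in with #includedir are not considered.

```shell
#!/bin/sh
# requiretty_active FILE: hypothetical helper. Succeeds if FILE contains an
# uncommented global "Defaults ... requiretty" line. Negated (!requiretty),
# commented and per-user Defaults:user lines do not match. Not a sudoers parser.
requiretty_active() {
  grep -Eq '^[[:space:]]*Defaults[[:space:]]+([^#]*[[:space:],])?requiretty([[:space:],]|$)' "$1"
}
```

For example: requiretty_active /etc/sudoers && echo "requiretty is on".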
iptables/selinux:
[root@Monitored xinetd.d]# service iptables status
iptables: Firewall is not running.
[root@Monitored xinetd.d]# service ip6tables status
ip6tables: Firewall is not running.
[root@Monitored xinetd.d]# grep disable /etc/selinux/config
# disabled - No SELinux policy is loaded.
SELINUX=disabled
[root@Monitored xinetd.d]#
The command in /etc/nagios/nrpe.cfg is:
[root@Monitored ~]# grep kvm /etc/nagios/nrpe.cfg
command[check_kvm]=sudo /usr/lib64/nagios/plugins/check_kvm
and the nagios user is added to /etc/sudoers:
nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_kvm
nagios ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/check_nrpe
The check_kvm plugin is a shell script that looks like this:
#!/bin/sh
LIST=$(virsh list --all | sed '1,2d' | sed '/^$/d'| awk '{print $2":"$3}')
if [ ! "$LIST" ]; then
  EXITVAL=3 #Status 3 = UNKNOWN (orange)
  echo "Unknown guests"
  exit $EXITVAL
fi
OK=0
WARN=0
CRIT=0
NUM=0
for host in $(echo $LIST)
do
  name=$(echo $host | awk -F: '{print $1}')
  state=$(echo $host | awk -F: '{print $2}')
  NUM=$(expr $NUM + 1)
  case "$state" in
    running|blocked) OK=$(expr $OK + 1) ;;
    paused) WARN=$(expr $WARN + 1) ;;
    shutdown|shut*|crashed) CRIT=$(expr $CRIT + 1) ;;
    *) CRIT=$(expr $CRIT + 1) ;;
  esac
done
if [ "$NUM" -eq "$OK" ]; then
  EXITVAL=0 #Status 0 = OK (green)
fi
if [ "$WARN" -gt 0 ]; then
  EXITVAL=1 #Status 1 = WARNING (yellow)
fi
if [ "$CRIT" -gt 0 ]; then
  EXITVAL=2 #Status 2 = CRITICAL (red)
fi
echo hosts:$NUM OK:$OK WARN:$WARN CRIT:$CRIT - $LIST
exit $EXITVAL
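The classification logic in check_kvm can be exercised without a hypervisor by feeding it canned virsh list --all text. The sketch below is a hypothetical refactor (the name summarize_guests is not from the original script): it reads the table on stdin, drops the two header lines as sed '1,2d' does, and applies the same state rules (running/blocked OK, paused WARNING, everything else CRITICAL). It also makes one point explicit: "Unknown guests" is printed only when LIST is empty, so seeing it through NRPE means virsh produced no guest rows in that context.

```shell
#!/bin/sh
# summarize_guests: hypothetical refactor of check_kvm's logic. Reads
# "virsh list --all" output on stdin, prints the same summary line and returns
# the same Nagios codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
summarize_guests() {
  # Drop the two header lines and blank lines, keep "name:state" pairs.
  # As in the original, a state like "shut off" is cut to "shut".
  list=$(sed '1,2d' | sed '/^$/d' | awk '{print $2":"$3}')
  if [ -z "$list" ]; then
    echo "Unknown guests"
    return 3
  fi
  ok=0 warn=0 crit=0 num=0
  for host in $list; do
    state=${host#*:}
    num=$((num + 1))
    case "$state" in
      running|blocked) ok=$((ok + 1)) ;;
      paused) warn=$((warn + 1)) ;;
      *) crit=$((crit + 1)) ;;  # shutdown, shut, crashed and anything unknown
    esac
  done
  # Same precedence as the original: CRITICAL over WARNING over OK.
  rc=0
  if [ "$warn" -gt 0 ]; then rc=1; fi
  if [ "$crit" -gt 0 ]; then rc=2; fi
  echo "hosts:$num OK:$ok WARN:$warn CRIT:$crit -" $list
  return "$rc"
}
```

On the monitored host, virsh list --all | summarize_guests run once as root and once as nagios is essentially the comparison KJH suggests in the comments.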
Edit (10/22/13): Following all that, I am now able to get some response from the script:
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
Unknown guests
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
[root@Monitored ~]# /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Monitored ~]# su - nagios
-bash-4.1$ /usr/lib64/nagios/plugins/check_kvm
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm
Unknown guests
-bash-4.1$ /usr/lib64/nagios/plugins/check_nrpe -H localhost
NRPE v2.14
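The session above shows the same script listing guests from a shell but printing "Unknown guests" when started through NRPE. One difference worth ruling out is the environment: a command spawned by a daemon does not get a login shell's variables. The hypothetical helpers below run a command with an emptied environment apart from a minimal PATH and compare the result with a normal run; the PATH value is an assumption, and this only approximates, rather than reproduces, what NRPE passes to its children.

```shell
#!/bin/sh
# run_bare CMD [ARGS...]: hypothetical helper that runs a command with an empty
# environment apart from a minimal PATH, roughly like a daemon-spawned plugin.
# The PATH value is an assumption; adjust it for your system.
run_bare() {
  env -i PATH=/usr/bin:/bin "$@"
}

# compare_envs CMD [ARGS...]: run CMD normally and under run_bare, with stderr
# merged into stdout, and print both results if they differ.
compare_envs() {
  _ce_normal=$("$@" 2>&1) || true
  _ce_bare=$(run_bare "$@" 2>&1) || true
  if [ "$_ce_normal" = "$_ce_bare" ]; then
    echo "same output"
  else
    printf 'normal: %s\nbare:   %s\n' "$_ce_normal" "$_ce_bare"
  fi
}
```

For example, as the nagios user on the monitored host: compare_envs sudo /usr/lib64/nagios/plugins/check_kvm.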
It seems like the problem is somehow related to the check_nrpe command or to something in the nrpe installation on the server.
Edit 12/2/13: Other checks on the problematic server work:
-
Itai Ganot over 10 years
I have tried it and am still getting NRPE: Unable to read output. Any more suggestions?
-
KJH over 10 years
What are the ownership and permissions on /usr/lib64/nagios/plugins/check_kvm?
-
KJH over 10 years
Did you try changing the script itself to something simple? I don't think the basic "Hi" one I suggested would be 2581 bytes?
-
Itai Ganot over 10 years
Yes, I've tried changing the script, but to no avail. Moreover, the script works just fine when checked against another server or when I run it locally; only when I use the check_nrpe -H localhost -c check_kvm method does it return Unknown guests.
-
KJH over 10 years
Hi Itai, join this chat room: chat.stackexchange.com/rooms/11147/…
-
Itai Ganot over 10 years
It seems I missed you in the chat, but I've updated the question. Thank you.
-
KJH over 10 years
Try turning on debug in NRPE (it might need a restart) and capture the output from wherever it logs to.
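The debug switch is the debug option in nrpe.cfg (debug=0 is off, debug=1 is on), and the messages go to syslog, typically /var/log/messages on a CentOS-style box. Below is a sketch of a hypothetical helper, enable_nrpe_debug, that sets it in a given file, replacing an existing debug= line or appending one; back the file up before using it on the real config.

```shell
#!/bin/sh
# enable_nrpe_debug FILE: hypothetical helper that sets "debug=1" in an
# nrpe.cfg-style file, replacing an existing debug= line or appending one.
enable_nrpe_debug() {
  cfg=$1
  tmp=$(mktemp) || return 1
  if grep -q '^debug=' "$cfg"; then
    sed 's/^debug=.*/debug=1/' "$cfg" > "$tmp"
  else
    cat "$cfg" > "$tmp"
    echo 'debug=1' >> "$tmp"
  fi
  # Copy back with cat so the original file keeps its owner and permissions.
  cat "$tmp" > "$cfg"
  rm -f "$tmp"
}
```

After enable_nrpe_debug /etc/nagios/nrpe.cfg, restart the NRPE daemon (the init script is usually nrpe on RHEL/CentOS packages and nagios-nrpe-server on Debian/Ubuntu) and watch syslog while running check_nrpe -H localhost -c check_kvm.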
-
Itai Ganot over 10 years
Unfortunately, I get the same output:
[root@Nagios-SRV ~]# ssh 1.1.1.159 /usr/lib64/nagios/plugins/check_kvm "/" 2>&1
root@1.1.1.159's password:
hosts:3 OK:3 WARN:0 CRIT:0 - ab2c7:running alpweb5:running istaweb5:running
[root@Nagios-SRV ~]# ssh 1.1.1.159 /usr/lib64/nagios/plugins/check_nrpe -H localhost "/" 2>&1
root@1.1.1.159's password:
NRPE v2.14
[root@Nagios-SRV ~]# ssh 1.1.1.159 /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_kvm "/" 2>&1
root@1.1.1.159's password:
Unknown guests
[root@Nagios-SRV ~]#
-
ufk over 10 years
Have you tried running other scripts, like check_disk? Does this behaviour happen with every script or just this one?
-
KJH over 10 years
Have you tried running virsh list --all as root and as nagios on that system?
-
Itai Ganot over 10 years
Yes, everything else works, and so does the check_kvm script when checking other remote machines.
-
Some Linux Nerd almost 10 years
Mine says "sorry, you must have a tty to run sudo" :)