NRPE: Unable to read output – The followup
Last week, I wrote about a problem with NRPE reporting NRPE: Unable to read output.
Harold Paulson encountered the same issue yesterday. I was able to help him debug it today.
We found a solution: a full path to sudo.
We both had the same situation. Checking from the nagios server:
$ /usr/local/libexec/nagios/check_nrpe2 -H kraken -c check_smartmon_ada8
NRPE: Unable to read output
But when running locally on the Nagios client (NOTE: I amended the shell for nagios to /bin/sh for this test):
# su -m nagios -c 'sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8'
OK: device is functional and stable (temperature: 35)|TEMP=35;55;60;
If you don’t make that temporary shell adjustment, you’ll get this instead:
# su -m nagios -c 'sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8'
UNKNOWN: no read permission given
We couldn’t figure this out. Then, while looking at /usr/local/etc/nrpe.cfg and searching for sudo, I found:
# *** THIS EXAMPLE MAY POSE A POTENTIAL SECURITY RISK, SO USE WITH CAUTION! ***
# Usage scenario:
# Execute restricted commmands using sudo. For this to work, you need to add
# the nagios user to your /etc/sudoers. An example entry for alllowing
# execution of the plugins from might be:
#
# nagios ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/
#
# This lets the nagios user run all commands in that directory (and only them)
# without asking for a password. If you do this, make sure you don't give
# random users write access to that directory or its contents!
# command_prefix=/usr/local/bin/sudo
My clue was the full path. I figured we were onto something. In the meantime, Harold had restarted nrpe and discovered
that it resolved his particular problem. I then recalled that the same restart had fixed something for me too. I decided
to reboot my system to reproduce the problem Harold had just fixed. After the reboot, Nagios was indeed reporting the same
error messages. It was about that time that I recalled this entry in /etc/crontab, which attempted, but failed, to solve the problem:
# for some reason, nrpe2 doesn't start right on boot
@reboot root /bin/sleep 600 && /usr/local/etc/rc.d/nrpe2 restart
I recall now that this never did resolve the issue. I always had to restart it by hand. We were about to confirm our
suspicions. First, I mounted procfs so I could use the -e option on ps to view the environment variables of the process:
# mount -t procfs proc /proc
Then I looked at the existing nrpe process which was producing the error:
$ sudo ps -auxwe -p 1104
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
nagios 1104 0.0 0.1 11060 3300 ?? Ss 1:16PM 0:00.02 HOME=/ PATH=/sbin:/bin:/usr/sbin:/usr/bin RC_PID=24 PWD=/ /usr/local/sbin/nrpe2 -d -c /usr/local/etc/nrpe.cfg
Then I restarted nrpe and issued the same command for the new process:
$ sudo ps -auxwe -p 3619
sudo: cannot get working directory
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
nagios 3619 0.0 0.1 11060 3372 ?? Ss 1:25PM 0:00.00 SUDO_GID=1001 USER=root MAIL=/var/mail/root HOME=/root SUDO_UID=1001 LOGNAME=root USERNAME=root TERM=xterm PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/home/dan/bin RC_PID=3600 SUDO_COMMAND=/usr/local/etc/rc.d/nrpe2 restart SHELL=/bin/sh SUDO_USER=dan PWD=/proc/1104 /usr/local/sbin/nrpe2 -d -c /usr/local/etc/nrpe.cfg
As you can see, in the first output, /usr/local/bin is not in the PATH, but it is in the second output. This explains why the errors go away after nrpe is restarted.
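The effect of the missing PATH entry can be reproduced without touching nrpe at all. What follows is a minimal sketch, not the way nrpe itself runs commands: it assumes a hypothetical stand-in script named mysudo placed in a temporary directory, which plays the role of /usr/local/bin, and it runs a shell with the sparse PATH copied from the first ps output.

```shell
#!/bin/sh
# Hypothetical stand-in for sudo: lives in a directory that is NOT on the boot-time PATH.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "ran via full path"\n' > "$demo_dir/mysudo"
chmod +x "$demo_dir/mysudo"

# Sparse PATH, copied from the ps output of the nrpe process started at boot.
boot_path=/sbin:/bin:/usr/sbin:/usr/bin

# Bare command name under the sparse PATH: the lookup fails.
bare_result=$(env -i PATH="$boot_path" /bin/sh -c 'command -v mysudo || echo "not found"')

# Full path under the same sparse PATH: no lookup is needed, so it runs.
full_result=$(env -i PATH="$boot_path" "$demo_dir/mysudo")

echo "bare name: $bare_result"
echo "full path: $full_result"

rm -rf "$demo_dir"
```

The bare name only resolves when its directory happens to be on the PATH of whoever started the daemon, which is why a restart from an interactive shell appeared to fix things.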
I then amended nrpe.cfg to contain a full path to sudo:
command[check_smartmon_ada8]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8
After a reboot of the system, no more nrpe errors. 🙂
Why? When processes are run from init and cron, they exist in a very sparse environment. This is by design.
Thus, full paths are often needed and are a very good design choice in any case.
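To look for other entries with the same weakness, something like the following sketch might help. It is only an illustration: find_relative_commands is a hypothetical helper, the sample file stands in for nrpe.cfg, the first sample entry is my assumption of what a bare-sudo definition looks like, and the ada9 entry is invented for contrast. It checks only the first word of each command line, so a relative path later in the line would go unnoticed.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command[...] definition
# whose command line does not begin with an absolute path.
find_relative_commands() {
    awk -F= '/^command\[/ {
        cmd = $0
        sub(/^[^=]*=/, "", cmd)     # drop the "command[name]=" part
        sub(/^[ \t]+/, "", cmd)     # drop leading whitespace
        if (cmd !~ /^\//) print $1  # $1 is "command[name]"
    }' "$1"
}

# Sample file standing in for nrpe.cfg (entries are illustrative assumptions).
sample_cfg=$(mktemp)
cat > "$sample_cfg" <<'EOF'
command[check_smartmon_ada8]=sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada8
command[check_smartmon_ada9]=/usr/local/bin/sudo /usr/local/libexec/nagios/check_smartmon -d /dev/ada9
EOF

flagged=$(find_relative_commands "$sample_cfg")
echo "$flagged"

rm -f "$sample_cfg"
```

Against a real installation, the function would be pointed at /usr/local/etc/nrpe.cfg instead of the sample file.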