Sep 03 2010
3Ware Nagios plugin
I use Nagios to monitor my servers and workstations. If something goes wrong, I usually get told by Nagios before I notice the problem myself. A week or so back, I noticed a rather odd RAID problem. Eventually, the problem was solved by upgrading the firmware on the controller. In the meantime, I had located and installed a Nagios 3ware plugin. I like it and I’m using it on more than one server. However, now that I have turned on AUTO-VERIFY, I’ve found a spot where I can improve the plugin.

Verifying…!
Earlier today, I turned on AUTO-VERIFY for this controller. Tonight, Nagios is reporting:

Status: UNKNOWN
Status Information: UNKNOWN: /c0/u0 RAID-10 VERIFYING - 56% 64K 195.548 ON ON - /c0/u1 SPARE VERIFYING - 0% - 69.2404 - ON - /c0/u2 SPARE VERIFYING - 0% - 69.2404 - ON -

If I look at the status output, I see:
$ sudo /usr/local/sbin/tw_cli info c0 u0
Password:

Unit     UnitType  Status     %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   VERIFYING  -       62%     -     64K     195.548
u0-0     RAID-1    VERIFYING  62%     -       -     -       -
u0-0-0   DISK      OK         -       -       p0    -       65.1826
u0-0-1   DISK      OK         -       -       p2    -       65.1826
u0-1     RAID-1    VERIFYING  62%     -       -     -       -
u0-1-0   DISK      OK         -       -       p6    -       65.1826
u0-1-1   DISK      OK         -       -       p5    -       65.1826
u0-2     RAID-1    VERIFYING  63%     -       -     -       -
u0-2-0   DISK      OK         -       -       p3    -       65.1826
u0-2-1   DISK      OK         -       -       p4    -       65.1826
u0/v0    Volume    -          -       -       -     -       195.548

Now I’d rather have something other than UNKNOWN. Fortunately, I have the source.
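The UNKNOWN result makes sense if the plugin matches each unit's Status column against a fixed list of states and treats anything it does not recognise as unknown. Here is a minimal sketch of that pattern, not the plugin's actual code. The function name `status_to_exit_code` and the DEGRADED mapping are assumptions for illustration; the exit codes themselves (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```shell
#!/bin/sh
# Sketch only: map a 3ware unit status word to a Nagios exit code.
# The function name and the mappings are illustrative assumptions,
# not taken from check_3ware.sh.
status_to_exit_code() {
    case "$1" in
        OK)        echo 0 ;;  # Nagios OK
        DEGRADED)  echo 2 ;;  # Nagios CRITICAL (assumed mapping)
        *)         echo 3 ;;  # anything unlisted falls through to UNKNOWN
    esac
}

status_to_exit_code OK          # prints 0
status_to_exit_code VERIFYING   # prints 3, because no branch matches it
```

With a catch-all like this, any state the script has no branch for (such as VERIFYING) ends up as UNKNOWN until a branch is added for it.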
The patch!
This is the patch:

--- /usr/local/libexec/nagios/check_3ware.sh 2010-08-27 02:34:55.000000000 +0100
+++ /home/dan/bin/check_3ware.sh 2010-09-02 01:08:39.000000000 +0100
@@ -66,6 +66,12 @@
         MSG="$MSG $STATUS -"
         PREEXITCODE=1
         ;;
+    VERIFYING)
+        CHECKUNIT=`$TWCLI info $i unitstatus | ${GREP} -E "${UNIT[$COUNT]}" | ${AWK} '{print $1,$3,$5}'`
+        STATUS="/$i/$CHECKUNIT"
+        MSG="$MSG $STATUS -"
+        PREEXITCODE=1
+        ;;
     DEGRADED)
         CHECKUNIT=`$TWCLI info $i unitstatus | ${GREP} -E "${UNIT[$COUNT]}" | ${AWK} '{print $1,$3}'`
         STATUS="/$i/$CHECKUNIT"

This is what it outputs:
$ sudo ~/bin/check_3ware.sh
WARNING: /c0/u0 VERIFYING 89% - /c0/u1 VERIFYING 0% - /c0/u2 VERIFYING 0% -

After replacing the original script, I get this output when testing it from the command line on the Nagios server:
$ /usr/local/libexec/nagios/check_nrpe2 -H supernews-vpn -c check_3ware.sh
WARNING: /c0/u0 VERIFYING 99% - /c0/u1 VERIFYING 1% - /c0/u2 VERIFYING 0% -

I now see this on my Nagios webpage:
Status: WARNING
Status Information: WARNING: /c0/u0 VERIFYING 99% - /c0/u1 VERIFYING 1% - /c0/u2 VERIFYING 0% -

Other ideas
Tonight I started a battery test. The status immediately went to CRITICAL. That got me thinking about this patch:

$ diff -ruN /usr/local/libexec/nagios/check_3ware.sh ~/bin/check_3ware.sh
--- /usr/local/libexec/nagios/check_3ware.sh 2010-09-02 01:08:39.000000000 +0100
+++ /home/dan/bin/check_3ware.sh 2010-09-02 02:52:39.000000000 +0100
@@ -100,7 +100,7 @@
     # Check BBU's
     BBU=(`$TWCLI info $i |${GREP} -E "^bbu"|${AWK} '{print $1,$2,$3,$4,$5}'`)
     if [ "${BBU[0]}" = "bbu" ]; then
-        if [ "${BBU[1]}" != "On" ] || [ "${BBU[2]}" != "Yes" ] || [ "${BBU[3]}" != "OK" ] || [ "${BBU[4]}" != "OK" ]; then
+        if [ "${BBU[1]}" != "On" ] || [ "${BBU[2]}" != "Yes" ] || { [ "${BBU[3]}" != "OK" ] && [ "${BBU[3]}" != "Testing" ]; } || [ "${BBU[4]}" != "OK" ]; then
             BBUEXITCODE=2
             BBUERROR="BBU on $i failed"
         fi

I also think I may change the status for VERIFYING from WARNING to OK, because really, everything IS OK. The controller is merely running VERIFY.

FYI: I sent an email to the plugin author before I published this.
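A note on the BBU condition: `[ ... ]` is a single test command, so `&&` cannot appear inside one pair of brackets, and because the shell gives `&&` and `||` equal precedence, an "OK or Testing" check needs explicit grouping with braces. The sketch below isolates that logic in a function so it can be tried without a controller. The function name `bbu_failed` and its positional arguments are hypothetical; the four values stand in for `${BBU[1]}` through `${BBU[4]}` in the BBU check.

```shell
#!/bin/sh
# Sketch only: returns 0 (true) when the BBU should be reported as failed.
# bbu_failed and its arguments are hypothetical stand-ins for BBU[1]..BBU[4].
bbu_failed() {
    [ "$1" != "On" ] || [ "$2" != "Yes" ] ||
        { [ "$3" != "OK" ] && [ "$3" != "Testing" ]; } ||
        [ "$4" != "OK" ]
}

if bbu_failed On Yes Testing OK; then
    echo "BBU failed"
else
    echo "BBU fine"      # a battery test in progress is accepted
fi
```

The braces make the two comparisons on the third value act as one unit, so the battery is flagged only when that value is neither OK nor Testing.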
I also needed a more flexible alert system for that plugin, and I also emailed the author. My solution allows you to specify which states should generate warnings: R (Rebuilding), I (Initializing), V (Verifying) and P (Verify-Paused). The default is to not warn during these states.
Patch is long, but you can get the code here:
http://kf-compute1.path.utah.edu/public/code/nagios/
There is also a plugin there that checks all the SATA disks in a 3ware array for bad sectors. It needs some work, but I use it to monitor about 60 hard drives.
-Kael
p.s. I’ve been reading the FreeBSD diary since you started it. Thanks for the great resource.
Post Edited (09-09-10 23:48)