I use passive checks a lot, for things I want to keep an eye on without actually watching them all the time.
For example, consider the following:
define service {
    use                     bidaily-service
    name                    bidaily-service-passive
    active_checks_enabled   0       ; service is passive only
    passive_checks_enabled  1       ; enable passive which seems redundant but for clarity
    check_freshness         1       ; ...but check freshness to catch when the service isn't reporting in
    freshness_threshold     129600  ; == 36 hours: echo 36 60 * 60 * p|dc -- to catch 2 failures
    check_command           panic-run-in-circles-shouting ; command to be run when freshness fails
    register                0
}
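The 36-hour figure in the freshness_threshold comment can also be checked with plain shell arithmetic instead of dc. A minimal sketch; hours_to_seconds is a name made up here for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a number of hours into seconds,
# the unit that freshness_threshold is expressed in.
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

hours_to_seconds 36
```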
… and an instance of that template:
define service {
    use                  bidaily-service-passive ; passive: triggered by /etc/cron.daily/mirror-idisk
    host_name            localhost
    service_description  iDisk Sync
}
In this case, I also have a script /etc/cron.daily/mirror-idisk that backs up my Apple iDisk (I love backups) and finishes with:
date "+[%s] PROCESS_SERVICE_CHECK_RESULT;localhost;iDisk Sync;0;iDisk Sync OK %Y-%m-%d_%H%M" >> /var/nagios/rw/nagios.cmd
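Nagios reads the command file one line at a time. Each line is a bracketed Unix timestamp followed by the command name and its semicolon-separated arguments: host, service description, return code and output text. The service description has to match the one in the service definition exactly. A rough sketch of a reusable helper, assuming the same command file path as above; the function names and the NAGIOS_CMD_FILE variable are my own inventions, not part of Nagios:

```shell
#!/bin/sh
# Hypothetical helper: build one external-command line for a passive service result.
# Arguments: host, service description, return code, output text.
# Return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Assumed location of the Nagios command file; override it for testing.
NAGIOS_CMD_FILE="${NAGIOS_CMD_FILE:-/var/nagios/rw/nagios.cmd}"

# Append the formatted line to the command file.
submit_passive_result() {
    format_passive_result "$@" >> "$NAGIOS_CMD_FILE"
}

# Example call at the end of a cron script:
#   submit_passive_result localhost "iDisk Sync" 0 "iDisk Sync OK $(date +%Y-%m-%d_%H%M)"
```

Keeping the formatting separate from the write makes it easy to eyeball the line on a terminal before pointing it at the real command file.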
As you can see, this script does its work, and drops a successful return code into Nagios; Nagios simply shows it with a happy green marker on the status page.
What happens if the script’s action fails? It gives a bad result, and Nagios reports that.
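One way to get that bad result is to derive the return code from the exit status of the real work. This is only a sketch of the idea, not the actual contents of my cron script; result_line_for is a hypothetical name, and mapping every failure to 2 (CRITICAL) is an assumption:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, then print the passive-check line
# that reflects whether it succeeded.
# Arguments: host, service description, then the command and its arguments.
result_line_for() {
    host=$1
    service=$2
    shift 2

    # Run the real work, discarding its output, and remember its exit status.
    "$@" >/dev/null 2>&1
    status=$?

    if [ "$status" -eq 0 ]; then
        code=0                      # OK
        text="$service OK"
    else
        code=2                      # CRITICAL (assumed mapping for any failure)
        text="$service FAILED with exit status $status"
    fi

    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$host" "$service" "$code" "$text"
}

# Example: result_line_for localhost "iDisk Sync" rsync -a /src/ /dest/
# The printed line would then be appended to the Nagios command file.
```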
What happens if the script has an error and chokes and dies? Nagios sees no result for 36 hours, and executes the “panic-run-in-circles-shouting” command. In my case, that’s another command that puts a failure into a queue, but that’s a bit tangential.
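When a result goes stale, Nagios runs the service's check_command as a forced active check, even though active checks are disabled, and treats its exit status like any plugin's. So the simplest possible fallback is a script that always reports CRITICAL. This is a simpler stand-in than my queue-based command, and report_stale is a made-up name:

```shell
#!/bin/sh
# Hypothetical freshness fallback: always report CRITICAL.
# Nagios takes the first line of output as the status text and
# the exit status as the service state (2 = CRITICAL).
report_stale() {
    echo "CRITICAL: no passive result received within the freshness threshold"
    return 2
}

# As a standalone plugin, the script would finish with:
#   report_stale
#   exit $?
```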
This is quite effective, especially since my Nagios is tied to my Jabber and can escalate to a Twitter feed that reaches me by SMS. I know that errors will reach me, so I never have to check the status screen.