Friday, May 20, 2011

The Simple Logic of Monitoring Scripts

At some point, almost every systems administrator runs into a situation where they need to monitor a system, service, or process, and the monitoring solution they employ is simply not capable of providing the desired results. The administrator is then left with three options.

First, he or she can ask the monitoring solution provider to develop or redevelop the checks needed to monitor the system, service, or process. Depending on the provider and the monitoring "check" that is needed, this can take anywhere from several hours to several months, and occasionally it never happens at all. The process can also be long and frustrating, involving countless hours of telephone, email, instant message, and work ticket communications.

Second, the systems administrator can turn to the company's development team, if the company even has one. As with the first option, this process can take an exceptionally long time to come to fruition, and even longer to refine. This comic illustrates that process in much more detail: http://blog.thingsdesigner.com/uploads/id/tree_swing_development_requirements.jpg. Also as before, it can, and most likely will, involve countless hours of interdepartmental meetings along with telephone, email, instant message, and work ticket communications.

Finally, the third option is for the systems administrator to take things into his or her own hands and develop the monitoring "check" personally. The check can then run as a local service or a scheduled task, or the company's current monitoring solution can call it and pull in the resulting values. While this can take a fair bit of the administrator's time, depending on the check, it can prove to be the least stressful, simplest, cheapest, and yes, even most rewarding of the three options. So the question becomes: why don't more systems administrators write their own checks?

In some cases, it is due to time. Even if having someone else create the check would take longer overall, some systems administrators simply do not have the time to develop the checks that they need or would like to see in place. At other times, the administrator may be more systems-minded: he or she knows what should be monitored but has no idea how to go about accomplishing the task. In many cases, however, the administrator is simply unsure how to get started, or unsure how to write the script's logic so that it properly reports when something goes wrong.

Hence this post. Below, readers will find a collection of extremely simple scripts in Bash, Perl, and PowerShell showing the basic logic of how checks can be written to minimize false positives and false negatives. The key point to remember is that a check should never default to an "OK" response. The script should have to meet specific criteria before it returns an OK, GOOD, or similar result.

Scripts: Basic Logic Sample Scripts
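As a rough illustration of the "never default to OK" rule (this is not one of the linked scripts, just a minimal Bash sketch), the function below starts out in an UNKNOWN state and only reports OK once it has a valid reading that is below the warning threshold. The function name `check_percent` and its arguments are hypothetical. The exit codes assume the common Nagios-style plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); your monitoring solution may expect something different.

```bash
#!/usr/bin/env bash
# Minimal sketch of "never default to OK" check logic.
# Assumptions (hypothetical, not from the original scripts):
#   - exit codes follow the Nagios-style convention:
#     0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
#   - the value being checked is a whole-number percentage
#   - WARN and CRIT are plain integers supplied by the caller

# check_percent VALUE WARN CRIT
# Prints a one-line status and returns the matching exit code.
check_percent() {
    local value="$1" warn="$2" crit="$3"

    # Start pessimistic: nothing has been proven yet.
    local status=3 label="UNKNOWN"

    # An empty, missing, or non-numeric reading stays UNKNOWN
    # instead of silently passing as OK (a false negative).
    if [[ "$value" =~ ^[0-9]+$ ]]; then
        # Force base 10 so readings like "08" are not parsed as octal.
        local n=$((10#$value))
        if (( n < warn )); then
            # OK only when the reading is valid AND below the warning level.
            status=0; label="OK"
        elif (( n < crit )); then
            status=1; label="WARNING"
        else
            status=2; label="CRITICAL"
        fi
    fi

    echo "${label} - value=${value:-<empty>}"
    return "$status"
}
```

For example, on a system with GNU coreutils, `check_percent "$(df --output=pcent / | tail -n 1 | tr -dc '0-9')" 80 90` would check root filesystem usage. If the `df` pipeline fails and produces nothing, the check reports UNKNOWN rather than OK, which is exactly the behavior the rule above asks for.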

Hopefully these simple examples will help some administrators on their way to developing new or more effective monitoring checks and scripts. More posts with more in-depth discussions of monitoring scripts will hopefully be forthcoming.
