Nagios System and Network Monitoring

Nagios System and Network Monitoring 116

Posted by samzenpus on Wednesday April 11, 2007 @03:22PM from the keep-an-eye-on-things dept.

David Martinjak writes "Nagios is an open source application for monitoring hosts, services, and conditions over a network. Availability of daemons and services can be tested, and specific statistics can be checked by Nagios to provide system and network administrators with vital information to help sustain uptime and prevent outages. Nagios: System and Network Monitoring is for everyone who has a network to run." Read on for the rest of the review.

Nagios: System and Network Monitoring
author	Wolfgang Barth
pages	464
publisher	No Starch Press
rating	9
reviewer	David Martinjak
ISBN	1593270704
summary	Covers installing, configuring, and deploying Nagios to monitor systems and services on a network.

The book is authored by Wolfgang Barth and published by No Starch Press. The publisher hosts a Web page which contains an online copy of the table of contents, portions of reviews, links to purchase the electronic and print versions of the book, and a sample chapter ("Chapter 7: Testing Local Resources") in PDF format.

An amusing note to begin: this is one of the only books I have read where the introduction was actually worth reading closely. Many books seem to talk about background or history of the subject without providing much pertinent information, if any at all. In Nagios: System and Network Monitoring, Wolfgang Barth begins with a hypothetical anecdote to illustrate the usefulness of Nagios. The most important section in the introduction, however, is the explanation of states in Nagios. While monitoring a resource, Nagios will return of one of four states. OK indicates nominal status, WARNING shows a potentially problematic circumstance, CRITICAL signifies an emergency situation, and UNKNOWN usually means there is an operating error with Nagios or the corresponding plugin. The definitions for each of these states are determined by the person or team who administers Nagios so that relevant thresholds can be set for the WARNING and CRITICAL status levels.

The first chapter walks the reader through installing Nagios to the filesystem. All steps are shown, which proves to be very helpful if you are unfamiliar with unpacking archives or compiling from source. Users who are either new to Linux, or cannot install Nagios through a package manager, will appreciate the verbosity offered here. Fortunately, the level of detail is consistent through the book.

Chapter 2 explains the configuration structure of Nagios to the reader. This chapter may contain the most important material in the book as understanding the layout of Nagios is essential to a successful deployment in any environment. The book moves right into enumerating the uses and purposes of the config files, objects, groupings, and templates. All of this information is valuable and presented in a descriptive manner to help the reader set up a properly configured installation of Nagios. My biggest stumbling block in using Nagios was wrapping my brain around the relationships of the config files and objects. This chapter clears up all of the ambiguities I remember having to work out for myself. If only this book had been around a few years ago!

The sixth chapter dives into the details of plugins that are available for monitoring network services. This chapter explains using the check_icmp plugin to ping both a host and a specific service for verifying reachability. Additional examples include monitoring mail servers, LDAP, web servers, and DNS among others. There is even a section for testing TCP and UDP ports.

Next, the book covers checking the status of local resources on systems. At work, we have a system in production that could have been partitioned better. Unfortunately, /var is a bit smaller than it should be, and tends to fill up relatively frequently. Thankfully, Nagios can trigger a warning when there is a low amount of free space left on the partition. From there, we have Nagios execute a script that cleans out certain items in /var so we don't have to bother with it. We can also receive notification if the situation does not improve, and requires further attention. In addition to monitoring hard drive usage, the book includes examples for checking swap utilization, system load, number of logged-in users, and even Nagios itself.

Chapter 12 discusses the notification system in Nagios. You provide who, what, when, where, and how in the configs, and Nagios does the rest. The book does a fantastic job of explaining what exactly triggers a notification, and how to efficiently configure Nagios to ensure the proper parties are being informed of relevant issues at reasonable intervals. For example, the server team might be interested to know that /var is 90% full on one of the LDAP servers; however they don't need to be notified of this every thirty seconds. This chapter also covers an important aspect of Nagios known as flapping. Flapping occurs when a monitored resource quickly alternates between states. Nagios can be configured for a certain tolerance against rapid alternating changes in states. This means Nagios won't sound the alarm if the problem will resolve itself in a short period of time. Usually flapping is caused by an external factor temporarily influencing the results of the test from Nagios; and therefore has no long-term impact.

The last major chapter to mention here deals with essentially anything and everything about the Nagios Web interface. The main point of interaction between the administrator and Nagios is the fully featured Web interface. This chapter covers recognizing and working on problems, planning downtimes, making configuration changes, and more. I especially like that the book gives an overview of each of the individual CGI programs that the Web interface is composed of; as these files are important for UI customization.

The only aspect of this book that I did not care for was that the book reads like a reference manual at times. The first several chapters start out more conversational in tone with great explanations of the procedures and files; but later it sometimes feels like I am repeatedly reading an iterated piece-by-piece structure, filled in with the content for that chapter. That is not necessarily bad all together as it does provide consistency in the presentation of the information. Additionally, the level of detail is outstanding throughout the book. The explanations are never too short or too long. This is definitely a valuable book for administrators at all levels with fantastic breadth and depth of material. Administrators who are interested in proactive management of their systems and networks should be pleased with Nagios: System and Network Monitoring.

Nagios is licensed under the GNU General Public License Version 2, and can be downloaded from http://nagios.org.

David Martinjak is a programmer, GNU/Linux addict, and the director of 2600 in Cincinnati, Ohio. He can be reached at david.martinjak@gmail.com.

You can purchase Nagios: System and Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

This discussion has been archived. No new comments can be posted.