
Nagios System and Network Monitoring

David Martinjak writes "Nagios is an open source application for monitoring hosts, services, and conditions over a network. Availability of daemons and services can be tested, and specific statistics can be checked by Nagios to provide system and network administrators with vital information to help sustain uptime and prevent outages. Nagios: System and Network Monitoring is for everyone who has a network to run." Read on for the rest of the review.
Nagios: System and Network Monitoring
author: Wolfgang Barth
pages: 464
publisher: No Starch Press
rating: 9
reviewer: David Martinjak
ISBN: 1593270704
summary: Covers installing, configuring, and deploying Nagios to monitor systems and services on a network.


The book is authored by Wolfgang Barth and published by No Starch Press. The publisher hosts a Web page which contains an online copy of the table of contents, portions of reviews, links to purchase the electronic and print versions of the book, and a sample chapter ("Chapter 7: Testing Local Resources") in PDF format.

An amusing note to begin: this is one of the few books I have read where the introduction was actually worth reading closely. Many books talk about the background or history of the subject without providing much pertinent information, if any at all. In Nagios: System and Network Monitoring, Wolfgang Barth begins with a hypothetical anecdote to illustrate the usefulness of Nagios. The most important section in the introduction, however, is the explanation of states in Nagios. While monitoring a resource, Nagios will return one of four states. OK indicates nominal status, WARNING shows a potentially problematic circumstance, CRITICAL signifies an emergency situation, and UNKNOWN usually means there is an operating error with Nagios or the corresponding plugin. The definitions for each of these states are determined by the person or team who administers Nagios so that relevant thresholds can be set for the WARNING and CRITICAL status levels.
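To make the four states concrete, here is a minimal sketch of how a check might map a measured value onto them. Nagios plugins conventionally report state through their exit code (0, 1, 2, 3 for OK, WARNING, CRITICAL, UNKNOWN); the `classify` helper, its parameter names, and the "higher is worse" rule are my own assumptions for illustration, not something taken from the book.

```python
# Conventional plugin exit codes for the four Nagios states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measurement to a state (hypothetical helper).

    Assumes higher values are worse, e.g. percent of a disk in use.
    A missing measurement is reported as UNKNOWN.
    """
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def status_line(label, value, warn, crit):
    """Build the one-line status text a plugin would print before exiting."""
    state = classify(value, warn, crit)
    return state, f"{label} {STATE_NAMES[state]} - value={value}"
```

A real plugin would print the status text and then call `sys.exit(state)`, so the administrator's choice of `warn` and `crit` is what decides when a resource moves from OK to WARNING to CRITICAL.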

The first chapter walks the reader through installing Nagios to the filesystem. All steps are shown, which proves to be very helpful if you are unfamiliar with unpacking archives or compiling from source. Users who are new to Linux, or who cannot install Nagios through a package manager, will appreciate the verbosity offered here. Fortunately, the level of detail is consistent throughout the book.

Chapter 2 explains the configuration structure of Nagios to the reader. This chapter may contain the most important material in the book as understanding the layout of Nagios is essential to a successful deployment in any environment. The book moves right into enumerating the uses and purposes of the config files, objects, groupings, and templates. All of this information is valuable and presented in a descriptive manner to help the reader set up a properly configured installation of Nagios. My biggest stumbling block in using Nagios was wrapping my brain around the relationships of the config files and objects. This chapter clears up all of the ambiguities I remember having to work out for myself. If only this book had been around a few years ago!

The sixth chapter dives into the details of plugins that are available for monitoring network services. This chapter explains using the check_icmp plugin to ping both a host and a specific service for verifying reachability. Additional examples include monitoring mail servers, LDAP, web servers, and DNS among others. There is even a section for testing TCP and UDP ports.
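As a rough illustration of what a TCP port test boils down to, the sketch below tries to open a connection and reports the result as a state. This is not the book's plugin or any plugin shipped with Nagios; `check_tcp` and its return shape are hypothetical, and it assumes an unreachable port should count as CRITICAL.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection; return (exit_code, message).

    Hypothetical helper: 0 means OK, 2 means CRITICAL.
    """
    try:
        # create_connection resolves the host and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return 0, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution failures.
        return 2, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
```

A connection test like this only shows that something is listening on the port; checks for mail, LDAP, web, or DNS servers go further and speak the protocol in question.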

Next, the book covers checking the status of local resources on systems. At work, we have a system in production that could have been partitioned better. Unfortunately, /var is a bit smaller than it should be, and tends to fill up relatively frequently. Thankfully, Nagios can trigger a warning when little free space is left on the partition. From there, we have Nagios execute a script that cleans out certain items in /var so we don't have to bother with it. We can also receive a notification if the situation does not improve and requires further attention. In addition to monitoring hard drive usage, the book includes examples for checking swap utilization, system load, number of logged-in users, and even Nagios itself.
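The /var situation above can be sketched as a small free-space check. This is only an illustration under my own assumptions: the function names and the 20%/10% default thresholds are hypothetical, and the cleanup script we run in production is not shown.

```python
import shutil

# Conventional plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_free_space(free_bytes, total_bytes, warn_pct, crit_pct):
    """Return a state based on the percentage of free space remaining."""
    if total_bytes <= 0:
        return UNKNOWN
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct <= crit_pct:
        return CRITICAL
    if free_pct <= warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Check free space on the filesystem holding `path` (hypothetical helper)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The path is missing or unreadable, so the check itself failed.
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path} ({exc})"
    state = classify_free_space(usage.free, usage.total, warn_pct, crit_pct)
    free_pct = 100.0 * usage.free / usage.total if usage.total else 0.0
    label = ("OK", "WARNING", "CRITICAL", "UNKNOWN")[state]
    return state, f"DISK {label} - {free_pct:.1f}% free on {path}"
```

For example, `check_disk("/var")` would return WARNING once free space drops to 20% or less with these defaults, which is the point at which a cleanup script could be triggered.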

Chapter 12 discusses the notification system in Nagios. You provide who, what, when, where, and how in the configs, and Nagios does the rest. The book does a fantastic job of explaining what exactly triggers a notification, and how to efficiently configure Nagios to ensure the proper parties are being informed of relevant issues at reasonable intervals. For example, the server team might be interested to know that /var is 90% full on one of the LDAP servers; however, they don't need to be notified of this every thirty seconds. This chapter also covers an important aspect of Nagios known as flapping. Flapping occurs when a monitored resource quickly alternates between states. Nagios can be configured for a certain tolerance against rapid alternating changes in states. This means Nagios won't sound the alarm if the problem will resolve itself in a short period of time. Usually flapping is caused by an external factor temporarily influencing the results of the test from Nagios, and therefore has no long-term impact.
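One way to picture flap tolerance is as a percentage of recent checks that changed state, compared against a low and a high threshold. The sketch below is a deliberately simplified model and not the algorithm Nagios uses: it counts every transition equally, and the function names and the 25%/50% thresholds are assumptions chosen for illustration.

```python
def percent_state_change(states):
    """Percentage of consecutive check results that differ from the previous one."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, currently_flapping=False, low=25.0, high=50.0):
    """Decide whether a resource is flapping, with hysteresis (simplified model).

    Flapping starts once the change rate reaches `high` and only stops
    once it falls below `low`; in between, the previous verdict stands.
    """
    pct = percent_state_change(states)
    if pct >= high:
        return True
    if pct < low:
        return False
    return currently_flapping
```

With a history such as `[0, 2, 0, 2, 0]` every check changes state, so the resource counts as flapping; two thresholds rather than one keep the verdict itself from flip-flopping when the change rate hovers near the limit.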

The last major chapter to mention here deals with essentially anything and everything about the Nagios Web interface. The main point of interaction between the administrator and Nagios is the fully featured Web interface. This chapter covers recognizing and working on problems, planning downtimes, making configuration changes, and more. I especially like that the book gives an overview of each of the individual CGI programs that the Web interface is composed of, as these files are important for UI customization.

My only complaint is that the book reads like a reference manual at times. The first several chapters are conversational in tone, with great explanations of the procedures and files, but later it sometimes feels like I am reading the same piece-by-piece structure over and over, filled in with the content for that chapter. That is not necessarily bad altogether, as it does provide consistency in the presentation of the information. Additionally, the level of detail is outstanding throughout the book. The explanations are never too short or too long. This is definitely a valuable book for administrators at all levels, with fantastic breadth and depth of material. Administrators who are interested in proactive management of their systems and networks should be pleased with Nagios: System and Network Monitoring.

Nagios is licensed under the GNU General Public License Version 2, and can be downloaded from http://nagios.org.

David Martinjak is a programmer, GNU/Linux addict, and the director of 2600 in Cincinnati, Ohio. He can be reached at david.martinjak@gmail.com.


You can purchase Nagios: System and Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
This discussion has been archived. No new comments can be posted.

  • by Critical Facilities ( 850111 ) on Wednesday April 11, 2007 @03:32PM (#18693651)
    Ok, man, I swear this isn't a troll, but I have to know, what the heck, [slashdot.org] are you doing [slashdot.org] to these books? [slashdot.org]

    I mean, it's none of my business, but do you have some insane reading technique?
  • Re:Others (Score:3, Interesting)

    by phish ( 46788 ) on Wednesday April 11, 2007 @04:29PM (#18694387)
    Try Hyperic: http://www.hyperic.com/ [hyperic.com]

    GPL, 30-minute or less setup time, auto discovery and built in support for monitoring, controlling, and log tracking for anything you can think of. 9 OS's, 42 apps, network devices, extensible plugins....

    Nagios is great, but I agree with the parent that the time it takes to set up and maintain is unreasonable. Oh, and yes, I'm biased. I work for Hyperic.

    -javier
  • by sysmanman ( 1080819 ) on Wednesday April 11, 2007 @04:39PM (#18694519) Homepage
    At the risk of getting off-topic, I'm tired of stuff that doesn't quite work. (can't comment on the actual book because I haven't read it) However, I can't see how Nagios can even begin to satisfy the needs of most modern IT operations folks. These days, most people need to know a lot more than whether machine X is up. They need to know which part(s) of their web apps are not functioning correctly. They need a lot more intricate detail than is possible with Nagios or SNMP-based monitoring tools. Really, the only monitoring tool that does it for me is Hyperic [hyperic.com].
  • by schlick ( 73861 ) on Wednesday April 11, 2007 @05:03PM (#18694805)
    Have you looked at Hyperic? http://www.hyperic.com/ [hyperic.com] I'm using the open source version and I like it a lot.
  • by Colin Smith ( 2679 ) on Wednesday April 11, 2007 @07:24PM (#18696383)
    Personally. Zabbix.

    Big Brother/Sister don't really scale.

    Nagios is horrible to administer.

    Jffnms is nice, the most feature complete, but not robust enough.

    OpenNMS looks interesting but I've never had the time to set it up.

    Cacti/MRTG are trending systems.

    Zabbix or OpenNMS.
     
  • Re:Others (Score:2, Interesting)

    by aclark4life ( 639571 ) on Wednesday April 11, 2007 @10:05PM (#18697629) Homepage
    There's also ZENOSS (http://www.zenoss.com/). I didn't see anyone else mention it, so I thought I would. Haven't tried it yet but I like that it's Zope based (because I am a Zope consultant).
  • by thurgoodj187 ( 905656 ) on Wednesday April 11, 2007 @10:21PM (#18697707)
    Zenoss has a Virtual appliance out on the VMWare site, makes it real easy to test and evaluate! I've got it running (Whenever I've got my laptop up)
  • by Emrys ( 7536 ) on Thursday April 12, 2007 @11:50AM (#18703061)
    You know, I was reasonably interested in Hyperic and ZenOSS when they were first announced. Competition is good, and though I'm quite happy with what I've been doing with Netsaint and then Nagios (yes, "in the Enterprise"), I was glad to look at them and see what new things they brought to the table.

    So far I've been utterly disgusted by the FUD and BS you guys are spewing, and I've lost about all interest in caring what you think you're bringing to the table. I've yet to hear any of you actually do a meaningful technical comparison beyond "uh, Nagios is like, hard, you know?" and "ZOMG 30 minutes, auto-discovery FTW!!!11!". Well, guess what: if you only have 30 minutes to spend configuring your monitoring solution "in the Enterprise", you're pretty well doomed to spend a lot more time than that dealing with false alerts (both positive and negative) and irate users and admins. Knowing you have an apache server on port 8080 of server X is about 2% of the problem. It's a lot more important to know what application sits there and what other services and hosts it depends on so you can implement sane end-to-end monitoring that can do a full test of actual application functionality and if something is broken tell you which part of the tree actually has the problem, not just "oh noes teh port 8080 is down!!" (or better yet, "teh port 8080 is up!! no problemz!!" when the app you actually care about is returning a dead page instead of processing data). So tell me: is all this also "cake" under Hyperic? And if so, how is it "objectively better" done than Nagios does it?

    Auto-discovery is a marketing feature, but if that's all some inexperienced admin thinks they need it's not even hard to do with the 80 Nagios helper utilities that do it for you. As for a "pluggable framework", you'd be very hard-pressed to demonstrate anything more flexible than Nagios. Hell, we've been known to use it to monitor business processes and workflow efficiencies. But please do at least try, and stop talking like a marketdroid.
