Nagios 3 Enterprise Network Monitoring

jgoguen writes "Nagios, originally known as Netsaint, has been a long-time favourite for network and device monitoring due to its flexibility, ease of use, and efficiency. Nagios provided, and still provides today, a low-cost, versatile alternative to commercial network monitoring applications. Nagios 3 takes a huge step forward compared to Nagios 2, providing improved flexibility, ease of use and extensibility, all while also making significant performance enhancements. Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement." Read on for the rest of jgoguen's review.
Nagios 3: Enterprise Network Monitoring
Authors: Max Schubert, Derrick Bennett, Jonathan Gines, Andrew Hay, John Strand
Pages: 339
Publisher: Syngress
Rating: 8
Reviewer: jgoguen
ISBN: 978-1-59749-267-6
Summary: Making Nagios 3 work for you and your business.
The first chapter is devoted to new features in Nagios 3. It covers the major changes, which include changes to data storage options and locations, checks, configuration objects, and macros, along with operational, performance, and usability enhancements. Users upgrading from Nagios 2, or users who may already be familiar with Nagios 2, will gain the most from this chapter. New users will still gain value from it, however, since a number of the changes involve some of the major features of Nagios. In addition, users who refer to configuration file samples created for Nagios 2 will save a great deal of time by checking this chapter for changes. Using Nagios 2 configuration files directly prevents users from enjoying some new features of Nagios 3. Users who will only be writing plug-ins and scripts for their local Nagios deployment might not find Chapter 1 very useful.

Chapters 2 and 3 deal with scaling Nagios to work efficiently within large deployments. First, the authors show how to design a Nagios configuration for a large organization. This is something that all Nagios administrators should make use of when designing configurations, not only administrators in large organizations, because a properly done configuration for a small organization will easily scale up as the organization grows. I was impressed to see that the authors stress the importance of the end user's input when designing configurations. Administrators who ignore this piece of advice risk the success of Nagios in their organization. Various diagrams help to explain the relationships between the Nagios configuration objects. A good amount of detail is provided on letting various groups within an organization have semi-independent control over how Nagios interacts with their hosts and services, and how Nagios alerts their staff. The authors have included numerous configuration file snippets, which allow a Nagios administrator to very quickly create a configuration file and then tweak the configuration parameters to suit local requirements.

Scaling the Nagios graphical user interface (GUI) follows a very simple concept: use a "less is more" approach. Although the specific details here deal with Nagios, the general idea is equally applicable to anyone displaying information they expect their users to actually pay attention to. In general, users should be able to see as much as they want (limited by resources and permissions) but only be shown what they need to know about by default. For example, the system administrator for marketing probably does not need to know when the development disk image server goes down, while the development system administrator would probably be very interested. User accounts let the administrator give various groups access to Nagios, filtered by its fine-grained permissions system. Users from various groups can also be shown only what they need to be shown by default, without the need to select a particular area first. User accounts also prevent users who only need to view Nagios from having full administrative control, and allow records of each user's actions to be kept. Using a patch provided with the book's download package will enable Nagios to have read-only accounts as well, which is great for organizations that would like to grant certain users (or groups) access to view Nagios but not make any changes. As an example, an organization's help desk could use Nagios to determine quickly whether users are unable to access services because of an outage, or if further troubleshooting is necessary.

The authors go on to discuss clustering, failover, and the future of the Nagios GUI. I'm not convinced that these belong in a chapter devoted to scaling the Nagios GUI, since they mostly deal with scaling the entire Nagios deployment. Regardless, they are all very important topics, especially when Nagios is heavily relied upon. Clustering allows each remote site to have a local Nagios instance monitoring its hosts and devices rather than requiring a central Nagios instance to monitor remote hosts and services. Central monitoring of remote sites is a problem not only because checks would take much longer over the WAN links between the central instance and the remote locations, but also because of the security implications of allowing those checks to be done. The authors don't discuss the security side of clustering, but it's still something that every Nagios administrator (and everyone else!) should keep in mind. The clustering section deals primarily with the rationale behind clustering and how to configure the local and remote instances of Nagios properly, but the authors include a good deal of information here that a less experienced Nagios administrator might overlook. Most notable is their discussion about the display of service status when a service is reachable from the master server but not from a remote instance. While Nagios can translate the remote instance's check result to be displayed from its own perspective, it may be more desirable to have the master Nagios GUI display the results from the perspective of the server which made the check.

After implementing clustering, some sort of fallback mechanism is required. Failover and redundancy are the two main choices, and that's what the authors discuss next. They don't spend much time on redundancy, since this would require each redundant Nagios instance to perform its own set of checks, which can significantly raise the load on both the monitored hosts and the network in general. Given the problems it can introduce, the authors have spent more time on redundancy than most administrators should spend considering it. Failover is a much better solution, and the authors do a great job of covering how to configure it properly. As usual, they make sure to remind readers of some things that are easily overlooked, especially when you're trying to get Nagios back up and running after the master server crashes.

Chapters 4 and 5 discuss Nagios plug-ins, add-ons, and enhancements. These chapters alone are worth the price of the book because of how much time they can save. It's much faster to copy a script and make minor tweaks than it is to try reinventing the wheel, and with the number of scenarios covered here combined with the Nagios user community there aren't very many things that haven't been done already. Whether you want to test command-line interfaces, CPU usage, memory utilization, bandwidth utilization, HTML pages, LDAP services, or even specialized hardware, there's probably already a plug-in written for it. Most common scenarios actually have a plug-in already included in this book. The available add-ons and plug-ins are equally varied, providing ways to monitor hosts across security zones, configure read-only displays that live in a security zone other than the one Nagios is in, interface with Cacti, and even read out alerts. Even more scenarios can be handled by other scripts provided by the Nagios community.
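
To give a sense of how little code a custom check needs, here is a minimal sketch of a plug-in in Python. It is not taken from the book; the check (1-minute load average), the thresholds, and the function names are assumptions made purely for illustration. What it does rely on is the standard plug-in convention: print one status line, optionally followed by "|" and performance data, and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).

```python
#!/usr/bin/env python3
"""Minimal sketch of a Nagios-style check plug-in (illustrative only)."""
import os

# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a state using simple upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(state, value, warn, crit):
    """Build the status line: readable text, then '|' and performance data."""
    return (
        f"LOAD {LABELS[state]} - load1={value:.2f} "
        f"| load1={value:.2f};{warn};{crit}"
    )


def main():
    # Hypothetical thresholds; a real plug-in would usually parse -w/-c options.
    warn, crit = 4.0, 8.0
    try:
        value = os.getloadavg()[0]  # 1-minute load average (Unix only)
    except (OSError, AttributeError):
        print("LOAD UNKNOWN - could not read load average")
        return UNKNOWN
    state = evaluate(value, warn, crit)
    print(format_output(state, value, warn, crit))
    return state

# A deployed plug-in would finish with: import sys; sys.exit(main())
# so that Nagios receives the state as the process exit code.
```

Keeping the threshold logic in its own function means the "minor tweaks" for a different metric are mostly confined to the line that reads the value.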

Chapter 6 covers how to integrate Nagios into an enterprise environment. It goes into just enough detail to get Nagios configured to work with a large number of third-party services, such as LDAP authentication, Cacti, Puppet, and Splunk. Emphasis here is always placed on the human element: how to use Nagios to help the help desk and/or NOC staff do their jobs more efficiently and effectively, and how to gain maximum support for Nagios within the organization. The importance of the human element, in all its forms, simply cannot be overstated, and the authors have done a wonderful job of outlining a good way to make Nagios an integral part of an organization. A lot of the material towards the end of the chapter, especially the section on smaller Network Operation Centres, could be used by anyone looking for ways to help a small group work together effectively.

Chapter 7 is another chapter with a lot of content easily applicable outside of a Nagios environment. The chapter begins with the authors reminding you to know your network and to watch out for session hijack attacks, then showing you how to use Nagios to do both. Nagios can't replace a competent network administrator, but it can make their lives easier, and the authors show how the configuration you've already done in Nagios already lets you spot a potential session hijack attack and how it forces you to properly know your network. Nagios forces you to know not only how your network is built and what devices are in use, but also what constitutes normal conditions for all your devices and services.

Regulatory compliance is another area where Nagios can assist, and it is very important to companies, especially those operating in the United States. The authors outline how a company could use Nagios to assist with compliance with Sarbanes-Oxley (SOX) with COBIT or COSO, Payment Card Industry (PCI) Data Security Standard (DSS), Director of Central Intelligence Directive (DCID) 6/3 and Department of Defense (DoD) Information Assurance Certification and Accreditation Process (DIACAP). Nagios alone isn't enough to be compliant (at the very least, detailed documentation will also be required), but the authors give a good overview of how Nagios can assist with compliance with all of these regulations.

The final chapter helps to bring the rest of the book together by walking through a full Nagios configuration for a fictional Fortune 500 corporation. The bulk of this chapter covers the pre-deployment stage of a Nagios deployment, but that doesn't mean that there isn't a lot to learn about deploying Nagios. A major hurdle towards deploying Nagios in an organization is the pre-deployment phase, and the authors outline here how to easily turn this major challenge into a series of simple steps to increase the chances of Nagios' success in your organization. From the very beginning, you can see how involving the customer early and starting small, along with everything else, becomes a part of a process. Although it's specific to Nagios, the process followed here could be easily adapted to integrating any sort of monitoring service. The remainder of the chapter is devoted to how you might integrate Nagios into a Fortune 500 company, finishing the book off with some good advice for integrating Nagios.

Despite all the book's strengths, there is some room for improvement. In chapter 2, it may have been more effective to outline the relationships between the Nagios configuration objects before discussing configuration planning. I found it much easier to think of a configuration for a large organization after knowing about how Nagios' configuration objects relate to each other.

Throughout the book, the authors have included configuration file snippets, scripts, and example script output in the main text. While all of these are quite useful and serve to enhance the book, I think it would have been better if these were all included in an appendix instead, perhaps keeping only the relevant parts of configuration snippets in the main text for clarification.

At the end of chapter 3, the sections on the future of Nagios and the CGI front end are informational and interesting, but they would be better placed in a separate chapter dealing with the potential future of Nagios in general. These and the other major areas of Nagios combined would provide more than enough material for a full chapter on their collective futures.

Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user. A few of these chapters alone would be worth the price of the whole book.

Disclaimer: I worked with one author when I was asked to review this book.

You can purchase Nagios 3: Enterprise Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


This discussion has been archived. No new comments can be posted.

  • by druiid ( 109068 ) on Wednesday October 08, 2008 @03:44PM (#25304549)

    I used nagios for years.. many many years. It has to be, as many have already pointed out.. the most difficult to configure OSS project ever made.

    That said, it was fairly powerful once configured properly.

    The thing is, though, that it has many shortcomings. I found a much better (although not necessarily as scalable) monitoring and data-gathering solution in Zabbix. They recently released a new version as well that adds many really nice capabilities like IPMI support.

  • Zabbix (Score:5, Interesting)

    by kosmosik ( 654958 ) <kos AT kosmosik DOT net> on Wednesday October 08, 2008 @04:04PM (#25304807) Homepage

    I like Nagios but I can't really imagine how to apply it in a large (think ten thousand hosts) setup spread across multiple regional/organizational branches and so on.

    Also Nagios *is* painful to set up. First of all, AFAIK there is no way to delegate administration, e.g. to organizational branches. Configuration is just a big pile of config files included from some other config files etc. There is no autodiscovery/autoconfiguration of hosts since the Nagios team believes it is BAD etc.

    Well IMHO Nagios is great but it is like a big fat pile of hacked scripts and configs. Not too elegant but working.

    Now... I am (well we are in my organization) using Zabbix and I find it great. It is much better organised/elegant than Nagios.

    In the Zabbix architecture you have well designed atomic elements like checks, items, services (groups), etc. It also gathers fine-tuned historical data for trends and historical review. You can compact the data (lower the resolution) after a given time and so on. It is in fact a very complete monitoring framework with its own internal condition language and escalation engine. You can gather data from network checks, SNMP, custom scripts, Zabbix agents (available for most platforms) etc.

    And it has normal configuration, not crude text config files. I have nothing against text files but sometimes I don't really want to open my text editor only to quickly set up an ad-hoc overview screen with maps, graphs, status displays and clocks, and you can have a few such screens rotating on your big screens in the NOC. All with mouse clicking.

    I can give it as a tool to a sysadmin and he or she can work with it without having to study manuals. Not everybody in your organization is a Unix hacker, you know...

    We have dozens of branch servers which are managed by local sysadmins and a farm of central servers which is managed by central staff.

    Zabbix works in a distributed manner, so a local branch can have a very detailed view of their infrastructure and at the central level I can have a functional/business overview of the entire infrastructure and core services (like business systems, transactions etc.). Not just simple checks if RAID is OK - I don't care if RAID in some server is OK. I need to know why (where, who to blame) a given service (be it MQ/WebSphere) is not working as desired.

    And also it is free, open source and available in most Linux/Unix distributions as a standard package. So when considering an enterprise monitoring platform do yourself a favour and also check Zabbix.

    http://www.zabbix.com/downloads/ZABBIX%20Manual%20v1.6.pdf

  • Re:Spam alert! (Score:2, Interesting)

    by perldork ( 1381373 ) on Wednesday October 08, 2008 @10:19PM (#25308491)
    Setting up dependencies is a must. Use notification_delay wisely, and only send out emails or other 'push' notifications for problems that have to have immediate attention. I like to take the approach of monitoring everything that is important but only sending emails out for problems that really truly require immediate attention from on-call staff.
  • Re:Spam alert! (Score:2, Interesting)

    by empaler ( 130732 ) on Thursday October 09, 2008 @09:22AM (#25312563) Journal
    Also, configure planned downtime FTW. I do enjoy that it can be set to auto-ignore flapping hosts.
