
Nagios 3: Enterprise Network Monitoring

jgoguen writes "Nagios, originally known as Netsaint, has been a long-time favourite for network and device monitoring due to its flexibility, ease of use, and efficiency. Nagios provided, and still provides today, a low-cost, versatile alternative to commercial network monitoring applications. Nagios 3 takes a huge step forward compared to Nagios 2, providing improved flexibility, ease of use and extensibility, all while also making significant performance enhancements. Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement." Read on for the rest of jgoguen's review.
title: Nagios 3: Enterprise Network Monitoring
author: Max Schubert, Derrick Bennett, Jonathan Gines, Andrew Hay, John Strand
pages: 339
publisher: Syngress
rating: 8
reviewer: jgoguen
ISBN: 978-1-59749-267-6
summary: Making Nagios 3 work for you and your business.
The first chapter is devoted to what is new in Nagios 3. It covers the major changes, including changes to data storage options and locations, checks, configuration objects, and macros, as well as operational, performance, and usability enhancements. Users upgrading from Nagios 2, or already familiar with it, will gain the most from this chapter. New users will still find it valuable, however, since a number of the changes involve major Nagios features. Readers working from configuration file samples written for Nagios 2 will also save a great deal of time by checking them against this chapter, because using Nagios 2 configuration files unchanged means missing out on some of the new features of Nagios 3. Users who will only be writing plug-ins and scripts for their local Nagios deployment might not find Chapter 1 very useful.

Chapters 2 and 3 deal with scaling Nagios to work efficiently in large deployments. Chapter 2 first shows how to design a Nagios configuration for a large organization. All Nagios administrators, not only those in large organizations, should apply this material when designing configurations, because a configuration done properly for a small organization will scale easily as the organization grows. I was impressed to see that the authors stress the importance of end users' input when designing configurations; administrators who ignore this advice put the success of Nagios in their organization at risk. Diagrams help to explain the relationships between the various Nagios configuration objects. A good amount of detail is provided on giving groups within an organization semi-independent control over how Nagios interacts with their hosts and services and how it alerts their staff. The authors have included numerous configuration file snippets, which allow a Nagios administrator to create a configuration file very quickly and then tweak its parameters to suit local requirements.
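The snippet-and-tweak workflow leans on Nagios' object inheritance: a template is defined with `register 0`, and objects pull in its directives through `use`. The following is a minimal Python sketch of how those directives resolve, not code from the book; the template names, host, and values are hypothetical. It assumes Nagios' documented precedence rules: local directives beat inherited ones, the first template listed in `use` beats later ones, and `name`, `use`, and `register` are not inherited.

```python
from typing import Dict

# Hypothetical templates, mirroring "define host { ... }" blocks with register 0.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "generic-host": {
        "name": "generic-host",
        "check_command": "check-host-alive",
        "max_check_attempts": "5",
        "notification_options": "d,r",
        "register": "0",
    },
    "marketing-host": {
        "name": "marketing-host",
        "use": "generic-host",
        "contact_groups": "marketing-admins",
        "register": "0",
    },
}

# Bookkeeping directives that are not inherited; dropped from the result.
NOT_INHERITED = ("name", "use", "register")


def resolve(obj: Dict[str, str], templates: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the effective directives of obj after applying template inheritance."""
    merged: Dict[str, str] = {}
    parents = [p.strip() for p in obj.get("use", "").split(",") if p.strip()]
    # Apply parents last-to-first so the first template listed in "use" wins.
    for parent in reversed(parents):
        merged.update(resolve(templates[parent], templates))
    merged.update(obj)  # local directives override anything inherited
    for key in NOT_INHERITED:
        merged.pop(key, None)
    return merged


host = {
    "use": "marketing-host",
    "host_name": "mkt-web01",
    "address": "192.0.2.10",
    "max_check_attempts": "3",
}
print(resolve(host, TEMPLATES))
```

The host keeps its own `max_check_attempts` and picks up the contact group and check command from the two templates. Nagios does this resolution itself when it reads the object files; its pre-flight check (`nagios -v` with the path to nagios.cfg) is the practical way to catch mistakes such as a misspelled template name.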

Scaling the Nagios graphical user interface (GUI) follows a very simple concept: "less is more." Although the specifics here deal with Nagios, the general idea applies equally to anyone displaying information they expect users to actually pay attention to. Users should be able to see as much as they want (within the limits of resources and permissions), but by default should be shown only what they need to know about. For example, the marketing system administrator probably does not need to know when the development disk image server goes down, while the development system administrator would be very interested. With user accounts, the administrator can give each group access to Nagios filtered through its fine-grained permissions system, and each group can be shown only what it needs by default, without first having to select a particular area. User accounts also keep users who only need to view Nagios from having full administrative control, and allow each user's actions to be recorded. A patch provided in the book's download package adds read-only accounts to Nagios, which is great for organizations that want to let certain users (or groups) view Nagios without making any changes. For example, an organization's help desk could use Nagios to determine quickly whether users are unable to reach services because of an outage or whether further troubleshooting is necessary.
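The "see what you may, but show only what you need" rule can be modelled in a few lines. This is a hypothetical Python sketch, not Nagios' CGI code: it assumes a user's default view is the set of services whose contact groups include one of the user's groups, a `see_all` flag loosely standing in for cgi.cfg's `authorized_for_all_services`, and a `read_only` flag standing in for the read-only accounts the book's patch adds.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Service:
    host: str
    description: str
    contact_groups: FrozenSet[str]


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str]
    see_all: bool = False   # loosely like authorized_for_all_services in cgi.cfg
    read_only: bool = True  # hypothetical flag; stock Nagios 3 needs the book's patch


def is_contact(user: User, service: Service) -> bool:
    """True if any of the user's groups is a contact group for the service."""
    return bool(user.groups & service.contact_groups)


def default_view(user: User, services: List[Service]) -> List[Service]:
    """What the user sees without asking: only services their groups handle."""
    return [s for s in services if is_contact(user, s)]


def can_view(user: User, service: Service) -> bool:
    """What the user may look at if they go looking for it."""
    return user.see_all or is_contact(user, service)


def can_submit_commands(user: User) -> bool:
    """Read-only users can look, but not acknowledge, reschedule, or disable."""
    return not user.read_only


services = [
    Service("dev-img01", "Disk image share", frozenset({"dev-admins"})),
    Service("mkt-web01", "HTTP", frozenset({"marketing-admins"})),
]
helpdesk = User("helpdesk", frozenset(), see_all=True, read_only=True)
marketing = User("mkt-admin", frozenset({"marketing-admins"}), read_only=False)
```

In this model the help desk account can open any service's status but starts from an empty default view and cannot change anything, while the marketing administrator sees only the marketing web server and never hears about the development image server.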

The authors go on to discuss clustering, failover, and the future of the Nagios GUI. I'm not convinced these belong in a chapter devoted to scaling the GUI, since they mostly deal with scaling the entire Nagios deployment. Regardless, they are all very important topics, especially where Nagios is heavily relied upon. Clustering gives each remote site its own Nagios instance to monitor local hosts and devices, rather than requiring a central instance to monitor remote hosts and services. Checking remote hosts from a central instance is slower, because every check crosses the WAN links between the central site and the remote locations, and allowing those checks through to each site has security implications. The authors don't discuss the security side of clustering, but it's still something that every Nagios administrator (and everyone else!) should keep in mind. The clustering section deals primarily with the rationale behind clustering and how to configure the local and remote Nagios instances properly, but the authors include a good deal of information here that a less experienced Nagios administrator might overlook. Most notable is their discussion of how to display a service's status when the service is reachable from the master server but not from a remote instance. While Nagios can translate the remote instance's check result so that it is displayed from the master's own perspective, it may be more desirable to have the master's GUI display the result from the perspective of the server that made the check.

After implementing clustering, some sort of fallback mechanism is required. Redundancy and failover are the two main choices, and that's what the authors discuss next. They don't spend much time on redundancy, since it requires each redundant Nagios instance to perform its own set of checks, which can significantly raise the load on both the monitored hosts and the network in general. Given the problems it can introduce, even that brief treatment is more time than most administrators should spend considering it. Failover is a much better solution, and the authors do a great job of covering how to build a proper failover setup. As usual, they make sure to remind readers of things that are easily overlooked, especially when you're trying to get Nagios back up and running after the master server crashes.
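One common way to wire up failover, along the lines of the failover scenario in the Nagios documentation, is to run a standby instance with notifications disabled and let an event handler on the standby switch them on when the master goes down. Below is a minimal Python sketch of that handler's decision logic, not the book's setup. It assumes the handler receives the master host's `$HOSTSTATE$` and `$HOSTSTATETYPE$` macros as arguments; the command file path is an assumption (the real one is whatever `command_file` is set to in nagios.cfg).

```python
import time
from typing import Optional

# Assumed location; use the command_file value from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def failover_action(master_state: str, state_type: str) -> Optional[str]:
    """Choose the external command the standby should issue, if any.

    master_state and state_type are the values of $HOSTSTATE$ and
    $HOSTSTATETYPE$ for the master host, as seen by the standby.
    """
    if state_type != "HARD":
        return None  # wait until the outage is confirmed, not just a SOFT blip
    if master_state in ("DOWN", "UNREACHABLE"):
        return "ENABLE_NOTIFICATIONS"   # standby takes over alerting
    if master_state == "UP":
        return "DISABLE_NOTIFICATIONS"  # master is back; standby goes quiet
    return None


def format_command(command: str, now: Optional[int] = None) -> str:
    """Render a line in the external command syntax: [timestamp] COMMAND."""
    timestamp = int(time.time()) if now is None else now
    return f"[{timestamp}] {command}\n"


def submit(command: str, path: str = COMMAND_FILE) -> None:
    """Write one command to the Nagios command file (a named pipe).

    Opening a FIFO for writing blocks until Nagios has it open for reading.
    """
    with open(path, "w") as pipe:
        pipe.write(format_command(command))
```

A handler script would call `failover_action(sys.argv[1], sys.argv[2])` and pass any non-`None` result to `submit`. Acting only on HARD states is the design choice that matters: it keeps a single failed check of the master from waking up a second set of pagers.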

Chapters 4 and 5 discuss Nagios plug-ins, add-ons, and enhancements. These chapters alone are worth the price of the book because of how much time they can save. It's much faster to copy a script and make minor tweaks than to reinvent the wheel, and between the scenarios covered here and the Nagios user community, there isn't much that hasn't been done already. Whether you want to test command-line interfaces, CPU usage, memory utilization, bandwidth utilization, HTML pages, LDAP services, or even specialized hardware, there's probably already a plug-in for it; most common scenarios are covered by a plug-in included in this book. The add-ons and enhancements are equally varied, providing ways to monitor hosts across security zones, configure read-only displays that live in a different security zone from Nagios, interface with Cacti, and even read alerts aloud. Still more scenarios can be handled by scripts from the Nagios community.
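For readers who haven't written a plug-in before, the contract is small: print one line of output, optionally followed by `|` and performance data, and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Below is a minimal Python sketch of a load-average check following those conventions. It is illustrative only (the standard plug-ins already include check_load), and the thresholds and the `load1` label are arbitrary choices for the example.

```python
import os
from typing import List, Tuple

# Return codes Nagios expects from every plug-in.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(load: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a 1-minute load average onto a plug-in state and output line."""
    if load >= crit:
        code = CRITICAL
    elif load >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is shown in the web UI; text after it is performance
    # data (label=value;warn;crit;min) that graphing add-ons can pick up.
    output = f"LOAD {STATE_NAMES[code]} - load1={load:.2f} | load1={load:.2f};{warn};{crit};0"
    return code, output


def main(argv: List[str]) -> int:
    """Parse warn/crit thresholds from argv, print one status line, return the code."""
    try:
        warn, crit = float(argv[1]), float(argv[2])
        load = os.getloadavg()[0]  # not available on every platform
    except (IndexError, ValueError, OSError, AttributeError) as exc:
        print(f"LOAD UNKNOWN - {exc!r}")
        return UNKNOWN
    code, output = evaluate(load, warn, crit)
    print(output)
    return code
```

As a script, the entry point would be `sys.exit(main(sys.argv))`, invoked from a command definition as something like `check_load_sketch.py 4 8`. Returning UNKNOWN on bad arguments, rather than crashing, keeps a misconfigured check from being mistaken for a healthy or a dead host.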

Chapter 6 goes into detail on integrating Nagios into an enterprise environment. It gives just enough detail to get Nagios working with a large number of third-party services, such as LDAP authentication, Cacti, Puppet, and Splunk. The emphasis is always on the human element: how to use Nagios to help the help desk and NOC staff do their jobs more efficiently and effectively, and how to gain maximum support for Nagios within the organization. The importance of the human element, in all its forms, simply cannot be overstated, and the authors have done a wonderful job of outlining how to make Nagios an integral part of an organization. Much of the material toward the end of the chapter, especially the section on smaller Network Operations Centres, could be used by anyone looking for ways to help a small group work together effectively.

Chapter 7 is another chapter with a lot of content that applies easily outside a Nagios environment. It begins by reminding you to know your network and to watch out for session hijacking attacks, then shows you how to use Nagios to do both. Nagios can't replace a competent network administrator, but it can make the job easier, and the authors show how the configuration you've already built can reveal a potential session hijacking attack, and how building it forces you to know your network properly. Nagios requires you to know not only how your network is built and which devices are in use, but also what constitutes normal conditions for all your devices and services.
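Knowing what "normal" looks like can be made concrete. The sketch below is neither a Nagios feature nor the book's method; it is a hypothetical Python helper which assumes you keep recent samples of some metric (say, connections per minute to a service) and flags a reading that lies more than `k` sample standard deviations from the recent mean. A check plug-in could call something like it and return WARNING when it fires.

```python
import statistics
from typing import Sequence


def outside_baseline(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """True if current is more than k sample standard deviations from the mean of history."""
    if len(history) < 2:
        return False  # too little data to say what "normal" is
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return current != mean  # perfectly flat history: any change is unusual
    return abs(current - mean) > k * spread


recent = [100, 102, 98, 101, 99]
print(outside_baseline(recent, 101))  # within the usual range
print(outside_baseline(recent, 150))  # far outside it
```

The choice of `k` trades false alarms against missed anomalies; a fixed threshold is simpler, but only a baseline built from your own history reflects what is normal for your network.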

Another area where Nagios can help, and one that is very important to companies, especially those operating in the United States, is regulatory compliance. The authors outline how a company could use Nagios to assist with compliance with Sarbanes-Oxley (SOX) using the COBIT or COSO frameworks, the Payment Card Industry (PCI) Data Security Standard (DSS), Director of Central Intelligence Directive (DCID) 6/3, and the Department of Defense (DoD) Information Assurance Certification and Accreditation Process (DIACAP). Nagios alone isn't enough for compliance (at the very least, detailed documentation will also be required), but the authors give a good overview of how Nagios can help with each of these regulations.

The final chapter brings the rest of the book together by walking through a full Nagios configuration for a fictional Fortune 500 corporation. Most of the chapter covers the pre-deployment stage, but there is still plenty to learn here about deploying Nagios. Pre-deployment is a major hurdle when introducing Nagios to an organization, and the authors show how to turn this challenge into a series of simple steps that improve the chances of Nagios succeeding in your organization. From the very beginning, you can see how practices such as involving the customer early and starting small become part of a repeatable process. Although it's specific to Nagios, the process followed here could easily be adapted to introducing any sort of monitoring service. The remainder of the chapter covers how you might integrate Nagios into a Fortune 500 company, finishing the book off with some good advice for integrating Nagios.

Despite all the book's strengths, there is some room for improvement. In Chapter 2, it might have been more effective to outline the relationships between the Nagios configuration objects before discussing configuration planning. I found it much easier to plan a configuration for a large organization once I knew how Nagios' configuration objects relate to each other.

Throughout the book, the authors have included configuration file snippets, scripts, and example script output in the main text. While all of these are quite useful and serve to enhance the book, I think it would have been better if these were all included in an appendix instead, perhaps keeping only the relevant parts of configuration snippets in the main text for clarification.

At the end of Chapter 3, the sections on the future of Nagios and the CGI front end are informative and interesting, but they would be better placed in a separate chapter on the potential future of Nagios in general. Together with the other major areas of Nagios, these topics would provide more than enough material for a full chapter on their collective future.

Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user. A few of these chapters alone would be worth the price of the whole book.

Disclaimer: I worked with one author when I was asked to review this book.

You can purchase Nagios 3: Enterprise Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.


Comments:
  • Re:Spam alert! (Score:5, Informative)

    by Ngarrang ( 1023425 ) on Wednesday October 08, 2008 @02:26PM (#25303235)

    A review, by an associate of the author, of an obscure product, with a picture of the book plastered on the front page of Slashdot. Who was paid off for that?

    Obscure product? What world have you been living in?

  • Cacti Users (Score:2, Informative)

    by cmorford ( 906819 ) on Wednesday October 08, 2008 @02:35PM (#25303415)
    I've used Nagios, but found Cacti and haven't looked back. Any other Cacti users out there? I found Cacti much easier to set up than Nagios, and fairly extensible for the advanced user.
  • Re:Cacti Users (Score:1, Informative)

    by Anonymous Coward on Wednesday October 08, 2008 @03:05PM (#25303947)
    Try ClearSite. It is to Cacti what Cacti is to MRTG. http://clearsite.sourceforge.net/coming-soon.html It's Linux only for now, but the developers are very nice and will share a newer version if you contact them. -theWiseWan
  • Re:not good. (Score:2, Informative)

    by Anonymous Coward on Wednesday October 08, 2008 @03:11PM (#25304029)

    Just get a good front end for Nagios, like GroundWork Open Source. That will make configuration a lot easier. (Posted as AC 'cause my password is so good I can't remember it.)

  • by rhizome ( 115711 ) on Wednesday October 08, 2008 @03:11PM (#25304041)

    1st Paragraph: Paraphrase of Foreword.
    2nd Paragraph: What the initial chapter(s) is (are) about.
    3rd Paragraph: What the next chapter is about.
    4th Paragraph: What the chapter after that is about.
    5th Paragraph: What the last chapter(s) is (are) about.
    6th Paragraph: Pithy criticisms for balance.
    7th Paragraph: Conclusion with the required, "This book is useful if you are like me" statement, as in, "Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user."

  • Re:Cacti Users (Score:5, Informative)

    by thanasakis ( 225405 ) on Wednesday October 08, 2008 @03:19PM (#25304179)

    You are comparing apples with oranges, nagios is for service monitoring, cacti is for diagrams.

  • OpenNMS is better (Score:2, Informative)

    by viridari ( 1138635 ) on Wednesday October 08, 2008 @03:31PM (#25304381)

    I don't know why OpenNMS doesn't get more credit, maybe because it's a Java app, but it's a damned good one.

    Get a very basic OpenNMS configuration going, point it at a range of IP addresses, and it will auto-discover most of what's out there. If you've got your SNMP agents up and running properly, it'll automatically start checking the more important OIDs for you and graphing them with an RRD back end. Most of the setup can be done through the web interface instead of through vi, and you don't have to restart the daemon every time you add a node.

    If Nagios drives you a bit batty, check out OpenNMS.

  • Re:not good. (Score:3, Informative)

    by mindstrm ( 20013 ) on Wednesday October 08, 2008 @03:35PM (#25304429)

    If all you want is a tool to ping a few servers, nagios is overkill.

    My gut reaction is that if nagios configs seem too complicated, you likely have never had to roll out real enterprise monitoring.

    Our Nagios install monitors thousands of things, many of them custom tests (transaction volumes, application response times, cron job status, files...). It can be made the focal point for all the "stuff" the people responsible for monitoring company IT operations need to know about.

  • Re:Zabbix (Score:3, Informative)

    by kosmosik ( 654958 ) <kos AT kosmosik DOT net> on Wednesday October 08, 2008 @04:31PM (#25305169)

    Well for me what ruled out Nagios was:

    1. It is painful to set up. Don't get me wrong: I've put in my time on the configuration, I think I know it a little, and I can easily set it up for 100 or so hosts with some templates, includes, and sed magic. But that is what I can do. Not all of my staff can do it, and it really is not easy.

    2. It is not distributed. The checks can be distributed, but you cannot have, say, 20 child Nagios nodes managed by local staff and parent nodes that gather data from the children. This is a killer feature of Zabbix for me. I can send out a box/server with a standard Zabbix configuration to my local staff, give them access via LDAP/AD, and tell them to configure it to suit *their* local setup (our branches are quite uncommon and unstandardized, for historical/political reasons). Then I can gather data from their local systems (which they have configured) and process it in a central place, so I have a clear overview of what is going on across the infrastructure. I really have no clue how to do that with Nagios. It is probably possible with some ninja-like hacking, but ninja-like hacking is not something you want in a big organization. You need clean, manageable stuff.

    3. Zabbix can collect and really process historical data. If for some reason I want to know how my network bandwidth evolved over the past year, I can quite easily click through and get some nice graphs and reports, and even make forecasts based on various trends.

    To summarize, Nagios seems to me like the perfect tool for a sysadmin, but it is not so good for enterprise monitoring, where you have quite different goals.

  • Re:Monitor my gf (Score:3, Informative)

    by dfn_deux ( 535506 ) <datsun510&gmail,com> on Wednesday October 08, 2008 @05:46PM (#25306049)
    I understand that your comment was made in jest, but... Nagios is a really flexible polling and alerting framework. There is nothing in Nagios that tailors it specifically to monitoring computers or services. For example, there is no reason the host's address directive (available to commands as $HOSTADDRESS$) must hold an IP or a hostname; it could just as easily be a street address, phone number, SSN, etc. Likewise, checks don't have to be active polls; you can just as easily configure Nagios to respond only to passive updates. So, just for the sake of argument, if you really wanted to use Nagios to track/control a person's actions and movements, you could combine passive monitoring, by having an investigator follow your target and giving them a phone number, email address, or website where they could submit a "check result", with active monitoring, by using any number of GPS/cellular logging devices combined with a small analysis script with some thresholds. If you wanted, you could even use the GPS output to update the relative location of "nodes" on your status map... I believe one of the examples in the documentation uses phone numbers for local pizza places as host addresses, with a dial-out script that checks the average number of rings before an answer to validate phone availability.
  • by Spad ( 470073 ) <`slashdot' `at' `spad.co.uk'> on Wednesday October 08, 2008 @05:49PM (#25306081)

    You'll be wanting:

    http://nagios.sourceforge.net/docs/3_0/configmain.html
    http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html
    http://nagios.sourceforge.net/docs/3_0/configcgi.html

    Those will get you started. There's a lot of stuff that isn't linked directly off the TOC, which is a pain, but it can be found with a bit of digging (or download the PDF and search it).

    The FAQs (http://www.nagios.org/faqs/) also have a fair amount of useful info (such as why the bloody thing won't use GD2 without a lot of arsing around).

    I'd also recommend the forums here: http://nagios.meulie.net/ (though they seem to be down at the moment).

  • by rossz ( 67331 ) <ogre&geekbiker,net> on Wednesday October 08, 2008 @07:37PM (#25307249)

    I have never once personally had any dealings with a properly implemented Nagios system. Every single time it was obviously tossed up by someone who had minimal knowledge of how to properly monitor the infrastructure.

    The biggest complaint I hear is "too many alerts". So set your dependencies properly! You say you did that but you still get 600 alerts when the router dies? That's because you told it you wanted the alerts. See that "u" in "notification_options"? That means "unreachable": you asked to be alerted when the box can't be reached. You probably wanted "d,r", not "d,r,u".

    The next complaint: it's so much work to add a system. Huh? It takes me about 30 seconds to add another system and all the tests I need. The trick is using host groups to assign tests to a system automatically. Take, for example, a generic LAMP-type server. What can we assume about it? It's running Linux, Apache, MySQL, and Perl or PHP. That's a bunch of tests right off. In my world, SNMP is assumed on all systems (because I made it that way, that's why). So we define a bunch of service checks using SNMP, but instead of using "host_name some_hostname", we use "hostgroup_name lamp-servers". Now when I add a new server, I add "hostgroups lamp-servers" to its definition and, like magic, it gets all the tests I need: SNMP port responding, SSH access, Apache daemon running, MySQL daemon running, web page accessible, disk space good (defined in snmpd.conf), CPU usage, and load average, plus some automatic dependencies: all SNMP tests depend on the SNMP port responding, web pages depend on the Apache daemon running, etc. I even have some simple graphing included automatically. Even the O/S icons are defined by the hostgroups. Each distro has its own hostgroup, which takes care of that detail (e.g. centos-system and ubuntu-system).

    Ten simple lines to define a new host can result in 20 service checks. I rarely need to define a new service check. And when a router goes out? One alert for the router.

    Not every system is going to be generic like this, but any time I have more than one system require a specific service check, I create a hostgroup to handle it.
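The hostgroup technique in the last comment, and the reason "d,r" rather than "d,r,u" yields one alert when a router dies, can both be modelled in a few lines. This is a simplified, hypothetical Python sketch, not Nagios internals: the service names follow the comment, the dependency list is an assumption based on it, and the alert count assumes every host behind the dead router has the router as its parent, so Nagios marks those hosts UNREACHABLE rather than DOWN. In a real configuration the same effect comes from `hostgroup_name` in service definitions and service dependencies.

```python
from typing import Dict, List, Set, Tuple

# Services attached to a hostgroup ("hostgroup_name lamp-servers" in the real config).
SERVICES_BY_HOSTGROUP: Dict[str, List[str]] = {
    "lamp-servers": [
        "SNMP port", "SSH", "Apache daemon", "MySQL daemon",
        "Web page", "Disk space", "CPU usage", "Load average",
    ],
}

# Assumed dependencies: SNMP-based checks need the SNMP port; the page needs Apache.
DEPENDS_ON: Dict[str, str] = {
    "Disk space": "SNMP port",
    "CPU usage": "SNMP port",
    "Load average": "SNMP port",
    "Web page": "Apache daemon",
}

# Host notification_options letters and the events they cover ("n" means none).
HOST_EVENTS = {"d": "DOWN", "u": "UNREACHABLE", "r": "RECOVERY", "f": "FLAPPING", "s": "DOWNTIME"}


def expand(hostgroups: List[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Turn a host's hostgroup memberships into its service checks and dependencies."""
    checks: List[str] = []
    for group in hostgroups:
        for service in SERVICES_BY_HOSTGROUP.get(group, []):
            if service not in checks:
                checks.append(service)
    deps = [(svc, DEPENDS_ON[svc]) for svc in checks
            if svc in DEPENDS_ON and DEPENDS_ON[svc] in checks]
    return checks, deps


def notified_events(notification_options: str) -> Set[str]:
    """Host events that generate notifications for a notification_options value."""
    letters = {opt.strip() for opt in notification_options.split(",") if opt.strip()}
    if "n" in letters:
        return set()
    unknown = letters - set(HOST_EVENTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return {HOST_EVENTS[letter] for letter in letters}


def alerts_when_router_dies(hosts_behind_router: int, notification_options: str) -> int:
    """The router goes DOWN; every host whose parent it is goes UNREACHABLE."""
    events = notified_events(notification_options)
    alerts = 1 if "DOWN" in events else 0
    if "UNREACHABLE" in events:
        alerts += hosts_behind_router
    return alerts


checks, deps = expand(["lamp-servers", "ubuntu-system"])
print(len(checks), deps)
print(alerts_when_router_dies(599, "d,u,r"), alerts_when_router_dies(599, "d,r"))
```

Adding a host to `lamp-servers` is then a single line in its definition, and every check and dependency follows from the group. The alert count deliberately ignores recoveries and repeat notifications; it only shows why dropping "u" turns a flood into one page.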
