Nagios is a fantastic program that allows you to monitor remote systems for availability. Nagios, available from http://www.nagios.org/, is often provided by Linux vendors, so it should be an apt-get or urpmi away.

Nagios makes extensive use of configuration files, typically located in /etc/nagios. The main configuration is /etc/nagios/nagios.cfg and, amongst other configuration options, points to other configuration files using cfg_file directives:

cfg_file=/etc/nagios/contacts.cfg
cfg_file=/etc/nagios/hosts.cfg
cfg_file=/etc/nagios/services.cfg
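
To see at a glance which files a main configuration pulls in, a small shell helper can extract the cfg_file directives. This is only a sketch: the function name list_cfg_files is made up for illustration, and it assumes one directive per line, as in the listing above.

```shell
# Hypothetical helper: print every file named by a cfg_file directive.
# Usage: list_cfg_files /etc/nagios/nagios.cfg
list_cfg_files() {
    # Keep only lines that start with cfg_file= (leading whitespace allowed)
    # and print what follows the "=". Commented-out lines do not match.
    sed -n 's/^[[:space:]]*cfg_file[[:space:]]*=[[:space:]]*//p' "$1"
}
```

Running it against your own nagios.cfg prints one included file per line, which is handy when a definition seems to be missing and you want to check that its file is actually loaded.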

The files above further configure and refine how Nagios works. For instance, the contacts.cfg might contain:

define contact{
        contact_name                    admin
        alias                           admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c,r
        host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           admin@mysite.com
        }

This defines who receives alerts, what kind of alerts, and when. Here you can see that the admin is available 24 hours a day, seven days a week, and receives critical/recovery notifications for services, as well as down/recovery notifications for hosts.
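
The single-letter notification options are easy to misread, so here is a rough shell sketch that spells them out. The function names are invented for this example, and only the letters w, u, c, r, d, and n are covered; your Nagios version may accept others.

```shell
# Hypothetical helper: describe one notification option letter.
# $1 is "service" or "host", $2 is a single letter.
describe_option() {
    case "$1:$2" in
        service:w) echo "warning" ;;
        service:u) echo "unknown" ;;
        service:c) echo "critical" ;;
        host:d)    echo "down" ;;
        host:u)    echo "unreachable" ;;
        service:r|host:r) echo "recovery" ;;
        service:n|host:n) echo "none" ;;
        *) echo "unrecognised"; return 1 ;;
    esac
}

# Expand a comma-separated list such as "c,r", one description per line.
describe_options() {
    kind=$1
    old_ifs=$IFS
    IFS=,
    for letter in $2; do
        describe_option "$kind" "$letter"
    done
    IFS=$old_ifs
}
```

For the contact above, describe_options service c,r prints critical and recovery, while describe_options host d,r prints down and recovery.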

The hosts.cfg file contains definitions of the hosts being monitored, which look like:

define host{
        name                            linux-server
        use                             generic-host
        check_period                    24x7
        max_check_attempts              10
        check_command                   check-host-alive
        notification_period             workhours
        notification_interval           120
        notification_options            d,u,r
        contact_groups                  admins
        register                        0
        }
        
define host{
        use                     linux-server
        host_name               surtr
        alias                   surtr.mysite.com
        address                 127.0.0.1
        }

The first definition is a template (register is set to 0). Other definitions can build upon this template, which avoids duplicating information. The second definition is an actual host: it names the template to use (linux-server) and supplies the host name, alias, and IP address. You can define as many hosts and as many templates as you like.

The services.cfg file contains service definitions that are used when monitoring hosts. For instance, here is an entry to check if the POP3 server is available:

define service{
        use                             local-service
        hostgroup_name                  remote
        service_description             POP3 Availability
        check_command                   check_pop
        }

The use directive names a template to build upon. The hostgroup_name defines which hosts should be using this service (the host group itself is defined elsewhere, such as in hostgroups.cfg). The check_command is the script or command (plugin) to run.

The hostgroups.cfg file might contain an entry like:

define hostgroup{
        hostgroup_name  remote
        alias           Remote Servers
        members         hades,titan
        }

This would be the definition for the remote hostgroup, used in the POP3 check illustrated previously. In this case, two hosts (hades and titan) are defined as being included in this group. You can have any number of host groups, with any number of hosts in them, and hosts can be members of multiple host groups.
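
Since a host can belong to several host groups, it can be useful to ask the question the other way round: which groups is a given host in? The following is a minimal sketch using awk. The function name hostgroups_for is hypothetical, and it assumes each definition lists hostgroup_name before members and keeps members on a single line.

```shell
# Hypothetical helper: print each hostgroup in file $2 that lists host $1.
hostgroups_for() {
    awk -v host="$1" '
        # Remember the most recent group name seen.
        $1 == "hostgroup_name" { group = $2 }
        $1 == "members" {
            # Drop the keyword and any whitespace, then split on commas.
            line = $0
            sub(/^[ \t]*members[ \t]+/, "", line)
            gsub(/[ \t]/, "", line)
            n = split(line, m, ",")
            for (i = 1; i <= n; i++)
                if (m[i] == host) print group
        }
    ' "$2"
}
```

With the definition above saved in hostgroups.cfg, hostgroups_for hades hostgroups.cfg would print remote.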

Finally, the commands.cfg file would contain the actual commands or plugins to use:

define command{
        command_name    check_pop
        command_line    $USER1$/check_pop -H $HOSTADDRESS$
        }

This defines the check_pop command, used in the previous POP3-checking service as defined in services.cfg. The $USER1$ macro expands to the directory holding the plugins, usually /usr/libexec/nagios (or wherever the vendor installs them), and $HOSTADDRESS$ expands to the address of the host being checked. The check_pop program itself is a plugin: a simple program that returns status information, such as:

# /usr/local/nagios/libexec/check_pop -H hades.mysite.com
POP OK - 0.025 second response time on port 110 [+OK Hello there.] |time=0.024849s;0.000000;0.000000;0.000000;10.000000

Nagios itself interprets those responses, together with the plugin's exit status, to determine if the service is up and running. Because the interface is this simple, you can write your own plugins for Nagios in shell script, Perl, or any other language.
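
As a rough illustration of how little a plugin needs to do, here is a sketch of plugin logic in shell. By plugin convention the exit status carries the state (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), and anything after the | character is performance data. The check_threshold name, the EXAMPLE label, and the value being tested are all made up for this example; a real plugin would measure something and then exit with the same status.

```shell
# Hypothetical plugin logic: compare a value against two thresholds.
# Arguments: value, warning threshold, critical threshold (whole numbers).
check_threshold() {
    if [ "$#" -ne 3 ]; then
        echo "EXAMPLE UNKNOWN - usage: check_threshold VALUE WARN CRIT"
        return 3
    fi
    value=$1
    warn=$2
    crit=$3
    # Performance data in the form label=value;warn;crit
    perf="value=$value;$warn;$crit"
    if [ "$value" -ge "$crit" ]; then
        echo "EXAMPLE CRITICAL - value is $value |$perf"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "EXAMPLE WARNING - value is $value |$perf"
        return 1
    fi
    echo "EXAMPLE OK - value is $value |$perf"
    return 0
}
```

To turn this into a standalone plugin, you would call the function at the end of a script and finish with exit $?, then point a command definition at that script just as check_pop is defined above.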

This has only scratched the surface of what can be done with Nagios. You can observe Nagios reports and trends for hosts using the Web interface to view data, and there are a lot of different pre-existing plugins that can be used to check host uptime and availability, services like LDAP, SSH, FTP, and more. Nagios can be a little time-consuming to set up, but the end result is worth it, especially if you are in charge of watching even a few different systems and want early warnings of problems or potential problems.

This was published in Open Sourcery; check every Monday for more stories.
