En:Nagios

From the Unix/Linux server operations wiki

Revision as of 19:37, 22 March 2010

This page is a work in progress.

This article aims to be a concise, to-the-point introduction to the internals of Nagios. It assumes that the reader knows what Nagios is and what it does, and has at least a general idea of how it works. Basic installation and setup are covered, not in order to arrive at a specific working configuration, but to help you understand what can be tweaked where, and to what effect.

The text below applies to the version of nagios3 in Debian unstable ("sid") as of March 2010. The stable and testing distributions may behave slightly differently.


1 Important concepts

Before we continue, a few concepts need to be introduced. All of them provide some kind of indirection, mostly aimed at saving typing when writing the configuration (which still takes a long time for a system of any complexity).

1.1 Macro

A macro is what most people would call a variable. Nagios macros have upper-case names enclosed in dollar signs (for example $HOSTADDRESS$); wherever a macro is referenced, it is replaced with the value associated with it in that particular context.

1.2 Command

A command is an external binary that Nagios can run. Its definition consists of a name, the full path of the binary and, optionally, command-line arguments to pass to the binary. These arguments can reference macros that are derived from the context the command is used in (such as $HOSTADDRESS$) as well as macros of the form $ARG1$. The values for these $ARGx$ macros are passed in, separated by exclamation marks, when the command is referenced, like this:

check_command                   check_all_disks!20%!10%

The check_all_disks command is defined as follows:

define command{
	command_name	check_all_disks
	command_line	/usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e
	}

When this check is invoked as shown above, $ARG1$ expands to 20% and $ARG2$ to 10%. These are the "warning" and "critical" thresholds for the plugin (the -e switch causes it to report only filesystems that are too full).

The idea is that you can easily modify the check_all_disks command definition to call a different binary, because the binary itself is referenced only in this one place. This is the advantage of the indirection. The disadvantage is that it's mandatory: if you need a one-shot command for a specific service, you can't just define it inline in the service. You must define the command separately and then reference it in the service definition.
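As a rough sketch of what such a one-shot command might look like, the fragment below defines a command that is only meant to be used by a single service. The command name check_tcp_8080 and the port number are made up for illustration; check_tcp is one of the standard plugins, and its path is assumed to match the one used by check_all_disks above.

```cfg
# Hypothetical one-off command: a plain TCP connect test against port 8080.
define command{
	command_name	check_tcp_8080
	command_line	/usr/lib/nagios/plugins/check_tcp -H '$HOSTADDRESS$' -p 8080
	}

# ...and, inside the definition of the one service that uses it:
	check_command	check_tcp_8080
```

Even though the command has no $ARGx$ macros and only one user, it still has to exist as a separate object before a service can refer to it by name.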

2 Host

A host is an object that has one or more services associated with it. These services are what Nagios monitors (often by attempting to use them). A host is basically a group of services reachable via the IP address of the host. Hosts also appear in various parts of the web interface as clickable objects.
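A minimal host definition could look something like the sketch below. The host name, alias and address are placeholders, and the fragment assumes that a template called generic-host is available to supply the remaining required settings (the stock Debian configuration ships one). The comments note which standard macros pick up their values from each field.

```cfg
# Hypothetical host; the name, alias and address are placeholders.
define host{
	use		generic-host		; inherit defaults from the template
	host_name	web1			; available to commands as $HOSTNAME$
	alias		Example web server	; available as $HOSTALIAS$
	address		192.0.2.10		; available as $HOSTADDRESS$
	}
```

Any command run in the context of this host, or of a service on it, would see $HOSTADDRESS$ expand to 192.0.2.10.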

3 Service

A service is something for which we can define a command that checks its status (which, for the sake of simplicity, can be "OK", "WARNING" or "CRITICAL").
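For illustration, a service that counts logged-in users on a host might be defined roughly as follows. The fragment assumes a host named localhost, a template named generic-service, and a check_users command that takes the warning and critical thresholds as its two arguments; the threshold values themselves are arbitrary.

```cfg
# Hypothetical service; assumes check_users takes <warning>!<critical>.
define service{
	use			generic-service		; inherit defaults from the template
	host_name		localhost
	service_description	Current Users
	check_command		check_users!20!50	; warn at 20 users, critical at 50
	}
```

The status Nagios records comes from the plugin's exit code: 0 means OK, 1 WARNING and 2 CRITICAL (3 is used for UNKNOWN, which the simplified list above leaves out).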

4 Installing Nagios

First, install the nagios3 package and some monitoring plug-ins:

apt-get install nagios3 nagios-plugins-basic nagios-plugins-standard

This will install the binaries and a very basic configuration that monitors some aspects of "localhost".

Let's take a look at the configuration installed in /etc/nagios3 first.

  • commands.cfg: command definitions (unlikely to need modification)
    • notify-host-by-email
    • notify-service-by-email
    • process-host-perfdata
    • process-service-perfdata
  • conf.d/contacts_nagios2.cfg: default contacts
    • contact "root" (email root@localhost)
    • contactgroup "admins" (only member: root)
  • conf.d/extinfo_nagios2.cfg:
    • hostextinfo hostgroup debian-servers (adds fancy icons and such)
  • conf.d/generic-host_nagios2.cfg:
    • generic-host template (enables flap detection, notification etc.)
  • conf.d/generic-service_nagios2.cfg:
    • generic-service template (sets defaults, as above)
  • conf.d/host-gateway_nagios3.cfg:
    • defines the 'gateway' host as a generic-host; its IP probably needs to be set manually.
  • conf.d/hostgroups_nagios2.cfg:
    • hostgroup all (members *)
    • hostgroup debian-servers (members localhost)
    • hostgroup http-servers (members localhost)
    • hostgroup ssh-servers (members localhost)
    • hostgroup ping-servers (members gateway)
      • For hosts that don't even have SNMP; Nagios needs a "service" it can monitor, so for these hosts we define "ping" as a service.
  • conf.d/localhost_nagios2.cfg:
    • defines the 'localhost' host as a generic-host and some "services" on it:
      • diskspace (check_all_disks);
      • logged in users (check_users);
      • total processes (check_procs);
      • load average (check_load).
  • conf.d/services_nagios2.cfg: defines the services associated with service-based hostgroups
    • check_http for http-servers;
    • check_ssh for ssh-servers;
    • check_ping for ping-servers.
  • conf.d/timeperiods_nagios2.cfg: defines various time periods (which can be used to decide which contact to notify):
    • 24x7;
    • workhours (Monday-Friday, 9:00-17:00);
    • nonworkhours (complements workhours);
    • never.
  • resource.cfg: used to define variables (which Nagios calls "macros").
    • These can be referenced in command definitions.
    • Only 32 are supported, and they must all have names of the form $USERx$ ($USER1$ to $USER32$).
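The way the hostgroup and service files listed above fit together can be sketched as follows. This is an approximation rather than a copy of the shipped files, and web1 is a hypothetical second host that would need its own host definition. The point is that the service is attached to the hostgroup, not to individual hosts.

```cfg
# Roughly what conf.d/hostgroups_nagios2.cfg contains, with a
# hypothetical extra member (web1) added to the group.
define hostgroup{
	hostgroup_name	http-servers
	alias		HTTP servers
	members		localhost,web1
	}

# Roughly what conf.d/services_nagios2.cfg contains: one definition
# that applies to every member of the hostgroup.
define service{
	use			generic-service
	hostgroup_name		http-servers
	service_description	HTTP
	check_command		check_http
	}
```

With this layout, monitoring HTTP on a new machine only requires adding it to the members list; no new service definition is needed.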

Plugin configuration files reside in /etc/nagios-plugins/config. The following are shipped by default (by the nagios-plugins-basic package):

  • apt.cfg defines two commands:
    • check_apt (checks how many packages could be upgraded; apparently warns if there are any, and reports critical status if any possible upgrades are "critical")
    • check_apt_distupgrade (same as above, but for APT's dist-upgrade operation)
  • dhcp.cfg defines two commands (both of which need root privileges):
    • check_dhcp
    • check_dhcp_interface
  • disk.cfg defines the following commands:
    • check_disk
    • check_all_disks
    • ssh_disk
    • ssh_disk_4 (to test IPv4 connectivity on IPv6 enabled systems)
  • dummy.cfg contains some commands that are only useful for testing; they always return a fixed status.
    • return-ok
    • return-warning
    • return-critical
    • return-unknown
    • return-numeric
  • ftp.cfg defines the following commands:
    • check_ftp
    • check_ftp_4 (to test IPv4 connectivity on IPv6 enabled systems)
  • http.cfg defines many commands:
    • check_http (will try to fetch http://ip.of.host/)
    • check_httpname (will try to fetch http://name.of.virtual.host/ from ip.of.host)
    • check_http2 (permits manual tuning of critical and warning thresholds)
    • check_squid
    • check_https
    • check_https_hostname
    • check_https_auth
    • check_https_auth_hostname
    • check_cups (will try an HTTP request to port 631)
    • All of the above also exist with a "_4" suffix which forces the plugin to use IPv4.
  • load.cfg defines:
    • check_load
  • mail.cfg defines:
    • check_pop
    • check_smtp
    • check_ssmtp
    • check_imap
    • check_spop (this should actually be called check_pop3s)
    • check_simap (this should actually be called check_imaps)
    • check_mailq_sendmail
    • check_mailq_postfix
    • check_mailq_exim
    • check_mailq_qmail
    • As usual, these also come with IPv4-only variants.
  • nntp.cfg defines:
    • check_nntp
    • check_nntp_4
  • ntp.cfg defines:
    • check_ntp
    • check_ntp_ntpq
    • check_time
  • ping.cfg defines:
    • check_ping
    • The following are actually defined identically. The aliases help keep the distinction between hosts, printers, switches and routers; also, they allow you to modify the ping command used to test the reachability of one kind of device without affecting the others.
      • check-host-alive
      • check-printer-alive
      • check-switch-alive
      • check-router-alive
    • Again, IPv4-only variants are provided.
  • procs.cfg defines:
    • check_procs
    • check_procs_zombie
    • check_procs_httpd
      • This is more an example than something actually useful on Debian: it checks for the existence of processes named "httpd".
      • Also, the test isn't very meaningful: the existence of httpd processes doesn't mean that the website they are supposed to serve is available.
  • real.cfg defines commands to test the availability of RTSP servers:
    • check_real_url
    • check_real
  • ssh.cfg defines:
    • check_ssh
    • check_ssh_port (to check ssh on a nonstandard port)
    • check_ssh_4
    • check_ssh_port_4
  • tcp_udp.cfg defines commands to test the availability of arbitrary TCP/UDP ports (without any application layer test):
    • check_tcp
    • check_udp
    • check_tcp_4
    • check_udp_4
  • telnet.cfg defines:
    • check_telnet
    • check_telnet_4
  • users.cfg defines:
    • check_users (checks whether the number of logged-in users exceeds a threshold)

Installing nagios-plugins-standard yields the following additional plugin configuration files:

  • breeze.cfg:
    • check_breeze (checks the signal strength of a piece of Breezecom wireless equipment)
  • disk-smb.cfg:
    • check_disk_smb (checks the amount of available free space on an SMB share)
    • check_disk_smb_workgroup (same as above, but the name of the workgroup can also be specified)
    • check_disk_smb_host (also specifies the IP of the server on the command line)
    • check_disk_smb_workgroup_host
    • check_disk_smb_user (also specifies a username to connect as)
    • check_disk_smb_workgroup_user
    • check_disk_smb_host_user
    • check_disk_smb_workgroup_host_user
  • dns.cfg:
    • check_dns (checks the availability of recursive DNS)
    • check_dig (checks the availability of authoritative DNS)
  • flexlm.cfg:
    • check_flexlm (checks the availability of a flexlm license manager)
  • fping.cfg:
    • check-fast-alive (uses fping to check reachability, which may be faster than regular ping)
  • games.cfg:
    • check_quake
    • check_unreal
  • hppjd.cfg:
    • check_hpjd (uses SNMP to check the status of an HP printer that has JetDirect)
  • ifstatus.cfg:
    • check_ifstatus (SNMP based network interface status check)
    • check_ifstatus_exclude (as above, but allows exclusion of specified interface types, such as PPP)
    • check_ifoperstatus_ifindex
    • check_ifoperstatus_ifdescr
  • ldap.cfg:
    • check_ldap
    • check_ldaps
    • check_ldap_4
    • check_ldaps_4
  • mrtg.cfg:
    • check_mrtg
    • traffic_average
  • mysql.cfg:
    • check_mysql
    • check_mysql_cmdlinecred
    • check_mysql_database
  • netware.cfg:
    • check_netware_logins
    • check_nwstat_conns
    • check_netware_1load
    • check_netware_5load
    • check_netware_15load
    • check_nwstat_vol_p
    • check_nwstat_vol_k
    • check_nwstat_ltch
    • check_nwstat_puprb
    • check_nwstat_dsdb
    • check_netware_abend
    • check_nwstat_csprocs
  • nt.cfg (these commands depend on an "NSClient" service running on a Windows box and allow you to monitor the Windows box):
    • check_nt
    • check_nscp
  • pgsql.cfg:
    • check_pgsql
    • check_pgsql_4
  • radius.cfg:
    • check_radius
  • rpc-nfs.cfg:
    • check-rpc
    • check-nfs
  • snmp.cfg:
    • snmp_load
    • snmp_cpustats
    • snmp_procname
    • snmp_disk
    • snmp_mem
    • snmp_swap
    • snmp_procs
    • snmp_users
    • snmp_mem2
    • snmp_swap2
    • snmp_mem3
    • snmp_swap3
    • snmp_disk2
    • snmp_tcpopen
    • snmp_tcpstats
    • check_snmp_bgpstate
    • check_netapp_uptime
    • check_netapp_cpuload
    • check_netapp_numdisks
    • check_compaq_thermalCondition