En:Nagios
From the Unix/Linux szerverek üzemeltetése wiki
This is an old revision of the page, as it was after the edit by KornAndras on 28 February 2010 at 16:57.
This page is a work in progress.
apt-get install nagios3
Overview of configfiles installed in /etc/nagios3:
- commands.cfg: command definitions (unlikely to need modification)
- notify-host-by-email
- notify-service-by-email
- process-host-perfdata
- process-service-perfdata
- conf.d/contacts_nagios2.cfg: default contacts
- contact "root" (email root@localhost)
- contactgroup "admins" (only member: root)
- conf.d/extinfo_nagios2.cfg:
- hostextinfo hostgroup debian-servers (adds fancy icons and such)
- conf.d/generic-host_nagios2.cfg:
- generic-host template (enables flap detection, notification etc.)
- conf.d/generic-service_nagios2.cfg:
- generic-service template (sets defaults, as above)
- conf.d/host-gateway_nagios3.cfg:
- defines the 'gateway' host as a generic-host; its IP probably needs to be set manually.
- conf.d/hostgroups_nagios2.cfg:
- hostgroup all (members *)
- hostgroup debian-servers (members localhost)
- hostgroup http-servers (members localhost)
- hostgroup ssh-servers (members localhost)
- hostgroup ping-servers (members gateway)
- for hosts that don't even have snmp; nagios needs a "service" it can monitor, so for these hosts, we define "ping" as a service.
- conf.d/localhost_nagios2.cfg:
- defines the 'localhost' host as a generic-host and some "services" on it:
- diskspace (check_all_disks);
- logged in users (check_users);
- total processes (check_procs);
- load average (check_load).
- conf.d/services_nagios2.cfg: defines the services associated with service-based hostgroups
- check_http for http-servers;
- check_ssh for ssh-servers;
- check_ping for ping-servers.
- conf.d/timeperiods_nagios2.cfg: defines various time periods (which can be used to decide which contact to notify):
- 24x7;
- workhours (Monday-Friday, 9:00-17:00);
- nonworkhours (complements workhours);
- never.
- nagios.cfg: main config; lists other files and directories to include and contains some global directives.
- log_file=/var/log/nagios3/nagios.log
- cfg_file=/etc/nagios3/commands.cfg (command definitions, see above)
- cfg_dir=/etc/nagios-plugins/config (shipped by the nagios-plugins package)
- cfg_dir=/etc/nagios3/conf.d (this is where we're supposed to create our own configfiles; they all must have a .cfg extension)
- object_cache_file=/var/cache/nagios3/objects.cache (generated based on the startup config; used by the CGIs)
- precached_object_file=/var/lib/nagios3/objects.precache (useful for complex configurations; can speed up restarting nagios)
- resource_file=/etc/nagios3/resource.cfg (resource files can contain macro definitions and are not read by CGIs; so resource files are the place to record passwords and suchlike)
- status_file=/var/cache/nagios3/status.dat (stores status of monitored services and hosts; used by the CGIs)
- status_update_interval=10 (how often to update status.dat, in seconds)
- nagios_user=nagios (what user to run as)
- nagios_group=nagios (what group to run as)
- check_external_commands=0 (whether to enable "external commands" which can be issued from the web interface; disabled by default)
- command_check_interval=-1 (how often to check for "external commands"; -1 means "as often as possible")
- command_file=/var/lib/nagios3/rw/nagios.cmd (the "external command file"; permissions are crucial, so see the documentation)
- external_command_buffer_slots=4096 (a performance tuning setting; leave it alone)
- some other not too important settings, like the location of the pidfile, a temporary file, a temporary directory, the log rotation frequency, the directory where old logs are placed etc.
- event_broker_options= (see documentation)
- broker_module= (you can load event broker modules that process events; more on this later, hopefully)
- use_syslog={0|1} (whether to log messages to syslog in addition to the nagios logfile)
- log_notifications={0|1} (whether to log notifications at all)
- log_service_retries, log_host_retries, log_event_handlers, log_initial_states, log_external_commands, log_passive_checks (whether to log the respective events at all)
- global_host_event_handler, global_service_event_handler (you can have some nagios commands executed for every host or service state change)
- service_inter_check_delay_method={n,d,s,x.xx} (how to schedule service checks; the default of "smart" is probably the best choice as it tries to spread out service checks to avoid load peaks)
- max_service_check_spread=30 (how many minutes may elapse from program start until all initial service checks should complete)
- max_host_check_spread=30 (as above, only for hosts instead of services)
- service_interleave_factor=s (configures how Nagios interleaves service checks across hosts, so that consecutive checks do not all hit the same host; leave it alone)
- host_inter_check_delay_method=s (like service_inter_check_delay_method, only for hosts instead of services)
- max_concurrent_checks=0 (how many service checks may run in parallel. 0 means no limit and is probably a good choice in most situations)
- check_result_reaper_frequency=10 (how often, in seconds, to process the results of checks; leave it alone)
- max_check_result_reaper_time=30 (a performance tuning setting; leave it alone)
- check_result_path=/var/lib/nagios3/spool/checkresults (a spool directory of incoming unprocessed check results; leave it alone)
- max_check_result_file_age=3600 (how old an unprocessed check result file can be to still be considered valid and processed)
- cached_host_check_horizon=15 (a performance tuning setting; leave it alone)
- cached_service_check_horizon=15 (a performance tuning setting; leave it alone)
- enable_predictive_host_dependency_checks=1 (there should be no need to disable this)
- enable_predictive_service_dependency_checks=1 (there should be no need to disable this)
- soft_state_dependencies=0 (whether to consider "soft" states in dependency calculation; enabling may decrease accuracy but cut down on notification floods)
- auto_reschedule_checks=0 (a check scheduling option that may improve or degrade performance)
- auto_rescheduling_interval=30 (a fine tuning option related to auto_reschedule_checks)
- auto_rescheduling_window=180 (a fine tuning option related to auto_reschedule_checks)
- service_check_timeout, host_check_timeout, event_handler_timeout, notification_timeout, ocsp_timeout, perfdata_timeout (various command timeouts; if a subprocess doesn't finish in time, it's killed)
- retain_state_information=1 (whether to save host and service state information on shutdown; it probably makes little sense to disable it. state_retention_file configures where the data is saved.)
- retention_update_interval=60 (how often, in seconds, to write state retention information to disk. If 0, only update it on shutdown.)
- use_retained_program_state=1 (whether to load program status variables, including many configuration options, from the retention file; having it enabled may make nagios ignore some configuration changes, so beware)
- use_retained_scheduling_info=1 (the same for saved scheduling decisions)
- retained_host_attribute_mask, retained_service_attribute_mask, retained_process_host_attribute_mask, retained_process_service_attribute_mask, retained_contact_host_attribute_mask, retained_contact_service_attribute_mask (state retention fine-tuning)
- check_for_updates=1 (whether to periodically check for new versions; bare_update_check sets whether to also send what version you're currently running)
- use_aggressive_host_checking=0 (when set to 0, the default, host checking is supposedly smarter somehow, but potentially less reliable)
- execute_service_checks=1 (whether to perform active service checks; if disabled, Nagios still processes check results that are dropped in its spool from somewhere else)
- accept_passive_service_checks=1 (complements the above)
- execute_host_checks, accept_passive_host_checks (as above, only for hosts instead of services)
- enable_notifications=1 (whether to send notifications at all)
- enable_event_handlers=1 (self-explanatory)
- process_performance_data=0 (whether to run host_perfdata_command and service_perfdata_command. These allow munin-like monitoring of numeric metrics in addition to up-warning-down type states.)
- host_perfdata_file, service_perfdata_file, host_perfdata_file_template, service_perfdata_file_template (where to store performance data and how to name the files themselves)
- host_perfdata_file_mode={a|w|p}, service_perfdata_file_mode={a|w|p} (whether to open perfdata files in append or write mode; p is for named pipes)
- host_perfdata_file_processing_interval, service_perfdata_file_processing_interval, host_perfdata_file_processing_command, service_perfdata_file_processing_command (performance data can be periodically processed. These directives tell Nagios how often to process the data and what commands to run on it.)
- obsess_over_services, ocsp_command, obsess_over_hosts, ochp_command, translate_passive_host_checks, passive_host_checks_are_soft, check_service_freshness, service_freshness_check_interval, check_host_freshness, host_freshness_check_interval, additional_freshness_latency (used for distributed monitoring)
- check_for_orphaned_services=1, check_for_orphaned_hosts=1 (leave enabled)
- enable_flap_detection=1 (whether to detect rapid up/down state changes of a host or service and temporarily suppress notifications while they last)
- low_service_flap_threshold=5.0, high_service_flap_threshold=20.0, low_host_flap_threshold=5.0, high_host_flap_threshold=20.0 (flap detection fine-tuning)
- date_format=iso8601 (the other formats are... not useful, so leave this alone)
- use_timezone (override system timezone)
- p1_file, enable_embedded_perl, use_embedded_perl_implicitly (options related to embedded Perl interpreter; normally, you can leave these alone)
- illegal_object_name_chars, illegal_macro_output_chars (leave them alone)
- use_regexp_matching=0 (if enabled, regular expression matching is used to match host, hostgroup, service, and service group names/descriptions in some fields of various object types)
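- use_true_regexp_matching=0 (only relevant if use_regexp_matching is enabled: if disabled, a value is treated as a regular expression only if it contains a wildcard such as "*" or "?"; if enabled, every value is treated as a regular expression)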
- admin_email=root@localhost, admin_pager=pageroot@localhost (these are made available to notification commands as $ADMINEMAIL$ and $ADMINPAGER$)
- daemon_dumps_core=0 (whether to produce coredumps on crashes; may be useful for debugging)
- use_large_installation_tweaks=0, enable_environment_macros=1, free_child_process_memory, child_processes_fork_twice (performance fine-tuning, mainly for large installations)
- debug_level=0, debug_verbosity=1, debug_file, max_debug_file_size (see configfile comments for details)
- resource.cfg: used to define variables (which Nagios calls "macros").
- These can be referenced in command definitions.
- Only 32 are supported and they all must have names of the form $USERx$.
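As a rough sketch of how a resource macro and a command definition fit together: the fragment below keeps a password in resource.cfg and references it from a custom command. The macro number $USER3$, the password, the user name "monitor", the file name custom-commands.cfg and the command name check_http_auth_macro are all made up for illustration; $USER1$ pointing at the plugin directory is the usual convention, but check your own resource.cfg.

```cfg
# /etc/nagios3/resource.cfg
# $USER1$ conventionally holds the plugin directory (assumed here).
$USER1$=/usr/lib/nagios/plugins
# Hypothetical secret; resource files are not read by the CGIs.
$USER3$=s3cret

# /etc/nagios3/conf.d/custom-commands.cfg (hypothetical file; must end in .cfg)
define command {
        command_name    check_http_auth_macro   ; hypothetical command name
        ; -I is the address to connect to, -a is user:password for basic auth
        command_line    $USER1$/check_http -I '$HOSTADDRESS$' -a 'monitor:$USER3$'
}
```

After changing the configuration, running nagios3 -v /etc/nagios3/nagios.cfg should report syntax errors before the daemon is restarted.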
Plugin configuration files reside in /etc/nagios-plugins/config. The following are shipped by default (by the nagios-plugins-basic package):
- apt.cfg defines two commands:
- check_apt (checks how many packages could be upgraded; apparently warns if there are any, and reports critical status if any possible upgrades are "critical")
- check_apt_distupgrade (same as above, but for APT's dist-upgrade operation)
- dhcp.cfg defines two commands (both of which need root privileges):
- check_dhcp
- check_dhcp_interface
- disk.cfg defines the following commands:
- check_disk
- check_all_disks
- ssh_disk
- ssh_disk_4 (to test IPv4 connectivity on IPv6 enabled systems)
- dummy.cfg contains some commands that are only useful for testing; they always return a fixed status.
- return-ok
- return-warning
- return-critical
- return-unknown
- return-numeric
- ftp.cfg defines the following commands:
- check_ftp
- check_ftp_4 (to test IPv4 connectivity on IPv6 enabled systems)
- http.cfg defines many commands:
- check_http (will try to fetch http://ip.of.host/)
- check_httpname (will try to fetch http://name.of.virtual.host/ from ip.of.host)
- check_http2 (permits manual tuning of critical and warning thresholds)
- check_squid
- check_https
- check_https_hostname
- check_https_auth
- check_https_auth_hostname
- check_cups (will try a http request to port 631)
- All of the above also exist with a "_4" suffix which forces the plugin to use IPv4.
- load.cfg defines:
- check_load
- mail.cfg defines:
- check_pop
- check_smtp
- check_ssmtp
- check_imap
- check_spop (this should actually be called check_pop3s)
- check_simap (this should actually be called check_imaps)
- check_mailq_sendmail
- check_mailq_postfix
- check_mailq_exim
- check_mailq_qmail
- As usual, these also come with IPv4-only variants.
- nntp.cfg defines:
- check_nntp
- check_nntp_4
- ntp.cfg defines:
- check_ntp
- check_ntp_ntpq
- check_time
- ping.cfg defines:
- check_ping
- The following are actually defined identically. The aliases help keep the distinction between hosts, printers, switches and routers; also, they allow you to modify the ping command used to test the reachability of one kind of device without affecting the others.
- check-host-alive
- check-printer-alive
- check-switch-alive
- check-router-alive
- Again, IPv4-only variants are provided.
- procs.cfg defines:
- check_procs
- check_procs_zombie
- check_procs_httpd
- This is more an example than something actually useful on Debian: it checks for the existence of processes named "httpd".
- Also, the test isn't very meaningful: the existence of httpd processes doesn't mean that the website they are supposed to serve is available.
- real.cfg defines commands to test the availability of RTSP servers:
- check_real_url
- check_real
- ssh.cfg defines:
- check_ssh
- check_ssh_port (to check ssh on a nonstandard port)
- check_ssh_4
- check_ssh_port_4
- tcp_udp.cfg defines commands to test the availability of arbitrary TCP/UDP ports (without any application layer test):
- check_tcp
- check_udp
- check_tcp_4
- check_udp_4
- telnet.cfg defines:
- check_telnet
- check_telnet_4
- users.cfg defines:
- check_users (checks whether the number of logged-in users exceeds a threshold)
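As a sketch of how the shipped commands are used, a service definition dropped into /etc/nagios3/conf.d could reference check_ssh_port for the default localhost host. The file name and port 2222 are made up, and the assumption that check_ssh_port takes the port as its first argument ($ARG1$) should be verified against ssh.cfg.

```cfg
# /etc/nagios3/conf.d/ssh-altport.cfg (hypothetical file name)
define service {
        use                     generic-service      ; template from generic-service_nagios2.cfg
        host_name               localhost            ; host defined in localhost_nagios2.cfg
        service_description     SSH on port 2222
        ; Arguments follow the command name, separated by "!".
        ; Assumption: the first argument is the port number.
        check_command           check_ssh_port!2222
}
```

Commands that take no arguments, such as check_ssh, are referenced by name alone.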
Installing nagios-plugins-standard yields the following additional plugin configuration files:
- breeze.cfg:
- check_breeze (checks the signal strength of a piece of Breezecom wireless equipment)
- disk-smb.cfg:
- check_disk_smb (checks the amount of available free space on an SMB share)
- check_disk_smb_workgroup (same as above, but the name of the workgroup can also be specified)
- check_disk_smb_host (also specifies the IP of the server on the command line)
- check_disk_smb_workgroup_host
- check_disk_smb_user (also specifies a username to connect as)
- check_disk_smb_workgroup_user
- check_disk_smb_host_user
- check_disk_smb_workgroup_host_user
- dns.cfg:
- check_dns (checks the availability of recursive DNS)
- check_dig (checks the availability of authoritative DNS)
- flexlm.cfg:
- check_flexlm (checks the availability of a flexlm license manager)
- fping.cfg:
- check-fast-alive (uses fping to check reachability, which may be faster than regular ping)
- games.cfg:
- check_quake
- check_unreal
- hppjd.cfg:
- check_hpjd (uses SNMP to check the status of an HP printer equipped with JetDirect)
- ifstatus.cfg:
- check_ifstatus (SNMP based network interface status check)
- check_ifstatus_exclude (as above, but allows exclusion of specified interface types, such as PPP)
- check_ifoperstatus_ifindex
- check_ifoperstatus_ifdescr
- ldap.cfg:
- check_ldap
- check_ldaps
- check_ldap_4
- check_ldaps_4
- mrtg.cfg:
- check_mrtg
- traffic_average
- mysql.cfg:
- check_mysql
- check_mysql_cmdlinecred
- check_mysql_database
- netware.cfg:
- check_netware_logins
- check_nwstat_conns
- check_netware_1load
- check_netware_5load
- check_netware_15load
- check_nwstat_vol_p
- check_nwstat_vol_k
- check_nwstat_ltch
- check_nwstat_puprb
- check_nwstat_dsdb
- check_netware_abend
- check_nwstat_csprocs
- nt.cfg (these commands depend on an "NSClient" service running on a Windows box and allow you to monitor the Windows box):
- check_nt
- check_nscp
- pgsql.cfg:
- check_pgsql
- check_pgsql_4
- radius.cfg:
- check_radius
- rpc-nfs.cfg:
- check-rpc
- check-nfs
- snmp.cfg:
- snmp_load
- snmp_cpustats
- snmp_procname
- snmp_disk
- snmp_mem
- snmp_swap
- snmp_procs
- snmp_users
- snmp_mem2
- snmp_swap2
- snmp_mem3
- snmp_swap3
- snmp_disk2
- snmp_tcpopen
- snmp_tcpstats
- check_snmp_bgpstate
- check_netapp_uptime
- check_netapp_cpuload
- check_netapp_numdisks
- check_compaq_thermalCondition
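A minimal sketch of using one of these additional commands: a host that is only reachable by ping could use check-fast-alive as its host check and join the default ping-servers hostgroup, so that it also gets a "ping" service. The host name, alias and address below are placeholders, and the assumption that check-fast-alive needs no arguments should be checked against fping.cfg.

```cfg
# /etc/nagios3/conf.d/switch01.cfg (hypothetical file name)
define host {
        use             generic-host         ; template from generic-host_nagios2.cfg
        host_name       switch01             ; hypothetical host name
        alias           Example switch
        address         192.0.2.20           ; documentation-range IP, replace with the real one
        hostgroups      ping-servers         ; gets check_ping via services_nagios2.cfg
        ; Assumption: check-fast-alive takes no arguments.
        check_command   check-fast-alive
}
```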