Network Monitoring and Management
Cacti, Nagios and Smokeping Ticket Creation with Request Tracker
----------------------------------------------------------------

Notes:
------

* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

At this point in the week you should have Cacti, Nagios and Smokeping
installed on your PCs. These exercises show you how to set up each of these
programs to send alerts to the RT (Request Tracker) ticketing system to
generate tickets.

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. Verify that you have configured rt-mailgate to work with your MTA
---------------------------------------------------------------------

Open the file /etc/aliases:

$ sudo editor /etc/aliases

In the file /etc/aliases you should have the following two lines:

net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"
net: "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"

If these lines are not in /etc/aliases, then be sure to add them. When you
are done, save the file and exit. Then you need to tell the MTA (Mail
Transfer Agent) that there are some new aliases to be used:

$ sudo newaliases

2. Configure Smokeping
----------------------

In the file /etc/smokeping/config.d/Alerts you can tell Smokeping where
alert outputs should go. Edit the file:

$ sudo vi /etc/smokeping/config.d/Alerts

and update the top of the file to be:

*** Alerts ***
to = net@localhost
from = smokealert@localhost

At the end of the file, add another alert like this:

+anydelay
type = rtt
# in milliseconds
pattern = >1
comment = Just for testing

Be sure that all text is flush left in the file. Now save the file and exit.

Notice the pattern in this alert. It means that an alert will be triggered
as soon as a sample measurement has "ANY" delay, that is, more than one
millisecond. This is just for testing. In reality, you will want to create
an alert based on your observed baseline - for example, if your DNS
servers' delay suddenly goes from under 10 ms to over 100 ms. (A sketch of
such a baseline-style alert appears later in this step.)

Next, be sure you have this test alert defined for some of your Targets.
You can either turn on alerts by defining alerts for a probe in the
/etc/smokeping/config.d/Probes file, or for individual Targets entries. In
our case let's edit the Targets file and turn on alerts for our DNS latency
checks:

$ sudo vi /etc/smokeping/config.d/Targets

Find (or add if necessary) the following section in the file:

+DNS
probe = DNS
...

Now let's add an entry for a global DNS server that responds recursively:

++GoogleA
menu = 8.8.8.8
title = DNS Latency for google-public-dns-a.google.com
host = google-public-dns-a.google.com
alerts = anydelay

Notice the line that says "alerts = anydelay".

So, in summary, you should have the following section near the bottom of
your Targets file (items should be flush left in the file):

+DNS
probe = DNS
menu = DNS Latency
title = DNS Latency Probes

++GoogleA
menu = 8.8.8.8
title = DNS Latency for google-public-dns-a.google.com
host = google-public-dns-a.google.com
alerts = anydelay

Save and exit from the file, then restart Smokeping:

$ sudo service smokeping restart
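As noted above, the "anydelay" pattern is only for testing. Once you have
watched your graphs for a while and know your normal latency, a more
realistic alert in /etc/smokeping/config.d/Alerts might look something like
the sketch below. The alert name "dnsslow" and the 100 ms / three-sample
values are illustrative assumptions only - adjust them to your own
baseline, and reference the alert from a target with "alerts = dnsslow":

+dnsslow
type = rtt
# fire only if the last three samples are all above 100 ms
pattern = >100,>100,>100
comment = DNS latency well above observed baseline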
Now check RT to see if you have received anything from Smokeping. It may
take up to 5 minutes for a new ticket to appear.

NOTE: If you have not already configured the DNS latency checks for
Smokeping, you may need to edit the file /etc/smokeping/config.d/Probes and
add an entry for DNS:

$ sudo vi /etc/smokeping/config.d/Probes

At the bottom of the file add:

+ DNS
binary = /usr/bin/dig
pings = 5
step = 180
lookup = www.nsrc.org

Save and exit from the file and restart Smokeping:

$ sudo service smokeping restart

3. Nagios and Request Tracker Ticket Creation
----------------------------------------------

Configuring RT and Nagios so that alerts from Nagios automatically create
tickets requires a few steps:

* Create a proper contact entry for Nagios in
  /etc/nagios3/conf.d/contacts_nagios2.cfg
* Create the proper command in Nagios to use the rt-mailgate interface.
  The command is defined in /etc/nagios3/commands.cfg

These next two items should already be done in RT if you have finished the
RT exercises:

* Install the rt-mailgate software and configure it properly in your
  /etc/aliases file for the MTA in use.
* Configure the appropriate queues in RT to receive emails passed to it
  from Nagios via the rt-mailgate software.

4. Configure a Contact in Nagios
---------------------------------

- Edit the file /etc/nagios3/conf.d/contacts_nagios2.cfg:

$ sudo bash
# vi /etc/nagios3/conf.d/contacts_nagios2.cfg

- In this file we will first add a new contact name under the default root
  contact entry. The new contact should look like this:

define contact{
        contact_name                    net
        alias                           RT Alert Queue
        service_notification_period    24x7
        host_notification_period       24x7
        service_notification_options   c
        host_notification_options      d
        service_notification_commands  notify-service-ticket-by-email
        host_notification_commands     notify-host-ticket-by-email
        email                           net@localhost
        }

- _DO NOT_ remove the "root" contact_name entry! This entry goes below the
  "root" contact.

- The service_notification_options of "c" means only notify once a service
  is considered "critical" by Nagios (i.e. down). The
  host_notification_options of "d" means down. By specifying only "c" and
  "d", notifications will not be sent for other states.

- Note the email address in use, "net@localhost" - this is important, as it
  was previously defined for RT. (A quick manual test of this address is
  sketched at the end of this step.)

- Now we must create a Contact Group that contains this contact. We will
  call this group "tickets". Do this at the end of the file:

define contactgroup{
        contactgroup_name       tickets
        alias                   email to ticket system for RT
        members                 net,root
        }

- You could leave off "root" as a member, but we've left it in so that
  another user also receives the email, which helps with troubleshooting if
  there are issues.
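Before wiring Nagios notifications to this contact, you can confirm that
mail sent to net@localhost really does create a ticket in RT. This quick
manual check is not part of the original exercise and assumes a
command-line mail client (such as bsd-mailx or mailutils) is installed:

# echo "Manual test of the RT mail gateway" | mail -s "rt-mailgate test" net@localhost

Within a minute or two a ticket with the subject "rt-mailgate test" should
appear in the net queue in RT.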
- Now that your contact has been created, you need to create the commands
  that were referenced in the contact definition above; these are
  "notify-service-ticket-by-email" and "notify-host-ticket-by-email".

5. Update Nagios Commands
-------------------------

- To create the notify-service-ticket-by-email and
  notify-host-ticket-by-email commands we need to edit the file
  /etc/nagios3/commands.cfg:

# vi /etc/nagios3/commands.cfg

- In this file you already have two command definitions that we are using.
  These are called notify-host-by-email and notify-service-by-email. We are
  going to add two new commands.

- We _strongly_ suggest that you COPY and PASTE the text below. It is
  almost impossible to type it without errors.

- Put these two new entries _BELOW_ the current notify-host-by-email and
  notify-service-by-email command entries. Do not remove the old ones.

- NOTE: The command_line entries below do not contain line breaks. Each is
  a single line. Be aware of this, as COPY and PASTE between some editors
  and environments may insert line breaks.

################################################################
# Additional commands created for network management workshop #
################################################################

# 'notify-host-ticket-by-email' command definition
define command{
        command_name    notify-host-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }

# 'notify-service-ticket-by-email' command definition
define command{
        command_name    notify-service-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }

6. Choose a Service to Monitor with RT Tickets
----------------------------------------------

- The final step is to tell Nagios that you wish to notify the contact
  group "tickets" for a particular service. If you look in
  /etc/nagios3/conf.d/generic-service_nagios2.cfg the default
  contact_groups is "admins". To override this for a service, edit the file
  /etc/nagios3/conf.d/services_nagios2.cfg and add a contact_groups entry
  for one of the service definitions.

- To send email to generate tickets in RT if HTTP goes down on a box, you
  would edit the HTTP service check so that it looks like this:

# check that web services are running
define service {
        hostgroup_name          http-servers
        service_description     HTTP
        check_command           check_http
        use                     generic-service
        notification_interval   0 ; set > 0 if you want to be renotified
        contact_groups          tickets
}

  Note the additional item that we now have, "contact_groups". You can do
  this for other entries as well if you wish.

- When you are done, save the file and exit.

- Now restart Nagios to verify your changes are correct:

# /etc/init.d/nagios3 stop
# /etc/init.d/nagios3 start

7. Generate RT Tickets for Hosts
---------------------------------

- To do this you must either specify "contact_groups tickets" for
  individual host definitions, or you must update the template file for all
  hosts (generic-host_nagios2.cfg) and change the default contact_groups
  entry to "tickets". An example of the per-host approach is sketched
  below.

- If you wish to do this, go ahead. Tickets will be generated if a host
  goes down and you have specified the contact_groups for that host as
  being "tickets".
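As a sketch of the per-host approach (the host name "pc99" and its address
are hypothetical, used only for illustration), a host definition with its
own contact_groups override might look like this:

define host {
        use             generic-host
        host_name       pc99            ; hypothetical host
        alias           pc99
        address         10.10.99.1      ; illustrative address
        parents         sw
        contact_groups  tickets         ; send host alerts to RT
}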
8. See Nagios Tickets in RT
---------------------------

To verify that your changes have worked, we can monitor HTTP on one of our
servers that is not actually running a web server. Let's pick the second
Mac Mini in our class, the box known as "s1.ws.nsrc.org" (see the network
diagram for details). If you do not have an entry for this machine, add one
to the file where your PCs are defined.

If this is in a file called pcs.cfg you would do:

# vi /etc/nagios3/conf.d/pcs.cfg

In this file add (or verify you have) an entry that looks like this:

define host {
        use             generic-host
        host_name       s1
        alias           s1
        address         10.10.0.241
        parents         sw
}

Save and exit from the file. Now edit the file
/etc/nagios3/conf.d/hostgroups_nagios2.cfg and add s1 to the hostgroup for
HTTP service checks:

# vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

Look for the "hostgroup_name http-servers" entry and update it so that it
looks like this:

# A list of your web servers
define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,pc19,pc20,pc21,pc22,pc23,pc24,pc25,pc26,pc28,pc29,pc30,pc31,pc32,pc35,pc37,pc39,s1
}

_REMEMBER_ that the line with all the "members" must not have any line
breaks. Notice that "s1" has been added at the end of the line.

Now save the file, exit and restart Nagios:

# service nagios3 stop
# service nagios3 start

- It will take a while (up to 10 minutes) for Nagios to report that HTTP is
  "critical", but once that happens a new ticket should appear in your RT
  instance in the net queue, generated by Nagios.

- To see this, go to http://pcX.ws.nsrc.org/rt/ and log in as username
  "sysadmin" with the password you chose when you created the RT sysadmin
  account. The new ticket should appear in the "10 newest unowned tickets"
  box on the main login page in RT.

9. Configure Cacti to send emails to net@localhost to generate tickets in RT
-----------------------------------------------------------------------------

If you have not installed the Plugin Architecture for Cacti, then please be
sure to attempt this exercise last. You can view how this works by logging
in to the Cacti instance running on the noc box, as this has the Cacti
Plugin Architecture installed along with the two plugins called "Settings"
and "Threshold".

To see how Cacti can generate a ticket, first go to:

http://noc.ws.nsrc.org/cacti/

Log in as "admin" (system password). Then do:

* Click on the Console tab (upper-left)
* Click on "Settings" (lower-left)
* Click on the "Mail / DNS" tab (upper-right)
* Verify that the fields for email are properly filled in:

  - Test Email          (sysadm or net @ localhost)
  - Mail Services       (PHP Mail() Function)
  - From Email Address  (cacti@localhost)
  - From Name           (Cacti System Monitor)
  - SMTP Hostname       (localhost)
  - SMTP Port           (25)

Now we need to create a threshold that we'll use to trigger an email that,
in turn, will create a ticket in RT:

* Click on "Thresholds" (middle-left)
* Click on the "Add" option (upper-right)
* Select a Host (localhost, for example)
* Select a Graph (Processes)
* Select the Data Source (proc)
* Click on the "create" button

Now you will be presented with a detailed screen where you can specify what
should happen if the threshold is reached. Verify or do the following:

* Threshold Name: Something Descriptive
* Verify that "Threshold Enabled" is checked
* Threshold Type: High / Low Values (for Processes)
* High Threshold: 50 (this will cause the threshold to trip)
* Breach Duration: 5 minutes (this will give us a ticket in 5 to 10 minutes)
* Data Type: Exact Value
* Re-Alert Cycle: Never
* Extra Alert Emails: net@localhost,sysadm@localhost

This will send an email to net@localhost within 5 to 10 minutes, which will
create a new ticket in RT. In addition, an email will go to
sysadm@localhost.
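If the email or the resulting RT ticket does not show up, it can help to
watch the mail log on your PC while the threshold trips, to see whether the
message is being handed to rt-mailgate. The log location depends on which
MTA your system uses; the paths below are common defaults and are an
assumption, not something specified by this exercise:

$ sudo tail -f /var/log/mail.log        # Postfix and most Ubuntu setups
$ sudo tail -f /var/log/exim4/mainlog   # Exim4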
You can view the email as the sysadm user by doing:

$ mutt -f /var/mail/sysadm

You can create all kinds of threshold states that can be tripped, each of
which will result in ticket creation. Feel free to play around with the
Cacti instance on the noc to create new thresholds. You can see if they are
working by logging in to the noc's instance of Request Tracker (RT) at:

http://noc.ws.nsrc.org/rt/

The username is "sysadm" and the password is the class password.

+-----+

Last update 2jun2011
Hervey Allen