Network Management & Monitoring

Smokeping

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

0. Log in to your PC or open a terminal window as the sysadmin user.

Once you are logged in you can continue with these exercises.

1. Install Smokeping

    $ sudo apt-get install smokeping

2. Initial Configuration

    $ cd /etc/smokeping/config.d
    $ ls -l

    -rwxr-xr-x 1 root root  578 2010-02-26 01:55 Alerts
    -rwxr-xr-x 1 root root  237 2010-02-26 01:55 Database
    -rwxr-xr-x 1 root root  413 2010-02-26 05:40 General
    -rwxr-xr-x 1 root root  271 2010-02-26 01:55 pathnames
    -rwxr-xr-x 1 root root  859 2010-02-26 01:55 Presentation
    -rwxr-xr-x 1 root root  116 2010-02-26 01:55 Probes
    -rwxr-xr-x 1 root root  155 2010-02-26 01:55 Slaves
    -rwxr-xr-x 1 root root 8990 2010-02-26 06:30 Targets

    $ sudo vi General

Change the following lines:

    owner    = NOC
    contact  = sysadmin@localhost
    cgiurl   = http://localhost/cgi-bin/smokeping.cgi
    mailhost = localhost

Save the file and exit. Now restart the Smokeping service to verify that
no mistakes have been made before going any further:

    $ sudo /etc/init.d/smokeping stop
    $ sudo /etc/init.d/smokeping start

2. Configure monitoring of devices

Most of your time and work configuring Smokeping will be spent in the
file /etc/smokeping/config.d/Targets.

For this class please do the following:

Use the default FPing probe to check:

    - all the student NOC PCs
    - classroom NOC
    - switches
    - routers

You can use the classroom Network Diagram on the classroom wiki to find
the address of each item.

Create some hierarchy in the Smokeping menu for your checks.
For example:

    + PCs
    menu = Lab PCs
    title = Lab PCs

    ++ pc1
    menu = pc1
    title = pc1
    host = pc1

    ++ pc2
    menu = pc2
    title = pc2
    host = pc2

Save the file and restart Smokeping:

    $ sudo /etc/init.d/smokeping stop
    $ sudo /etc/init.d/smokeping start

Go to your browser and check the Smokeping page:

    http://10.10.x.y/cgi-bin/smokeping.cgi

If everything is looking OK, continue adding:

    + Routers

    ++ bb-gw
    menu = bb-gw
    title = bb-gw
    host = bb-gw

    ++ rtr1
    menu = rtr1
    title = rtr1
    host = rtr1

    + Switches

    ++ bb-sw
    menu = sw
    title = sw
    host = sw

    ...

Save the file, restart Smokeping, and check your browser again.

3. Add new probes

The current entry in Probes is fine, but if you wish to use additional
Smokeping checks you can add them here and specify their default
behavior. You can also do this in the Targets file if you wish.

Here is an example of a Probes file that specifies what to use to check
HTTP and DNS latency, as well as the FPing probe that is used for ping
latency:

    $ sudo vi Probes

    *** Probes ***

    + FPing
    binary = /usr/bin/fping

    + EchoPingHttp

    + DNS
    binary = /usr/bin/dig
    pings = 5
    step = 180
    lookup = www.nsrc.org

Save the file.

4. Add HTTP latency checks

Now edit your Targets again:

    $ sudo vi Targets

Add a check for HTTP latency for all the classroom PCs. This means
adding another category, such as:

    + HTTP Servers
    probe = EchoPingHttp

    ++ PC1
    host = pc1

    ++ PC2
    host = pc2

    ...

If you have time, consider checking some machines that are external to
our classroom and the conference (your organization's website, a popular
web page, etc.).

5. Add DNS Latency Checks

You can check internal names, external names, or both using the DNS
latency probe.

Add a menu hierarchy for DNS Latency. Check an external address
(nsrc.org) and an internal address (noc). This will look something like
this (in Targets):

    + DNS
    probe = DNS
    menu = External DNS Check
    title = DNS Latency

    ++ nsrc
    host = nsrc.org

    ++ noc
    host = noc.mgmt

Save your changes to the file Targets and exit.
Restart Smokeping to see the changes:

    $ sudo /etc/init.d/smokeping stop
    $ sudo /etc/init.d/smokeping start

Look at additional Smokeping probes and consider implementing some of
them:

    http://oss.oetiker.ch/smokeping/probe/index.en.html

Explaining every syntactic detail of the file
/etc/smokeping/config.d/Targets would take several pages, so we will go
through some examples in class. You can also refer to the Smokeping
configuration files that are in use on the classroom NOC box by going to:

    http://noc/configs/etc/smokeping
    http://noc/configs/etc/smokeping/config.d

6. Send Smokeping alerts

    $ sudo vi Alerts

Update the top of the file where it says:

    *** Alerts ***
    to = alertee@address.somewhere
    from = smokealert@company.xy

to include a proper "to" and "from" field for your server. Something
like:

    *** Alerts ***
    to = sysadmin@localhost
    from = smokeping-alert@localhost

If you have installed RT, you can instead send your alerts to an
existing RT queue:

    *** Alerts ***
    to = net@localhost

At the end of the file, add another alert like this:

    +anydelay
    type = rtt
    # in milliseconds
    pattern = >1
    comment = Just for testing

Notice the pattern in this alert. It means that an alert will be
triggered as soon as a sample measurement has "ANY" delay, that is, more
than one millisecond. This is just for testing. In reality, you will
want to create an alert based on your observed baseline. For example,
you might alert when your DNS servers' delay suddenly goes from under
10 ms to over 100 ms.

Next, be sure you have this test alert defined for some of your Targets.
You can turn on alerts either for a whole probe in the
/etc/smokeping/config.d/Probes file, or for individual Targets entries.

In our case let's edit the Targets file and turn on alerts for our DNS
Latency checks:
    $ sudo vi /etc/smokeping/config.d/Targets

Find the following section in the file:

    + DNS
    probe = DNS
    menu = External DNS Check
    title = DNS Latency

    ++ nsrc
    host = nsrc.org

Add an alerts line to the "++ nsrc" entry, after its host line:

    ++ nsrc
    host = nsrc.org
    alerts = anydelay

Save and exit from the file, then restart Smokeping:

    $ sudo /etc/init.d/smokeping stop
    $ sudo /etc/init.d/smokeping start

Check your e-mail with mutt:

    $ mutt

(or check your RT queues) and see if you have received alerts after
5 minutes.

6. MultiHost Graphs

Once you have defined a group of hosts under a single probe type in your
/etc/smokeping/config.d/Targets file, you can create a single graph that
shows the results of all Smokeping tests for all of the hosts you list.
This lets you quickly compare, for example, a group of hosts that you
are monitoring with the FPing probe.

The MultiHost graph function in Smokeping is extremely picky - pay close
attention.

To create a MultiHost graph first edit the file Targets:

    $ sudo vi Targets

Suppose you had a section for the FPing probe that looked like this
(this is an example only - your Targets file may look different):

    + Local
    menu = Local
    title = Local Network

    ++ LocalMachine
    menu = Local Machine
    title = This host
    host = localhost

    ++ pc1
    menu = pc1
    title = pc1
    host = pc1

    ++ pc2
    menu = pc2
    title = pc2
    host = pc2

    ++ pc3
    menu = pc3
    title = pc3
    host = pc3

Right now Smokeping displays the results of the FPing probe for each
host in a separate graph. If you wish to see the results in a single
graph with multiple lines, add this after the last FPing probe host
definition:

    + MultiHostPCs
    menu = MultiHost Ping
    title = Consolidated Ping Response Time
    host = /Local/LocalMachine /Local/pc1 /Local/pc2 /Local/pc3

(Note: if the "host" entry gets too long, you can split it across
multiple lines by ending each line with the "\" character to indicate
that it continues on the next line - ask about this if you are unsure!)
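
Because the MultiHost "host" entry must name existing sections exactly,
it can help to list the section paths that a Targets file actually
defines and copy them from there. The following is only a minimal
sketch, not part of Smokeping: the function name list_target_paths is
made up for illustration, and it assumes that section headings start at
the beginning of a line with one or more "+" characters, as in the
examples above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Smokeping): print one path per
# section found in a Targets-style file, e.g. /Local/pc1.
# Assumption: headings look like "+ Name" or "++ name" at line start.
list_target_paths() {
    awk '
        /^\+/ {
            # Count the leading "+" characters to get the nesting depth.
            depth = 0
            while (substr($0, depth + 1, 1) == "+") depth++

            # Strip the "+" prefix and surrounding whitespace from the name.
            name = $0
            sub(/^\++[ \t]*/, "", name)
            sub(/[ \t]+$/, "", name)

            # Remember the name at this depth and print the full path.
            parts[depth] = name
            path = ""
            for (i = 1; i <= depth; i++) path = path "/" parts[i]
            print path
        }
    ' "$1"
}

# Example call (the file must be readable by your user):
#   list_target_paths /etc/smokeping/config.d/Targets
```

Each printed path is a candidate for the "host" line of a MultiHost
entry. The sketch only reads the file; it does not check that Smokeping
accepts the resulting configuration.
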
Now save and exit the file Targets and restart Smokeping:

    $ sudo /etc/init.d/smokeping stop
    $ sudo /etc/init.d/smokeping start

You should see a new graph under the "MultiHost Ping" menu in your
Smokeping web interface. This graph will have a different colored line
for each host you have defined.

7. Slave instances - only done if we have the time.

This is a description for informational purposes only, in case you wish
to attempt this type of configuration once the workshop is over.

The idea is that you can run multiple Smokeping instances at multiple
locations, all monitoring the same hosts and/or services as your master
instance. The slaves send their results to the master server, and you
see these results side-by-side with your local results. This lets you
view how users outside your network see your services and hosts.

This can be a powerful tool for resolving service and host issues that
may be difficult to troubleshoot if you only have local data.

Graphically this looks like this:

    [slave 1]     [slave 2]      [slave 3]
        |             |              |
        +-------+     |     +--------+
                |     |     |
                v     v     v
             +---------------+
             |    master     |
             +---------------+

You can see an example of this data here:

    http://oss.oetiker.ch/smokeping-demo/

Look at the various graph groups and notice that many of the graphs have
multiple lines, with the color code chart listing items such as "median
RTT from mipsrv01". These are not MultiHost graphs, but rather graphs
with data from external Smokeping servers.

To configure a Smokeping master/slave server you can see the
documentation here:

    http://oss.oetiker.ch/smokeping/doc/smokeping_master_slave.en.html

In addition, a sample set of steps for configuring this is available in
the file sample-smokeping-master-slave.txt.
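
Optional aside to the Targets exercises: the per-host entries for the
lab PCs are repetitive, so they can be generated with a short shell loop
instead of being typed by hand. This is a rough sketch under stated
assumptions: the function name gen_pc_targets is invented for
illustration, and the hosts are assumed to be named pc1, pc2, ... as in
the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print a "+ PCs" section followed by one "++ pcN"
# entry per host, in the layout used in the exercises.
# Assumption: hosts are named pc1..pcN and resolve from this machine.
gen_pc_targets() {
    count="$1"
    printf '+ PCs\nmenu = Lab PCs\ntitle = Lab PCs\n\n'
    i=1
    while [ "$i" -le "$count" ]; do
        printf '++ pc%s\nmenu = pc%s\ntitle = pc%s\nhost = pc%s\n\n' \
            "$i" "$i" "$i" "$i"
        i=$((i + 1))
    done
}

# Print entries for three PCs; review the output before pasting it
# into /etc/smokeping/config.d/Targets.
gen_pc_targets 3
```

The output goes to the terminal only, so nothing in your configuration
changes until you copy the entries into Targets yourself and restart
Smokeping.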