I haven't released IP-SLA Enumeration/Monitoring v2 yet, but it's coming soon. I wanted to wait for the 2.5 release so I wouldn't need an extra release to clean up unexpected incompatibilities; that said, I'm working on some of them now! :-D
One thing I have been stuck on is what would be the most useful naming scheme for SLA objects. It's a simple matter to change, but could adversely affect how clean the SLA object list looks and how useful it is. Any suggestions?
There have been many excellent suggestions posted, some of which I've included and some of which I haven't yet had time for. The more feedback and suggestions, the better the product!
Thanks everyone for the support.
Good to hear, looking forward to the v2 release.
"One thing I have been stuck on is what would be the most useful naming scheme for SLA objects. It's a simple matter to change, but could adversely affect how clean the SLA object list looks and how useful it is. Any suggestions?"
Not sure what you mean by the above?
The list under SLA that you get from the tag in the unit, or the names in the graphs?
1) Looking at screenshots such as http://community.zenoss.org/servlet/JiveServlet/showImage/102-3436-4-1131/ipsla_http.png, it seems you need to define new datasources. What are they for? I already have monitoring set up for HTTP pages with the HttpMonitor ZenPack, plus ping monitoring, SNMP, and so on.
Isn't it possible to just reuse the already-existing performance data?
->If you need to reuse existing data, your RRD files would need to be dropped into the place of the resulting RRD for that datapoint. The RRD would need to have been created with the same RRD create statement to ensure compatibility.
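As a rough sketch of what "same RRD create statement" means in practice (this is not part of the ZenPack): you can compare the datasource and archive definitions that `rrdtool info` reports for the two files. The key names below follow the standard `rrdtool info` output format; the sample excerpts are made up for illustration.

```python
import re

# Schema-relevant keys from `rrdtool info` output: the step, every
# datasource (ds) definition, and every archive (rra) definition.
SCHEMA_KEY = re.compile(
    r'^(step'
    r'|ds\[[^\]]+\]\.(type|minimal_heartbeat|min|max)'
    r'|rra\[\d+\]\.(cf|rows|pdp_per_row|xff))$'
)

def extract_schema(info_text):
    """Return the subset of `rrdtool info` key/value pairs that must
    match for two RRD files to share a create statement."""
    schema = {}
    for line in info_text.splitlines():
        if '=' not in line:
            continue
        key, _, value = line.partition('=')
        key, value = key.strip(), value.strip()
        if SCHEMA_KEY.match(key):
            schema[key] = value
    return schema

def compatible(info_a, info_b):
    """True if both files were created with the same schema."""
    return extract_schema(info_a) == extract_schema(info_b)

# Hypothetical excerpts of `rrdtool info` output for two files:
OLD = """step = 60
ds[ds0].type = "GAUGE"
ds[ds0].minimal_heartbeat = 120
rra[0].cf = "AVERAGE"
rra[0].rows = 600
rra[0].pdp_per_row = 1
"""
NEW = OLD  # same create statement -> compatible

print(compatible(OLD, NEW))   # True
```

If the schemas match, dropping the old file into place should work; if not, the daemon will choke on the mismatched datasource or archive layout.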
2) Is it possible to define SLAs such as "99.9% uptime (ping monitoring)" or "99.5% uptime where 'up' means an HTTP OK within x ms"?
->This is possible; it's just a calculation. I haven't added this because the extra RRD graph statements it needs slow down graphs when using a graph report that includes hundreds of SLA graphs. PM me and I'll explain how one can do this.
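The underlying calculation is simple; a minimal sketch (the sample data and the 200 ms threshold are made up, not taken from the ZenPack):

```python
def uptime_pct(samples, ok=lambda v: v is not None):
    """Percentage of samples considered 'up'. `samples` is a list of
    response times in ms; None means the probe failed entirely."""
    if not samples:
        return 0.0
    up = sum(1 for v in samples if ok(v))
    return 100.0 * up / len(samples)

# Hypothetical ping results (ms); None = timeout.
pings = [12, 15, None, 14, 13, 16, 12, 15, 14, 13]

# Plain reachability SLA:
print(uptime_pct(pings))   # 90.0

# "Up" means responded within 200 ms:
print(uptime_pct(pings, ok=lambda v: v is not None and v < 200))
```

In RRD terms this is the same idea expressed as CDEF/VDEF statements over the stored datapoints, which is exactly the per-graph overhead mentioned above.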
3) Is it possible to get reports? (Basically a list of all SLAs, plus a list of issues with their timeframes and a calculation of the actual SLA performance.)
-> I'm working on reports. They haven't been released yet because they contain some hackish code to make them work. I will release them soon.
4) Is it possible to attribute problems to external causes that don't count? I.e., if you have a month's worth of data with two one-hour outages, could you say "the first outage is our fault, but the second was caused by you", so that only one hour would count?
->Negative. I haven't thought of a way to do this. If anyone has suggestions on how I could, I'll code it!
Version 2 updates:
Please post tracebacks or problems so I can debug them! Thanks!
I will update the ZenPack page for this later; I don't have time at the moment.
What would you say the CPU usage would be on a 32-bit install? Or rather, what impact would it have?
From your wording, "please use 64-bit", it doesn't sound that good :/
Good news anyhow; looking forward to testing this next week.
CPU usage on a 32-bit box will be about that of the Perl script; on a 64-bit installation, however, it's much faster. Version 2.1 will feature compiled-in support for Psyco, a Python acceleration engine for 32-bit. I'll post it soon.
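For reference, Psyco is typically enabled near the top of a script like this (a sketch: it only has an effect on 32-bit x86 CPython, and falls back to normal interpretation everywhere else):

```python
def enable_psyco():
    """Try to turn on Psyco JIT compilation; return True on success."""
    try:
        import psyco       # only available on 32-bit x86 CPython
        psyco.full()       # compile all functions as they are called
        return True
    except ImportError:
        return False       # 64-bit or Psyco not installed: run normally

if enable_psyco():
    print("psyco enabled")
```

The try/except fallback is what lets one codebase run on both 32-bit and 64-bit installs without a separate release.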
Besides Psyco, I'm trying to figure out ways to speed it up by consolidating all the testing under one operation, rather than one per SLA; this is very difficult because Zenoss doesn't traditionally operate this way. I'm considering creating a daemon for it; however, that comes with a number of drawbacks. Additionally, QoS support is coming.
As usual, suggestions, bug reports, requests, etc. are welcomed to help further the development!
Hackman, a daemon would be sweet: a compile on each system for a perfect fit.
Maybe even build in support for having it run on a separate box, and just have the core Zenoss install fetch the RRD files off that server?
The model I'm trying to implement for a daemon has the command templates executing a very short script (more or less a one-liner) that queues the request with the SLA collection daemon, which then returns the result back to Zenoss as if it were the original command. The idea is that there is one main thread doing the collection, with the rest being only short process blips that queue the work to the daemon. The main problem is that it's convoluted. Any ideas?
Ex: SLA_Object -> Template -> Python Datasource -> Queue 1 liner -> SLA Collection Daemon -> Collection Operation -> Return to Python Datasource -> Zenoss RRD entry
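As a rough sketch of the queuing half of that pipeline (file names and the result format below are hypothetical, not what the ZenPack actually does): the "one-liner" side appends one request per line and returns immediately, while the daemon side drains the queue and performs the actual collection.

```python
import json
import os
import tempfile

# Hypothetical spool file shared by the enqueue script and the daemon.
fd, QUEUE = tempfile.mkstemp(suffix=".jsonl")
os.close(fd)

def enqueue(sla_name, target):
    """The 'one-liner' side: append a request and return immediately,
    so zencommand is only blocked for a moment."""
    with open(QUEUE, "a") as f:
        f.write(json.dumps({"sla": sla_name, "target": target}) + "\n")

def drain(collect):
    """Daemon side: process every queued request with `collect`,
    then clear the queue. Returns the collected results."""
    if not os.path.exists(QUEUE):
        return []
    with open(QUEUE) as f:
        requests = [json.loads(line) for line in f if line.strip()]
    os.remove(QUEUE)
    return [collect(r) for r in requests]

# Hypothetical usage: two SLAs queued, one daemon pass.
enqueue("http_sla", "10.0.0.1")
enqueue("dns_sla", "10.0.0.2")
results = drain(lambda r: (r["sla"], "ok"))
print(results)   # [('http_sla', 'ok'), ('dns_sla', 'ok')]
```

The convoluted part the post mentions is the return path: getting the daemon's result back to the original Python datasource so Zenoss can write its RRD entry as if the command had run synchronously.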
Hey hackman, thanks for getting back to me.
I don't understand what you mean in your first point. It just seems normal and intuitive to me to reuse the results we're already gathering anyway, instead of monitoring certain things twice. I'm fine with typing RRD commands manually and/or updating them if we change how or where we monitor certain parameters, as long as the monitoring configuration, polling, data storage, etc. don't need to be duplicated.
About the 4th point: I guess you could calculate the starts and ends of the time periods where a certain metric did not meet the required specifications.
Once you have this, you could cache the information in separate files (on a per-metric/RRD-file basis) so that next time, you only need to retrieve the periods that come _after_ what you already found. To attribute outages, you could provide a simple interface with an HTML form where each period has a toggle field and a text field for comments; you could then toggle the field and fill in an explanation if you want to say "don't count this".
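A minimal sketch of that idea (the sample data, the period format, and the `excluded` flag are all made up for illustration): detect out-of-spec periods from timestamped samples, then compute availability with the operator-flagged periods excluded.

```python
def bad_periods(samples, ok):
    """Collapse consecutive failing (timestamp, value) samples into
    (start, end) outage periods."""
    periods, start, last = [], None, None
    for ts, value in samples:
        if not ok(value):
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            periods.append((start, last))
            start = None
    if start is not None:
        periods.append((start, last))
    return periods

def availability(total_seconds, periods, excluded=()):
    """Percent uptime, not counting periods flagged as external causes."""
    down = sum(end - start for start, end in periods
               if (start, end) not in excluded)
    return 100.0 * (total_seconds - down) / total_seconds

# One hour of data, one sample every 60 s; value None = down.
samples = [(t, None if 600 <= t < 1200 or 2400 <= t < 3000 else 20)
           for t in range(0, 3600, 60)]
outages = bad_periods(samples, ok=lambda v: v is not None)
print(outages)   # [(600, 1140), (2400, 2940)]

# Operator toggles the second outage as "caused by the customer":
print(availability(3600, outages, excluded={(2400, 2940)}))   # 85.0
```

The caching suggestion then just means persisting `outages` per RRD file, so the next run only scans samples newer than the last cached period.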
@nickname: basically, what he means is that instead of polling synchronously (Zenoss executes the command, the command gathers the info, Zenoss waits until the command is done), it would happen asynchronously (Zenoss executes the command, the command just makes an entry in a table/text file on localhost, and the command can return quickly so Zenoss can do whatever else it needs to do). Then a separate program would check this table/text file, process the requested commands, gather the needed info, and store it in RRD files, which Zenoss can then display.
I'm not sure if such an approach is worth it: it depends on how heavyweight the Zenoss process that has to wait on the command execution is, but I would guess it's just a lightweight thread that can wait for a while, so the proposed approach might not bring any advantage at all.
You could essentially reuse datapoints from the HTTP monitoring pack; however, the SLAs are necessarily from the perspective of the Zenoss segment. On a very large, unwieldy, over-subnetted, and firewalled network (like mine :-/), the SLAs are generally used between points to help determine bottlenecks and the user's perspective on the availability of a web service, DNS, SMB, etc. If you want to reuse datapoints, you could create a local copy of the template for the HTTP SLA tests you want, disable the datasources, and adjust the graph to point to the datapoints used by the HTTP monitor. I'll think about how I can include this as a feature in future releases.
Concerning blackout periods in RRD files, I'm having a hard time thinking of a way to approach the code for it. What I've done instead, for our organization, is build a report that computes the values only when the tested device or service is considered up. It takes forever to run a one-month report for about 800 SLAs because it pummels ZOPE looking for device and service availability data. I'm working on a way to clean that up. I'll brainstorm around your suggestion of how to do this; it would work well if I could figure out how! :-)
Concerning the daemon for IP-SLA: it would only be helpful in very large implementations, e.g. 500 SLAs at a 1-minute collection interval. We're doing ~800 at 1 minute and the zencommand daemon gets pummeled. In smaller installations, it would probably be slightly less efficient, as you suggest.
Great suggestions from everyone! The more the better!
Haven't got approval yet to change from the previous version; I will try to get it today.
How is progress going on the next version, the one with less CPU time per cycle on 32-bit installs?
Copyright © 2005-2011 Zenoss, Inc.