Thank you for taking the time to read this. I am relatively new to Linux operating systems and network management systems, so please pardon my potential lack of common knowledge.
All Zenoss installs are built within VMs on a local server. Each is allocated 20 GB of hard drive space, 1 or 2 cores at 2.00 GHz, and 1-2 GB of RAM.
Various configurations are as follows:
Zenoss-3.2.1-x86_64.vmware (stock VM installation)
CentOS 5.8-x86_64 with zenoss-stack-3.2.1-linux.bin stack install.
Across the two types of install, around 20 VMs have been created and rebuilt over the past few months.
ZenPacks and additions
fping-3.1-1el5.rf.x86_64.rpm (installed in Linux so the ZenPack has a command to reference)
ZenPacks.BlakeDrager.fping-1.0.egg (had to remove the py2.4 for correct installation)
Recently I also made a change to allow the alpha to be set within graph parameters (default puts Stacked at 0% transparency, and everything else at 50%).
I have experienced this issue with none of the ZenPacks installed, some of the ZenPacks installed, and all of the ZenPacks installed.
I am using Fping to track latency and packet loss across various VLANs or networks from different locations concurrently. The idea is to use multiple Zenoss servers to monitor default gateways and the latency and packet loss between networks. I use CheckPing as a keep-alive check on various IPs. Theoretically Fping should be able to do both, but when devices go down it just leaves empty space in the graph.
Randomly (often during hours when there is no traffic on the network, after business hours or over weekends) the fping, regular ping, and checkping graphs stop updating. The old data remains, and Zenoss recognizes that the devices are still up and functioning, receives SNMP packets where applicable, and the interface throughput and other graphs still work. Running the ping, checkping, and fping commands with the test still produces results.
Using zendmd to reindex() and commit() fixes the issue, though it doesn’t retroactively fill in the missing portions of the fping/checkping/ping graphs. With this issue unresolved, I will be unable to use this tool in the facility.
Additional information that may or may not be pertinent: yesterday I noticed that a server had stopped graphing. I applied the reindex fix and monitored the graphs for a time. At 1:00 p.m. I left for a meeting, disconnecting my client laptop from the network. At that exact time, the graphs broke again. Again, the VMs are hosted on a server, not on the local machine.
Thank you for your time, and any guidance you can offer.
You don't say how many devices each Zenoss is monitoring, but 1-2 GB of RAM is probably too low.
Have you checked the various logfiles? In $ZENHOME, make sure you check zenhub.log and event.log. Given the areas where you have issues, check zenping.log. I can't remember how the various fping ZenPacks work, but it is probably through zencommand, so check zencommand.log too.
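If it helps, here is a rough Python 3 sketch for skimming those four logs in one pass. The log directory ($ZENHOME/log), the search pattern, and the name scan_logs are my assumptions, not anything Zenoss ships, so adjust them to your install.

```python
import os
import re

# Log names taken from the post above; the directory they live in is an assumption.
LOG_NAMES = ["zenhub.log", "event.log", "zenping.log", "zencommand.log"]
# Hypothetical pattern: anything that looks like an error or a Python traceback.
PATTERN = re.compile(r"ERROR|CRITICAL|Traceback")


def scan_logs(log_dir, names=LOG_NAMES, pattern=PATTERN):
    """Return {log name: [matching lines]} for the logs that exist in log_dir."""
    hits = {}
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue  # this daemon may not be installed or may not have logged yet
        with open(path, errors="replace") as handle:
            matches = [line.rstrip("\n") for line in handle if pattern.search(line)]
        if matches:
            hits[name] = matches
    return hits


if __name__ == "__main__":
    # Assumes the logs sit in a "log" directory under $ZENHOME.
    zenhome = os.environ.get("ZENHOME", ".")
    for name, lines in scan_logs(os.path.join(zenhome, "log")).items():
        print(name, "-", len(lines), "matching lines; most recent:")
        print("   ", lines[-1])
```

The most recent matching line in each log is usually the quickest thing to compare against the time the graphs stopped.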
Thanks for pointing me in the right direction. zencommand.log has shown me that the error "[Errno 24] Too many open files" coincides with the graphs failing to update from then on.
Update: The issue seems to have been resolved. It seems Fping was spawning processes without killing them, which left several thousand running processes, each with 0% RAM and CPU utilization. I am building a new VM without Fping to confirm that this was the issue; so far I haven't noticed runaway process counts.
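For anyone hitting the same message: errno 24 is EMFILE, meaning a single process has reached its limit on open file descriptors. A minimal Python 3 sketch for comparing a process's open descriptors against the limit follows. It assumes a Linux /proc filesystem; the function names are my own, and the limit shown is the one for the shell running the script, which may differ from the daemon's.

```python
import errno
import os
import resource
import sys


def open_fd_count(pid, proc_root="/proc"):
    """Count the file descriptors a process holds by listing /proc/<pid>/fd."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_fd_limit():
    """Soft RLIMIT_NOFILE of the process running this script (not of another pid)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # Usage (hypothetical script name): python fdcheck.py <pid of the zencommand daemon>
    # Reading another user's /proc/<pid>/fd needs that user's rights or root.
    pid = sys.argv[1] if len(sys.argv) > 1 else os.getpid()
    print("errno", errno.EMFILE, "=", os.strerror(errno.EMFILE))
    if os.path.isdir(os.path.join("/proc", str(pid), "fd")):
        print("pid", pid, "has", open_fd_count(pid), "open descriptors")
    print("soft limit for this shell:", soft_fd_limit())
```

A count that keeps climbing toward the limit between runs would suggest the daemon is leaking descriptors, for example pipes to child processes that are never cleaned up.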
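In case it is useful to others, below is a small Python 3 sketch for keeping an eye on the number of fping processes over time. It assumes a Linux /proc filesystem and that the processes show up under the command name "fping"; count_processes is my own helper, not part of Zenoss or fping.

```python
import os
from collections import Counter


def count_processes(name, proc_root="/proc"):
    """Return a Counter of process states (R, S, Z, ...) for commands named `name`."""
    states = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat = handle.read()
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        # stat looks like "1234 (fping) S 1 ..."; the name sits between the parentheses
        start, end = stat.find("("), stat.rfind(")")
        if start == -1 or end == -1:
            continue
        fields = stat[end + 1:].split()
        if stat[start + 1:end] == name and fields:
            states[fields[0]] += 1  # first field after the name is the state letter
    return states


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        states = count_processes("fping")
        print("fping processes:", sum(states.values()), dict(states))
```

Running it every few minutes, from cron for instance, and logging the total would show whether the count grows without bound. A large number of entries in state Z would point at child processes that exited but were never reaped.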