Jo Rhett

Aug 7, 2012 7:01 PM

Has anyone else seen random bursts of query timeouts?

I've been chasing this for about two weeks now and keep coming up dry. Has anyone else seen the following problem? Any idea where to look?

About once every other day, we get a burst of query timeouts: some random set of ten or twenty queries all time out at once. The event logs indicate that the queries were unable to get a response in time. By the time we get the alerts and log in, the problem has long since cleared.

Obvious troubleshooting:

1. I've set up constant ping and TCP monitoring between several of the affected systems and confirmed that there was no network outage when the timeouts occurred.

2. If many of the services that reported failures were really down, other systems would notice large primary failures (e.g. a DB server outage would produce database errors in the application logs). That simply doesn't happen.

3. The timing of the messages is completely random and unrelated to load. In fact, the timeouts have happened during off-peak periods more often than during peak load.
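
The constant TCP monitoring from item 1 can be approximated with a small probe like the one below. This is a minimal sketch under assumptions, not the monitoring that was actually running here: it uses only the Python standard library, opens one TCP connection per target per interval, and prints a UTC timestamp, ok/FAIL, and the connect time, so the output can be lined up against the times Zenoss logged timeouts. The `probe_tcp` and `monitor` names and the example targets are hypothetical.

```python
from __future__ import annotations

import socket
import time
from datetime import datetime, timezone


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> tuple[bool, float]:
    """Try one TCP connect; return (succeeded, elapsed seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # covers refused connections, unreachable hosts, and socket.timeout
        ok = False
    return ok, time.monotonic() - start


def monitor(targets, interval: float = 10.0, rounds: int | None = None) -> None:
    """Probe every (host, port) target each interval; print one line per probe."""
    n = 0
    while rounds is None or n < rounds:
        for host, port in targets:
            ok, elapsed = probe_tcp(host, port)
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            status = "ok" if ok else "FAIL"
            print(f"{stamp} {host}:{port} {status} {elapsed * 1000:.1f} ms")
        n += 1
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical targets: replace with the devices that showed timeouts.
    monitor([("db1.example.com", 5432), ("web1.example.com", 80)])
```

Left running on the Zenoss server and on a second host, a clean probe log during a burst of Zenoss timeouts would support the conclusion that the network is not at fault.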

In short, everything points to this "timeout" occurring inside Zenoss itself rather than being an actual problem with the remote service. Some sort of internal locking?
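
One way to check the "inside Zenoss" theory against data already on hand: if timeouts for unrelated devices cluster into the same short window, the common factor is the collector rather than any one remote service. A minimal sketch, assuming the first-seen times of the timeout events have been exported as UNIX timestamps; the `find_bursts` helper and its `window`/`min_count` thresholds are hypothetical.

```python
from __future__ import annotations


def find_bursts(timestamps: list[float], window: float = 60.0,
                min_count: int = 5) -> list[tuple[float, int]]:
    """Group event times into bursts.

    A burst is a run of events that all fall within `window` seconds of
    the first event in the run. Returns (start_time, count) for every run
    containing at least `min_count` events.
    """
    bursts: list[tuple[float, int]] = []
    times = sorted(timestamps)
    i = 0
    while i < len(times):
        start = times[i]
        j = i
        # extend the run while events stay within the window from its start
        while j < len(times) and times[j] - start <= window:
            j += 1
        count = j - i
        if count >= min_count:
            bursts.append((start, count))
        i = j  # the next run begins at the first event outside this window
    return bursts


# Example: six events within 30 s of each other, then three stragglers.
print(find_bursts([0, 2, 3, 5, 8, 9, 3600, 7200, 7201], window=30, min_count=5))
```

The 60-second window and threshold of 5 are arbitrary starting points; a burst that spans many unrelated devices while the TCP probes stay clean would point back at the Zenoss host.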

Other relevant details:

1. This started about two weeks ago, and there had been no other changes to the system for many months before that, so it isn't related to a change.

2. This server does NOTHING except run Zenoss. It has no cron jobs or anything else unrelated to Zenoss.

3. Zenoss and sar monitoring of the system show no resource consumption issues: plenty of free memory, CPU, etc.

Environment:

  CentOS 5
  Zenoss Stack 3.2.1
  24 GB main memory
  16 cores
