
Dev Chat 12/22/2011

Created on: Dec 22, 2011 2:51 PM by Nick Yeates - Last Modified:  Dec 22, 2011 3:28 PM by Nick Yeates

[22-Dec-2011 11:01:12] <themactech> nick, just so you know, I was at training last week at Savant, they make a mac-based control system much like AMX or Crestron, they are looking to add monitoring to their solution, I gave them the name of your product, just a heads up

[22-Dec-2011 11:01:41] <nyeates> Oh nice

[22-Dec-2011 11:02:09] <nyeates> URL?

[22-Dec-2011 11:02:22] <themactech>

[22-Dec-2011 11:03:34] <nyeates> Thanks for the free word Manuel

[22-Dec-2011 11:04:10] <themactech> I am learning their system, insanely powerful when in the hands of a Mac guru, I will make zenoss profiles for their stuff when I have time

[22-Dec-2011 11:04:30] <themactech> tried to get them to come to Penn state, looks like they can't swing it

[22-Dec-2011 11:05:42] <nyeates> zenoss profiles?

[22-Dec-2011 11:05:58] <themactech> ZenPacks, that keep a good eye on their stuff

[22-Dec-2011 11:06:04] <themactech> their controllers are mac minis

[22-Dec-2011 11:06:15] <nyeates> mac minis rock....i have 2

[22-Dec-2011 11:06:37] <nyeates> Any dev questions for the John man btw?

[22-Dec-2011 11:06:42] <nyeates> He is our dev in the house today

[22-Dec-2011 11:06:42] <themactech> I am tasked to make good zenpacks for anything we roll out since monitoring has become a huge part of our service income

[22-Dec-2011 11:07:12] <themactech> I just wanted to get an ETA on 4.2 and the ZenPack dev docs that will come with it

[22-Dec-2011 11:07:43] <themactech> I have put all Zenoss development on hold (except emergencies) until that comes out, so I don't duplicate efforts between 3.1 and the upcoming 4.2

[22-Dec-2011 11:08:01] <nyeates> john_zenoss: Can you give any details you have heard about core? Is it even going to be release 4.2?

[22-Dec-2011 11:08:17] <Jane_Curry> hear, hear!  I am also loath to do any more ZenPack development until we see what 4.2 will break

[22-Dec-2011 11:08:47] <themactech> I just did a zenpack yesterday to respond to a customer emergency and I am thinking 'will this work on 4.2'

[22-Dec-2011 11:09:17] <Jane_Curry> Is the documentation at the start on "Chet's ZenPack documentation"???

[22-Dec-2011 11:09:49] <john_zenoss> It's a secret

[22-Dec-2011 11:09:52] <nyeates> So have you all seen the presentation that Simon did about 4.2 core?

[22-Dec-2011 11:11:30] <nyeates> Jane_Curry: This is not the documentation that you are looking for - Chet has stated this to me. It is higher level stuff that he wanted to get out of the way first, before delving into detailed development docs and tips

[22-Dec-2011 11:11:44] <nyeates> But, it is still important docs nonetheless

[22-Dec-2011 11:12:42] <john_zenoss> Actually, we're working hard on the 4.2 release right now, and hope to get 4.2 core out by the end of Q1. Alphas should be out in the next couple of months, with a hardened 4.2 scheduled for the end of Q1/beginning of Q2

[22-Dec-2011 11:13:42] <nyeates> are docs that our internal service teams, engineers, and the community are all hoped to follow

[22-Dec-2011 11:14:34] <nyeates> john_zenoss: the earlier the alphas the BETTER....the community really is eager to test stuff

[22-Dec-2011 11:15:02] <nyeates> and as Jane said, we want to rule out zenpack breakages as early as possible

[22-Dec-2011 11:16:32] <whyzgeek> hi guys, we are on 2.5.2 Ent and have started to experience zenhub problems: modeling takes forever to finish, and when I restart zenhub it becomes fine. All other zenhub functions seem to be working fine

[22-Dec-2011 11:16:47] <whyzgeek> anybody had the same issue before?

[22-Dec-2011 11:16:48] <themactech> For me it's not only about ZenPack compatibility, but being able to make my own.  Documentation is lacking.  I want to add custom components to my ZenPacks and have not been successful at it.  I want to also do SSH modelers, and a few other things.

[22-Dec-2011 11:17:15] <themactech> Can you do a cron job to restart that daemon every so often?

[22-Dec-2011 11:18:16] <whyzgeek> well i just did that

[22-Dec-2011 11:18:27] <whyzgeek> but I am worried that it hides some bigger problem

[22-Dec-2011 11:18:35] <whyzgeek> which I am not aware of

[22-Dec-2011 11:18:41] <nyeates> themactech, I will send those details to Chet for the docs he is to work on

[22-Dec-2011 11:19:29] <nyeates> whyzgeek: how many workers is your hub using?

[22-Dec-2011 11:19:52] <whyzgeek> 8

[22-Dec-2011 11:20:01] <nyeates> and how many cores is the box?

[22-Dec-2011 11:20:25] <whyzgeek> 12 hyperthreaded

[22-Dec-2011 11:20:29] <whyzgeek> ie 24

[22-Dec-2011 11:20:43] <whyzgeek> two six core cpus

[22-Dec-2011 11:21:51] <nyeates> yeah sounds plenty spec'ed.....what happens to the processes? do they become cpu or IO bound when the modeling slows?

[22-Dec-2011 11:22:34] <whyzgeek> no I don't see that much activity

[22-Dec-2011 11:22:47] <whyzgeek> and also no useful logs

[22-Dec-2011 11:23:02] <whyzgeek> even from web interface manual modeling stops working

[22-Dec-2011 11:23:09] <whyzgeek> ie takes for ever to do

[22-Dec-2011 11:25:49] <nyeates> we fixed a lot of stuff pertaining to hub invalidations in 3.x..... everytime a change to the object model is made - probably heavy during modeling - these invalidations are sent through the hub and apply to various other daemons - this could be stopping up the hub..... anyone know how to verify this?

[22-Dec-2011 11:25:49] <john_zenoss> @whyzgeek -- what kind of device are you modeling? do all types of devices take forever when it gets in this state, or just some kinds?

[22-Dec-2011 11:26:27] <whyzgeek> yes when it starts happening everything becomes like that

[22-Dec-2011 11:27:40] <john_zenoss> interesting. Is there anything common about the time when the system gets into that state? like perhaps a time of day, or after modeling VMware, or after zenhub has been up for 3 weeks, etc

[22-Dec-2011 11:27:43] <whyzgeek> nyeates: yep I am aware of it, in fact at first it was even crashing and I raised the invalidation queue to 1200 and from then on it became ok

[22-Dec-2011 11:28:34] <whyzgeek> john_zenoss: it happens when it finishes one whole cycle

[22-Dec-2011 11:28:52] <whyzgeek> so I was suspicious that maybe it is related to a single device causing it

[22-Dec-2011 11:29:05] <whyzgeek> but I couldn't find anything useful in the logs

[22-Dec-2011 11:29:12] <whyzgeek> to help me with that

[22-Dec-2011 11:30:50] <john_zenoss> hmm -- so no tracebacks at all in zenhub log?

[22-Dec-2011 11:31:09] <whyzgeek> no nothing

[22-Dec-2011 11:31:30] <whyzgeek> it seems that it's doing its job

[22-Dec-2011 11:31:39] <whyzgeek> only thing that doesn't work is modeling

[22-Dec-2011 11:33:16] <whyzgeek> well to be accurate there are some tracebacks, for example some transform is not correct, but they were always there

[22-Dec-2011 11:33:23] <whyzgeek> and also after restart they appear

[22-Dec-2011 11:33:34] <whyzgeek> so i don't believe it is related to them

[22-Dec-2011 11:33:40] <rocket> whyzgeek do a lsof of the zenhub process .. check if there are more than 1024 connections to the hub.  if there are bad things happen and workers eventually die off

[22-Dec-2011 11:35:46] <whyzgeek> rocket: root@mon020:~# lsof -p 13460 | wc -l

[22-Dec-2011 11:35:48] <whyzgeek> 247

[22-Dec-2011 11:36:24] <whyzgeek> but I restarted it this morning so I probably have to check it when it happens
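
The check rocket suggests can also be scripted so it runs while the problem is happening. The following is only a minimal sketch: it assumes a Linux host with `/proc`, and the helper names and the 80% warning margin are made up for illustration. Note that `lsof -p PID | wc -l` also counts the header line and non-descriptor rows (cwd, txt, mem), so it overstates the number of open descriptors; counting `/proc/<pid>/fd` entries avoids that.

```python
import os

FD_LIMIT = 1024  # the threshold rocket mentions in the chat


def count_open_fds(pid):
    """Count open file descriptors for a process (Linux only, reads /proc)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def near_fd_limit(fd_count, limit=FD_LIMIT, margin=0.8):
    """Return True once the count reaches `margin` of the limit.

    The 0.8 margin is an arbitrary illustrative choice, not a Zenoss setting.
    """
    return fd_count >= limit * margin
```

For the process in the chat this would be called as `count_open_fds(13460)`, run as a user allowed to read that process's `/proc` entry.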

[22-Dec-2011 11:36:34] <rocket> Thx @SlideShare! Our 100 Best Cloud & Data Stats of 2011 eBook is now a pick of the day! Check out the

[22-Dec-2011 11:36:37] <rocket> homepage

[22-Dec-2011 11:37:54] <rocket> whyzgeek: I would keep an eye on that .. also if you have not done so .. increase the logging of zenhub

[22-Dec-2011 11:38:44] <rocket> whyzgeek: we have a similar issue at another customer and the debug logging over a long period of time is indicating the workers are slowly not responding .. in their case they have over 1024 open connections though

[22-Dec-2011 11:39:08] <whyzgeek> rocket:  interesting to now

[22-Dec-2011 11:39:11] <rocket> python select in 3.X and earlier that we are using has problems once you hit 1024 connections

[22-Dec-2011 11:39:13] <whyzgeek> know

[22-Dec-2011 11:39:22] <whyzgeek> ic

[22-Dec-2011 11:39:29] <rocket> that is changed in 4.1.1 I believe
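
For background on the 1024 figure: CPython's `select.select()` rejects descriptors numbered at or above the platform's `FD_SETSIZE`, which is commonly 1024 on Linux, while `select.poll()` has no such ceiling. The sketch below uses only the Python standard library to show the difference in approach. It does not reproduce how zenhub itself waits on connections, and the function names and the `ASSUMED_FD_SETSIZE` constant are illustrative assumptions.

```python
import select

ASSUMED_FD_SETSIZE = 1024  # typical Linux value; an assumption, not read from the OS


def exceeds_select_limit(fds):
    """True if any descriptor number is too large for select.select()."""
    return any(fd >= ASSUMED_FD_SETSIZE for fd in fds)


def wait_readable(socks, timeout_ms=0):
    """Return the sockets that are readable, using poll() instead of select()."""
    poller = select.poll()
    by_fd = {}
    for sock in socks:
        by_fd[sock.fileno()] = sock
        poller.register(sock, select.POLLIN)
    # poll() returns (fd, event_mask) pairs; the timeout is in milliseconds.
    events = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, mask in events if mask & select.POLLIN]
```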

[22-Dec-2011 11:39:31] <whyzgeek> so it can well be that

[22-Dec-2011 11:39:41] <whyzgeek> so what would be solution if that's the case

[22-Dec-2011 11:39:42] <whyzgeek> ?

[22-Dec-2011 11:39:50] <whyzgeek> ic

[22-Dec-2011 11:40:02] <Hackman238> whyzgeek: Maybe a cron job to restart zenhub every few days

[22-Dec-2011 11:40:02] <rocket> split to more hubs with fewer collectors per hub

[22-Dec-2011 11:40:28] <rocket> until you get to 4.1.1 I believe

[22-Dec-2011 11:40:40] <nyeates> mattray: i think chet is using vagrant and jenkins to do some of the Continuous Integration stuff:

[22-Dec-2011 11:40:41] <rocket> I am not 100% sure its the same issue you are seeing

[22-Dec-2011 11:41:08] <whyzgeek> rocket: yep that's sounds good

[22-Dec-2011 11:41:15] <whyzgeek> I will double check this

[22-Dec-2011 11:41:15] <mattray> nyeates: cool

[22-Dec-2011 11:41:20] <john_zenoss> this is very odd. i'll second rocket's advice to keep an eye on open connections -- this problem may manifest due to that, but it would probably affect event throughput, and not just modeling. this bug is unfortunately sketchy enough that I'd have to recommend opening a support ticket, and possibly upgrading.

[22-Dec-2011 11:41:32] <mattray> nyeates: I got pinged the other day about using Chef to deploy Zenoss Enterprise

[22-Dec-2011 11:41:51] <Sam-I-Am> sup guys

[22-Dec-2011 11:42:56] <whyzgeek> john_zenoss: support ticket is already open

[22-Dec-2011 11:43:01] <whyzgeek> this is long standing issue

[22-Dec-2011 11:43:08] <whyzgeek> and we had memory problem

[22-Dec-2011 11:43:19] <whyzgeek> so we added memory

[22-Dec-2011 11:43:31] <whyzgeek> and I increased caches and queue sizes

[22-Dec-2011 11:43:39] <whyzgeek> hoping this would fix it

[22-Dec-2011 11:43:47] <whyzgeek> but has no impact on this

[22-Dec-2011 11:43:59] <whyzgeek> neither on frequency nor its behaviour

[22-Dec-2011 11:44:35] <whyzgeek> after memory upgrade the only thing I noticed is that when it works modeling is far faster

[22-Dec-2011 11:44:56] <nyeates> mattray: neat! Did you get to use the cookbook you made for zenoss?

[22-Dec-2011 11:45:56] <mattray> not yet, I don't think they wanted to handle all the coding and testing they'd have to do themselves (since it's not open source)

[22-Dec-2011 11:46:14] <rocket> whyzgeek: I would still recommend getting to 4.1.1 which should be out any moment.  It has a huge number of hub fixes and performance fixes in place ..

[22-Dec-2011 11:46:34] <rocket> whyzgeek: but I can understand restrictions to upgrading

[22-Dec-2011 11:46:54] <rocket> whyzgeek: do keep in mind that 2.5.2 is really best effort from a support perspective at this point

[22-Dec-2011 11:48:36] <Sam-I-Am> wow, i'm surprised 2.5.2 still lurks

[22-Dec-2011 11:48:51] <Sam-I-Am> development has gone quickly since i moved on to other projects

[22-Dec-2011 11:48:58] <Sam-I-Am> (and i'm trying to figure out how to push zenoss)

[22-Dec-2011 11:49:35] <Sam-I-Am> people i work with seem to toss zenoss into the "open source free stuff" bin which doesnt compete with commercial products (its sad, yet funny)

[22-Dec-2011 11:49:49] <Sam-I-Am> they think they cant make money on it, so they dont sell it

[22-Dec-2011 11:52:29] <nyeates> themactech just said that his company is making a significant amt of their services $ on monitoring

[22-Dec-2011 11:55:00] <Sam-I-Am> mine doesnt sell services yet, just products

[22-Dec-2011 11:58:45] <whyzgeek> rocket: we already started going through upgrade plan, I assume that this cronjob would work

[22-Dec-2011 11:59:08] <whyzgeek> how can we tell if a single hub is not enough anymore

[22-Dec-2011 11:59:44] <whyzgeek> also how can we gauge number of changes applied through zenhub?

[22-Dec-2011 12:00:01] [disconnected at Thu Dec 22 12:00:01 2011]

[22-Dec-2011 12:00:02] [connected at Thu Dec 22 12:00:02 2011]

[22-Dec-2011 12:00:17] [zenoss-logger (logger bot) has joined #zenoss]

[22-Dec-2011 12:00:53] <rocket> well the main indicator is a worklist length that never goes back down to 0
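
Rocket's indicator can be phrased as a small check over periodic worklist-length readings. This is a sketch only: how the readings are collected is left out, and the function name and sample values are hypothetical.

```python
def worklist_never_drains(samples):
    """True if there are readings and none of them ever reached 0."""
    return bool(samples) and min(samples) > 0
```

Readings such as `[1, 2, 0, 1]` would count as healthy, while `[5, 9, 14, 20]` would suggest the hub is not keeping up.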

[22-Dec-2011 12:02:01] <rocket> the number of changes is much trickier to track in 2.5.2 as I dont think there is much instrumentation for that .. I believe you could grep for "Noticing object changed" and do a count on that .. but there isnt great builtin recording of those counts
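
The grep-and-count idea might look like this in Python. It is a rough sketch: the function name is made up, the search string is the one rocket quotes, and which log file to read and at what log level the message appears are not stated in the chat.

```python
def count_matching_lines(lines, needle="Noticing object changed"):
    """Count lines containing `needle`, like `grep -c` with a fixed string."""
    return sum(1 for line in lines if needle in line)
```

It accepts any iterable of lines, so an open log file can be passed directly: `with open(path) as f: count_matching_lines(f)`.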

[22-Dec-2011 12:04:35] <whyzgeek> well we did have a huge issue with worklist length but I stopped the zenmodeler daemon completely and wrote a cronjob to do the modeling manually

[22-Dec-2011 12:04:53] <whyzgeek> the cronjob ensures only a couple of devices get modelled at a time

[22-Dec-2011 12:05:07] <whyzgeek> that fixed a lot of those issues
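
whyzgeek's script is not shown in the chat, so the following is only a guess at the general shape of such a batching job. The device names and batch size are hypothetical, and the `zenmodeler run --device` invocation is an assumption that should be checked against the installed version. The sketch only builds the command lines and does not execute anything.

```python
def batches(devices, size=2):
    """Yield consecutive slices of `devices`, each at most `size` long."""
    for start in range(0, len(devices), size):
        yield devices[start:start + size]


def modeler_commands(batch):
    """Build one modeling command per device (assumed CLI form, not verified)."""
    return [["zenmodeler", "run", "--device", name] for name in batch]
```

A cron-driven wrapper would then run the commands for one batch, wait for them to finish, and move on to the next batch, so only a couple of devices are modeled at a time.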

[22-Dec-2011 12:05:12] <nyeates> Im going to end dev chat now - Thks all - Have a great holiday and see you again on Jan 5

[22-Dec-2011 12:11:44] <whyzgeek> rocket: just checked worklist barely goes above 1,2

[22-Dec-2011 12:12:09] <whyzgeek> and then comes back to 0

[22-Dec-2011 12:17:49] <rocket> even in the dead state you're describing?

[22-Dec-2011 12:18:57] <whyzgeek> no I didn't check it at that stage, I will wait for the next occurrence and check
