6732 Views 9 Replies Latest reply: Jun 25, 2012 11:00 AM by jshardlow
brockp Rank: White Belt 12 posts since
Jan 11, 2011

Nov 10, 2011 9:59 PM

3.2.X Process monitoring bugs

Starting with 3.2.0 and continuing in 3.2.1, there is a bug where processes that are found by the modeler are 'not found' by zenprocess and show up as down even though the processes are running.

 

This is easy to reproduce: the error manifests itself if you have two mysqld's running (like your normal system mysql and Zenoss's mysqld.bin).

 

Use a regex of 'mysqld' for the process and tell it to ignore command line parameters. In my case, on an Ubuntu machine with Zenoss 3.2.1 and the stock Ubuntu mysql, Zenoss will find 3 mysqld's from the modeler:

 

2011-11-10 21:42:33,647 DEBUG zen.ZenModeler: snmpidx: 1070 process: {'procName': 'mysqld', 'parameters': '', '_procPath': '/usr/sbin/mysqld'}
2011-11-10 21:42:33,651 DEBUG zen.ZenModeler: snmpidx: 15432    process: {'procName': 'mysqld_safe', 'parameters': '/usr/local/zenoss/mysql/bin/mysqld_safe --defaults-file=/usr/local/zenoss/mysql/my.cnf --port=3307 --socket=/usr/local/zenoss/my', '_procPath': '/bin/sh'}
2011-11-10 21:42:33,651 DEBUG zen.ZenModeler: snmpidx: 15491    process: {'procName': 'mysqld.bin', 'parameters': '--defaults-file=/usr/local/zenoss/mysql/my.cnf --basedir=/usr/local/zenoss/mysql --datadir=/usr/local/zenoss/mysql/data --user=m', '_procPath': '/usr/local/zenoss/mysql/bin/mysqld.bin'}

 

zenprocess, on the other hand, will only find one of them (both PIDs are logged against the mysqld.bin entry):

2011-11-10 21:47:45,814 DEBUG zen.zenprocess: Found process 1070 on usr_local_zenoss_mysql_bin_mysqld.bin

2011-11-10 21:47:45,817 DEBUG zen.zenprocess: Found process 15491 on usr_local_zenoss_mysql_bin_mysqld.bin

 

2011-11-10 21:47:45,842 DEBUG zen.zenprocess: Queueing event {'monitor': 'localhost', 'component': '/usr/sbin/mysqld', 'agent': 'zenprocess', 'summary': 'Process not running: /usr/sbin/mysqld', 'manager': 'localhost6.localdomain6', 'eventGroup': 'Process', 'eventKey': '/Processes/MySQL/osProcessClasses/mysqld', 'device': 'myth', 'eventClass': '/Status/OSProcess', 'message': "Process not running: /usr/sbin/mysqld\n Using regex 'mysqld' \nAll Processes have stopped since the last model occurred. Last Modification time (2011/11/10 21:42:39)", 'severity': 4}

 

2011-11-10 21:47:45,843 DEBUG zen.zenprocess: Queueing event {'monitor': 'localhost', 'component': '/bin/sh', 'agent': 'zenprocess', 'summary': 'Process not running: /bin/sh', 'manager': 'localhost6.localdomain6', 'eventGroup': 'Process', 'eventKey': '/Processes/MySQL/osProcessClasses/mysqld', 'device': 'myth', 'eventClass': '/Status/OSProcess', 'message': "Process not running: /bin/sh\n Using regex 'mysqld' \nAll Processes have stopped since the last model occurred. Last Modification time (2011/11/10 21:42:39)", 'severity': 4}

 

The annoying thing is they are still running:

ps aux | grep mysqld

mysql     1070  0.1  0.5 848600 11732 ?        Ssl  10:24   0:52 /usr/sbin/mysqld

root     15432  0.0  0.0   4220   620 pts/0    S    20:50   0:00 /bin/sh /usr/local/zenoss/mysql/bin/mysqld_safe --defaults-file=/usr/local/zenoss/mysql/my.cnf --port=3307 --socket=/usr/local/zenoss/mysql/tmp/mysql.sock --old-passwords --datadir=/usr/local/zenoss/mysql/data --log-error=/usr/local/zenoss/mysql/data/mysqld.log --pid-file=/usr/local/zenoss/mysql/data/myth.pid --lower-case-table-names=1 --default-table-type=InnoDB

mysql    15491  0.1  1.2 196468 25552 pts/0    Sl   20:50   0:04 /usr/local/zenoss/mysql/bin/mysqld.bin --defaults-file=/usr/local/zenoss/mysql/my.cnf --basedir=/usr/local/zenoss/mysql --datadir=/usr/local/zenoss/mysql/data --user=mysql --pid-file=/usr/local/zenoss/mysql/data/myth.pid --skip-external-locking --port=3307 --socket=/usr/local/zenoss/mysql/tmp/mysql.sock --old-passwords --lower-case-table-names=1 --default-table-type=InnoDB

zenoss   18318  0.0  0.0   9140  1064 pts/0    S+   21:56   0:00 grep --color=auto mysqld
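
A minimal sketch for cross-checking the two logs, assuming the DEBUG line formats shown above. The log strings are shortened copies of those lines, and the variable names are illustrative only, not part of Zenoss:

```python
import re

# Shortened copies of the zenmodeler lines quoted above.
modeler_log = """\
DEBUG zen.ZenModeler: snmpidx: 1070 process: {'procName': 'mysqld'}
DEBUG zen.ZenModeler: snmpidx: 15432    process: {'procName': 'mysqld_safe'}
DEBUG zen.ZenModeler: snmpidx: 15491    process: {'procName': 'mysqld.bin'}
"""

# Shortened copies of the zenprocess lines quoted above.
zenprocess_log = """\
DEBUG zen.zenprocess: Found process 1070 on usr_local_zenoss_mysql_bin_mysqld.bin
DEBUG zen.zenprocess: Found process 15491 on usr_local_zenoss_mysql_bin_mysqld.bin
"""

# PID -> process name as seen by the modeler.
modeled = {
    int(m.group(1)): m.group(2)
    for m in re.finditer(
        r"snmpidx: (\d+)\s+process: \{'procName': '([^']+)'", modeler_log
    )
}

# PID -> monitored entry that zenprocess attributed the PID to.
found = {
    int(m.group(1)): m.group(2)
    for m in re.finditer(r"Found process (\d+) on (\S+)", zenprocess_log)
}

# PIDs the modeler saw but zenprocess never reported.
missing = sorted(set(modeled) - set(found))
# Distinct entries that zenprocess attached PIDs to.
owners = sorted(set(found.values()))

print("modeled but not found:", missing)
print("entries owning PIDs:", owners)
```

With the lines from this post, PID 15432 is never reported by zenprocess, and both reported PIDs end up under the single mysqld.bin entry.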

 

This used to work just fine in zenoss 3.1.x

 

If you massage the regex to exclude the one it finds, for example changing 'mysqld' to 'mysqld$', the system will start showing /usr/sbin/mysqld as up and won't find mysqld.bin (as expected). So Zenoss does see the process; it just isn't displaying it correctly.
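
To illustrate the difference between the two patterns, here is a small sketch. It assumes the regex is applied with search semantics (a match anywhere in the string) against the process name reported by the modeler; how Zenoss applies it internally is not confirmed here, and the helper name `matching` is hypothetical:

```python
import re

# Process names as reported by zenmodeler in the log above.
proc_names = ["mysqld", "mysqld_safe", "mysqld.bin"]


def matching(pattern, names):
    """Return the names in which the pattern matches anywhere (re.search)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# The unanchored pattern matches every name containing 'mysqld'.
loose = matching("mysqld", proc_names)
# Anchoring with '$' keeps only names that end in 'mysqld'.
anchored = matching("mysqld$", proc_names)

print(loose)
print(anchored)
```

Under that assumption, 'mysqld' selects all three names, while 'mysqld$' selects only the system mysqld, which fits the behaviour described above.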

 

If you want my zenmodeler or zenprocess log files, let me know.

  • Luca Maranzano Rank: White Belt 26 posts since
    Feb 4, 2010
    1. Nov 22, 2011 9:06 AM (in response to brockp)
    Re: 3.2.X Process monitoring bugs

    Hi!

    The same problem is occurring on our Zenoss 3.2.1, which was just upgraded from 3.1.0.

     

    Besides, from Infrastructure -> Processes -> Process Instances all entries are marked in RED as DOWN!

     

    I'll try to delete some processes and recreate them from scratch to see if the error persists.

     

    More later,

    Luca

  • wizard113 Newbie 5 posts since
    Nov 25, 2008
    2. Jan 3, 2012 8:33 PM (in response to brockp)
    Re: 3.2.X Process monitoring bugs

    Have you tried restarting zenprocess?  I ran into the same thing as I tried to monitor for the puppetd process, and restarting zenprocess did the trick.

  • nozen Rank: White Belt 38 posts since
    Dec 13, 2010
    3. Jan 4, 2012 3:30 AM (in response to wizard113)
    Re: 3.2.X Process monitoring bugs

    I'm running 3.2.1 and I've seen the problem you're having. I'm only just starting to look at process monitoring.

     

    But I found a post that was most useful, and it solved my Linux process monitoring.

     

    I can't find the thread, but it was on this forum and I used it today. Basically, go through the steps in this order:

     

    1. Delete process

    2. model device

    3. add process and include any changes e.g. error level, zmonitor etc

    4. model device

    5. should be ok now

     

    It's not quite what you're talking about, but I'm curious to see if it works.

     

    also check out this ticket

     

    http://dev.zenoss.org/trac/ticket/7870

     

    cheers,

    nozen.

  • Luca Maranzano Rank: White Belt 26 posts since
    Feb 4, 2010
    4. Jan 5, 2012 5:22 PM (in response to nozen)
    Re: 3.2.X Process monitoring bugs

    Restarting zenprocess didn't help, and even after recreating the process the problem is still present.

     

    In ticket 7870 on Trac, someone from Zenoss posted a message saying it has been fixed, but it seems impossible to get any update about this issue (patch, new minor release, who knows!), despite several requests from different users.

    This is quite annoying IMVHO.

     

    Still waiting.

    Cheers,

    Luca

  • jshardlow Rank: Green Belt 98 posts since
    Jun 12, 2008
    5. Jan 30, 2012 6:24 AM (in response to Luca Maranzano)
    Re: 3.2.X Process monitoring bugs

    Have just upgraded to 3.2.1 myself and this is driving me crazy. In my case we have a script running a daemon (both found in ps/snmpwalk). Zenoss will pick up both, but one will be marked up and the other down. Annoying as same setup was working fine in 2.5.2.

     

    Even messing about with regex to try and ignore the script process doesn't seem to help either.

  • Luca Maranzano Rank: White Belt 26 posts since
    Feb 4, 2010
    6. Jan 30, 2012 4:50 PM (in response to jshardlow)
    Re: 3.2.X Process monitoring bugs

    We have similar issues, but only for certain similar processes. Still waiting for the fix; see ticket 7870 on Trac.

     

    Really annoying!

     

    Cheers

    Luca

  • omeganon Rank: White Belt 69 posts since
    Jun 23, 2011
    7. Jan 31, 2012 10:48 AM (in response to brockp)
    Re: 3.2.X Process monitoring bugs

    This has been a longstanding bug that I've seen since 2.5.2. Process monitoring is very unreliable, with several long-running false positives daily. They'll show as down for hours or days, then return to up status with no change on the device being monitored. I too am eagerly awaiting the fix for #7870 and have been watching it for months...

  • thedada Rank: White Belt 14 posts since
    May 16, 2012
    8. Jun 25, 2012 10:46 AM (in response to omeganon)
    Re: 3.2.X Process monitoring bugs

    Same problem here,

     

    Did you get a patch since 31/01?

  • jshardlow Rank: Green Belt 98 posts since
    Jun 12, 2008
    9. Jun 25, 2012 11:00 AM (in response to thedada)
    Re: 3.2.X Process monitoring bugs

    As far as I know, and according to http://dev.zenoss.org/trac/ticket/7870, the Zenoss devs have no intention of fixing this problem in 3.x. Their advice is to fix it yourself, or to wait for 4.x, in which it has allegedly been fixed.

     

    Today I've loaded up the 4.x beta (currently 4.1.70) and the problem doesn't seem to exist any more. I've got some more testing to do, but at this rate I'm going to skip putting 3.x on my prod servers as it's not fit for purpose and will go to 4.x.
