No Node Left Behind


Zenoss Blog: No Node Left Behind


If you have ever wondered what kind of crazy thinking goes into the decision to start an open source business (because we're all at least a little crazy, make no mistake), Tarus Balog gives an entertaining and informative look at how he started OpenNMS as an open source venture over on Opensource.com.

 

 

What struck me as interesting is Balog's ready admission that it's not always about idealism:

"You might think that I was motivated by some sort of idealistic love of open source software. Nothing could be further from the truth. At the time, I was still running a Windows desktop. I undertook the OpenNMS project because I believed one thing: in the area of network management, open source represents the best business solution."

The start of a new series on this relatively new open source site sponsored by Red Hat, this should be a great look inside the world of open source business.


Okay, you gotta give props where props are due. Congratulations to Nagios for taking first place in its category in the 2009 LinuxQuestions.org Members Choice Awards. Nagios was voted Monitoring Application of the Year by LinuxQuestions.org members, with 51.11 percent of the vote.

 

"This is another exciting achievement for everyone involved in Nagios and we're grateful to everyone who helps make it such a great IT monitoring solution," remarked Nagios Founder Ethan Galstad in a press release earlier this month.


Zenoss scored some high marks of its own, as part of a roundup of open source monitoring tools on TechTarget's SearchNetworking site.

"'When I first needed a network monitoring tool, I had no funding for a commercial alternative,' said David Nalley, a Unix administrator for document management solution provider KeyMark, who uses the Zenoss open source management tool."

"Zenoss, which recorded a 150% increase in revenue during 2009, counts 300 enterprise customers and more than 1 million downloads of its Zenoss Core open source project code."

Be sure to check out the review for a good look at the current state of the open source network management ecosystem.


Ruby devs, take note: there's now a Ruby library for accessing Zenoss through its REST interface available out on RubyGems.org.

 

It should be noted that this new library is clearly a work in progress, with new functionality being added all of the time. Still, it's a good way to get some Ruby-based hooks into the Zenoss toolset.

Tags: zenoss, nagios, api, droplets, ruby, opennms

The Zenoss in the Clouds ZenPack Contest has ended and the winners have been selected. The breadth and depth of the entries were outstanding; here is the list of finalists. After consulting with our judges, these are the winners:

 

Grand Prize Winner

  • libvirt Virtualization - David Nicklay - The libvirt Virtualization ZenPack was selected as the Grand Prize winner for greatly expanding the number of virtualization technologies monitored by Zenoss. The ZenPack has been tested with KVM, QEMU and VMware and should work with Xen, OpenVZ, Virtual Box, Open Nebula and other virtualization technologies. Because it uses the common interface that libvirt provides, all Zenoss users now have a baseline for monitoring a wide variety of virtualization technologies. Development and testing are ongoing and the ZenPack will continue to improve.

 

Runners-Up

  • AMQP Event Monitor - David Nalley - AMQP is an open standard application layer protocol for message oriented middleware, frequently used in enterprise business and cloud environments.
  • Ganglia - Jeff Schroeder - Monitors the open source Ganglia cluster/HPC monitoring system.
  • Google AppEngine - Colin Hudler - Comprehensive ZenPack for collecting metrics on Google App Engine applications.
  • Puppet - David Nicklay - Monitors the status of the Puppet master daemon, tracking and reporting on the status of clients and inserting new devices into Zenoss for monitoring.

 

For a complete list of all the Community ZenPacks that are available, please visit: http://www.zenoss.com/community/projects/zenpacks/

For Community ZenPack development and further information, please visit the Community ZenPack Repository

 

Thanks again to everyone who contributed their ZenPacks to the Zenoss Community!

Tags: community, google, zenpacks, puppet, cloud, virtualization, libvirt, appengine, amqp, ganglia

Last week, in the course of discussing the rise of DevOps as a development strategy, I mentioned the advent of new tools that would enable more automation to occur in the deployment process, using all of the tools available to both developers and system admins.

 

In that discussion, it was highlighted how DevOps encourages developers and operations staff to come together earlier in the development process so that apps will be more efficient to deploy and run within the production environment. This has become especially critical given the blurring of the functionality between platform and application found within cloud computing and virtual servers.

 

Right now, there are a lot (and then some) of tools available along the application provisioning trail that gets a developed application to its ultimate home. There are bootstrapping apps that directly deploy on virtual and cloud machines (Xen, OpenVZ), or right on the native OS (Kickstart, Cobbler). Then there are the system configuration tools, such as Chef or Puppet.

 

All of these tools, and the others in these classes, tend to do a good job handling their respective tasks. But as cloud and web deployments become more fast-paced and hyper-dependent on the strength of the platform on which the app sits, it's critical to have optimum operations performance so these apps will run effectively.

 

With a lot of finagling, it is certainly possible to smooth out the bottlenecks between these systems. Admins can write scripts to adapt to provisioning requirements on the fly. Developers can adjust their code to make such adaptations easier. This is one big benefit of DevOps.

 

But code can change, and data center environments more so, as hardware failures and load balancing can disrupt the most carefully planned deployment. Instead of adjusting manually, application provisioners can use a new class of applications designed to automate the entire process.

 

That's the goal of the open source ControlTier project. Founded by Alex Honor, a consultant for DTO Solutions, ControlTier is a "cross-platform build and deployment automation framework." In other words, a way for apps to be deployed efficiently based on previously set requirements and existing conditions.

 

Honor says that "ControlTier is a response to the orchestration problem." In the past, deployments consisted of a series of small steps, which in turn were combined into larger steps, which were presented as a master to-do list. Each step, small or large, was a series of commands entered in a precise sequence, which might or might not sync with existing business practices.

 

Even after deployment, there can be ongoing needs for application management: rolling updates, coordinated shutdowns and restarts of tiers within the data center, and then status checks of how things went.

 

ControlTier steps in and lets deployment teams build configurations that will automate application deployment based on everyone's needs, not just the developers' or just the admins'.

 

By setting up the deployment process in advance, not only does the application get tested, but the release process itself can be tested, too, Honor explained in a recent interview.

 

Honor's boss at DTO, Damon Edwards, agrees. "ControlTier removes the friction in the release process, using specification-driven automation." This is no surprise, since the business model of DTO (formerly ControlTier Software, Inc.) is geared towards automated infrastructure and process improvement.

 

With ControlTier, commands for all of the tasks needed to deploy can be entered and grouped based on specific priorities and timings. In a sense, it resembles a configuration management tool such as Chef or Puppet, except it's for app deployment, not systems management. Actually, Edwards explained, ControlTier can tap into these tools and more to direct them to gather information and perform operations necessary for an app deployment.

 

Eventually, Edwards added, ControlTier will be able to coordinate actions using monitoring tools like Zenoss and Nagios to get things done even more efficiently.

 

This type of automation is a great tool to use in any kind of complex data center environment, since it allows command and control functionality to be executed, then remixed as needed to deploy new apps or the same app on different systems. The coordination benefit between development and operations makes it all that much better.

Tags: devops, controltier, datacenter_barometer


We've finally processed all the entries in the Zenoss in the Clouds ZenPack Contest. The entries were very diverse and each provides a very useful solution. Here is a quick rundown of all the ZenPacks entered into the contest; we will announce the winners later this week:

 

 

 

For a complete list of all the Community ZenPacks that are available, please visit: http://www.zenoss.com/community/projects/zenpacks/

For Community ZenPack development and further information, please visit the Community ZenPack Repository

 

Thanks again to everyone who contributed their ZenPacks to the Zenoss Community!

Tags: community, google, contest, zenpacks, vmware, puppet, cloud, virtualization, vm, libvirt, appengine, redis, amqp, esxi, ganglia

Have you recently downloaded Zenoss Core, or do you have questions about implementing the solution in your environment? If so, please register to attend our bi-weekly Getting Started with Zenoss Core Webinar. The March 9 session is still open for sign-up, and if you can’t make this session, the next one, on March 23, is on the schedule. You can register here:

 

Tuesday, March 9 1:00 p.m. EST

Tuesday, March 23 9:00 a.m. EST

 

Here’s what you’ll get out of the session:

  • An introduction to the Zenoss Community
  • Installing the software properly
  • Preparing your environment
  • Logging in to get started
  • Adding, classifying and auto-discovering your devices
  • Getting and staying organized
  • Seeing the “big picture” (dashboard, network map, event console, Google Maps, etc.)
  • Avoiding common mistakes

 

We also have a Zenoss engineer available to answer questions live – and there are usually lots of questions submitted! If you’re interested in seeing past Q&A logs, take a look at some of the previous Getting Started with Zenoss Q&A sessions where we document and upload all of the questions submitted along with answers.

Tags: zenoss, community, core, zenoss-core, getting-started, intro

You have to be careful when writing headlines. The wrong title for an article can bring a slew of readers into your site expecting one thing and getting another, and--worse--spoiling for a fight.

 

That was my initial reaction when I saw the headline "Linux Management and Monitoring Lacking" over on LinuxPlanet a while back. Excuse me?

 

Turns out the headline missed a critical word or two on the end, such as "Convergence." I know that because the Charlie Schluting piece was a re-post of the original article over on Enterprise Networking Planet, entitled "Time to Converge Monitoring and Management in Linux and Unix." Much less nerve-jangly.

 

In the ENP piece, Schluting argues that there's a disconnect between IT monitoring tools, such as Zenoss or Nagios, and configuration management apps like Puppet and Chef. He acknowledges that there's some "loose coupling" between these tools now, but there needs to be more.

 

I would suggest Schluting take a gander at ControlTier, a "cross-platform build and deployment automation framework" which will eventually enable users to automate the functionality between these services and more.

 

 


 

Open Sourcing Data Center Innovation: Another innovative direction for the data center can be found in the launch of the Open Data Center Initiative. The news actually came out in a Statement of Support on the first of the month, stealthily covered by the industry's Green Data Center Blog. Fortunately, Michael Manos, Sr. VP of Digital Realty Trust, decoded the news in his LooseBolts blog later last week.

 

In a nutshell, the new project will apply open source collaborative methods to data-center design, both in software and hardware. I, for one, will be very interested to see what comes out of this project.

 

 


 

O'Reilly Gets Its Online Irish On: Those of you interested in web operations as a broader concept, take note: O'Reilly's free Velocity Online Conference is kicking off in just over a week. The online event will take place from noon-2:15 p.m. EST (1700-1915 GMT) on March 17. Registration is free, and you won't even have to wear green.

 

 


 

Zenoss Core Moves Forward: As you may have read elsewhere, Zenoss announced the release of Zenoss Core 2.5.2, which will include "monitoring capabilities for the Xen Hypervisor via the Zenoss Xen monitoring plug-in, or Xen Virtual Hosts ZenPack." If you have any interest in virtual management, check out the new GPL release today.

 

Another new contribution to Zenoss was announced last week by Allen Sanabria, who's put together a script to automatically add multiple datapoints to Zenoss all at once, instead of one at a time. Sanabria claims the script for the Zenoss API is not finished yet, but Zenoss users may find it useful now.

Tags: puppet, chef, zenoss_core, xen, droplets, data_center, controltier, velocity_conference

You might think that Henry Ford, inventor of the Model T automobile, was also the inventor of the conveyor belt, given its importance within the manufacturing process for his cars. In fact, though Ford is credited with first implementing the conveyor belt/assembly line process in 1913, the invention of the actual modern belt system goes to Swedish company Sandvik, which came up with a steel conveyor belt in 1901.

 

The impact of the conveyor belt and the subsequent assembly-line manufacturing process that evolved from its use is felt in almost everything produced today. The methodology extends beyond the assembly line. Product design is typically done in a serialized, straight-line fashion: subcomponents A1 through A11 are designed before building component A, and so on down the line. Product delivery also taps into the assembly-line ethos: goods are directly shipped in the same modular containers from factory to ship to train to truck to distribution center.

 

This methodology is often used for the software you're using now, too. Applications are designed in discrete phases, then coded, then tested, then packaged, then launched. Hopefully without flaws.

 

Launching software in a complex data center environment is a bit more complicated than burning a package onto a CD and shipping copies out to be loaded onto each machine. Real-time business practices must be adhered to, and data center environments are often shifted by the operations staff to meet the needs of those business practices, as well as the physical demands of the machines themselves.

 

So, developing in such an environment is much akin to pointing a gun at a target a mile away with only a notion of where the target will be by the time the bullet gets there. To compensate, development teams will either take more time and launch major point releases at a slower rate, creating more stable software that is perpetually behind the curve, or, more recently, use a leaner iterative approach that overlaps the phases of development with a launch-early, launch-often approach, in the hopes of keeping up with business and environment requirements through a series of small iterations to the code.

 

Enter the philosophy of agile development: a natural outgrowth of iterative development where traditional business requirements are actually de-emphasized (because often end-users don't know all the requirements) in favor of designing products with only some known requirements. End-users get involved in the design and coding process as much as possible so eventually only their true requirements are built into the software, as opposed to features they may not need.

 

Allowing the users to circle back to the beginning of the software design process instead of keeping them as passive recipients of the end product is a big part of what agile development is all about. While agile practices are present in proprietary software, anyone who's participated in an open source project will recognize many of the techniques.

 

The whole agile notion of getting users and developers together is gaining traction within IT shops, and a growing application of the movement can be found in DevOps, where agile practices are applied to both the development and operations sides of the team.

 

DevOps, also referred to as agile systems administration, is a big part of how Kris Buytaert, a Senior Linux and Open Source Consultant with the Belgian firm Inuits, likes to create apps together for business. Buytaert describes himself as a developer who "then became an Op" and as such, began to see the challenges facing both sides of the application deployment process.

 

Operations staffers are usually invited to the application party too late to have any real impact on the very applications they are expected to deploy and use. Developers are often oblivious to the load and memory usage demands of the environments to which they are sending their finished apps, which database systems are best to use, and so on.

 

"People think that operations work starts on deployment," Buytaert explained in a recent interview. But--especially with web app development--operations needs to be involved with the platform and the application at a much earlier stage, he added.

 

By getting operations and development staff together on application creation sooner, non-functional requirements, like security, high-availability, and monitoring, can be discussed and properly incorporated into the application at the design phase. As development proceeds, the DevOps method should allow for better version control, bug tracking, and deployment methods because developers will be more in tune with their target environment (testing or production).

 

While this all makes sense from an objective viewpoint, there are hurdles to getting DevOps practices going.

 

"The hardest issue is the human factor," Buytaert said, as operations and development teams have long held on to their own turfs not just from a sense of territorialism but also because their own performance is often only measured with metrics related to their own job responsibilities. If an operations staffer has certain metrics to meet in the server room, they may be reluctant to take time away now to work on application development that will affect them later.

 

Slowly but surely, though, both developers and admins are beginning to see that a little investment in time and expertise earlier in the application process could have big positive benefits later.

 

The ascent of DevOps is being assisted by web application development, where systems and applications are more closely aligned than ever. Developers have found themselves dealing with more op issues, and admins are doing a lot of scripting on the fly to automate as much of their work as possible. With the merging of their responsibilities happening anyway, DevOps as a formal practice has become all the more attractive. The benefits of development/operations interaction for web deployment are most clearly illustrated in a presentation at last June's Velocity conference, where Flickr's John Allspaw and Paul Hammond highlighted how the photo sharing website can manage 10 or more deployments per day.

 

Buytaert is more than just a vocal advocate of DevOps, though he does that well. He is also involved with the organization of Devopsdays, a conference that sprang from regional meetups happening in London and Belgium a couple of years ago. Other than these local events, and a set of meetings at FOSDEM, there was no centralized DevOps event until the first Devopsday conference in Ghent, Belgium in October 2009.

 

Now the conferences are growing. May 1-2 will see the next event, Devops Down Under, in Pyrmont, Australia, just outside Sydney. The following month, the US will play host to its first DevOps event, the DevOps Day USA conference, to take place on July 25 in Mountain View, CA. Both events are positioning themselves as continuations of the conversations started at the Ghent conference last year.

 

As the conversation continues, both sides are finding new opportunities to not only contribute ideas, but also automate their processes to further enhance the development-to-deployment process. These tools are starting to deliver full integration between source control, testing, and monitoring. With this new class of apps, the DevOps practice may become a measurable, quantified part of application development even sooner.

Tags: development, day, agile, datacenter, barometer, devops, usa, devopsday

Promoted from the QA Test Blog:

 

The final Zenoss 2.5.2 release is now available for download and installation. Zenoss 2.5.2 is our largest maintenance release since QA has been tracking, with over 145 internal and external fixes combined. It also includes the new Xen Virtual Hosts Core ZenPack, which allows monitoring of Xen servers.

 

While development with the new trunk UI continues, we felt that one last QA Test Day to cover the 2.5.2 maintenance release was called for. On Thursday, March 4th, from 10am until 5pm EST, the Zenoss QA team will be available to answer questions and test any issues that may arise with your upgrades from 2.5.1 and 2.4.x to 2.5.2.

 

The code can be found on the normal download locations.  For a list of the tickets fixed, and to view some important release details, please reference the  Zenoss 2.5.2 Release Notes.

 

For those of you who wish to join, we will be running this session in IRC and in the zenoss-testing forum.

Server: irc.freenode.net (port 6667)

Channel:  #zenoss-testing

 

We'll record  a transcript of the day's conversations and links will be available from the Testing and IRC pages.

 

Tags: zenoss, community, core, irc, king-crab, qa, test, upgrade, 2.5.2, upgrades

Now Available: Zenoss 2.5.2

Posted by Matt Ray Mar 2, 2010

We are pleased  to announce the Zenoss Core 2.5.2 maintenance  release, now available for  download from:

http://community.zenoss.org/community/download

 

Version 2.5.2 of Zenoss Core offers:

  • Improved  reliability and performance, with a focus on the new event console  introduced in the prior version.

  • A new Xen Virtual Hosts ZenPack for monitoring Xen para-virtualized domains  and their guests.  This ZenPack was previously available in Zenoss Enterprise.

  • More than 50 new ZenPacks contributed by the community since the release of 2.4

Prior  2.5.x versions of Zenoss Core offer these new features and  improvements:

  • A newly redesigned Event Console  offers inline event filtering and improved usability. A new "Event  Details" pane helps streamline troubleshooting tasks.

  • A new Community Site Window Portlet that provides  easy access to Zenoss information resources.  Zenoss wishes to thank Community member Ian Smith for providing this functionality, now incorporated in Zenoss Core.

  • The Amazon Web Services™  ZenPack, which allows you to monitor the performance and availability  of Amazon Elastic Compute Cloud™ (Amazon  EC2™) Web services.

 

The 2.5.2  Zenoss Core release notes are available from the Documentation page in  PDF and HTML formats:

http://community.zenoss.org/community/documentation/official_documentation/release_notes

 

Installation  and upgrades from earlier versions are covered in Zenoss Core  Installation, also available in PDF and HTML formats from the Documentation page: http://community.zenoss.org/community/documentation/official_documentation/installation-guide

 

Zenoss  thanks everyone who contributed to the testing effort for this release!

Tags: zenoss, community, zenoss-core, release, king-crab, maintenance, stable, 2.5, 2.5.2

Zenoss developers will be available for questions on Thursday, March 4 at 11am EST in the #zenoss IRC channel on irc.freenode.net (port 6667). Please drop in and bring your questions, answers,  suggestions and feedback.  Zenoss Developer Eric Miller and other developers will be available to answer your questions on  Zenoss, the 2.5.2 release and anything else you want to discuss.

 

There will also be a QA Day going on concurrently in the #zenoss-testing IRC channel.  The subject is 2.5.2 upgrades.

 

We’ll log the session and repost it here if you can’t make it.

 

Don’t forget you can search for answers to common questions by visiting the Zenoss Forums.


The Zenoss Core project is driven by community participation.  We appreciate everyone who contributes to Zenoss by answering forum questions, contributing documentation and ZenPacks and recommending the software to others.  Zenoss Masters are at the top of the class when it comes to contributing to Zenoss and we are proud to recognize two new "Zenoss Masters", Chris Hubbard and Tom McNicholas.  We would like to thank them for their many contributions to the Zenoss Community and welcome them as our newest Zenoss Masters.

 


City of Houston Trims Monitoring Costs with Zenoss

Find out how Zenoss and Community Partner, Pate Consulting, gave the City of Houston greater visibility and control over their network devices while reducing monitoring costs by 500%.

 


Tip of the Month: Performance Tuning for Zenoss Storage

This month's tip comes from Zenoss Community member Chris Krough. Chris posted this in the Wiki with the hope that other users will update it with their findings.

 


March Madness Special for Zenoss Community Users

During the month of March 2010, any Zenoss community user who converts to an enterprise subscription of 250 monitored devices or more will receive two seats of the new Advanced Training Course for free ($4,000 value). Anyone who upgrades to a 500 device subscription will receive a QuickStart Deployment package for free ($9,000 value). If you are interested, please send an email to sales@zenoss.com referencing the March Madness Promo.

 


Zenoss Cloud and Virtualization Survey

Please take part in the Zenoss Virtualization and Cloud Computing Survey and share your experiences with other Zenoss Community members. We will randomly draw one winner from all survey participants for a new Google Nexus One Phone (unlocked).

 


Zenoss Community Day - Austin, Texas

We just finished up a full day of training in Los Angeles and our next stop is Austin, TX, home of the Zenoss Engineering Center. Zenoss Community Day is a full day of instructor-led training. For those who can't attend, we recorded a full day of training from this fall's session in Baltimore so you can have the benefit of the training wherever you are.

 


Zenoss is Hiring

Zenoss is looking for a Senior Software Developer who will focus on designing and implementing new features for Zenoss Enterprise and Core. Zenoss is also recruiting Client Support Engineers to service our growing customer base. Check out our careers page and join a winning team today!

 


Webcasts on Demand

Learn how to overcome virtualization management challenges and minimize data center disruptions in this webinar:

Conquering Your Top 5 Virtualization Management Headaches

 


 

Thank you for your interest in and support of Zenoss.

Mark Hinkle, VP of Community

 


Tags: zenpacks, survey, virtualization, zenmasters


We're fast approaching the end of February, and that means the Zenoss in the Clouds ZenPack Contest submission deadline is almost here. There are several really great Cloud ZenPacks already published and more are in the publishing queue. All submissions must be in by March 1, so get those in now! You can always make improvements after they've been submitted! We'll announce the entries shortly after the deadline, then announce the winners after they've all been published and the judges have had a chance to check them out. So send them in!

Tags: community, zenpack, monitoring, contest, zenpacks, cloud, cloudcomputing

This month's tip comes from Zenoss Community member Chris Krough. Chris posted this in the Wiki with the hope that other users will update it with their findings. The wiki page is Performance Tuning for Zenoss Storage; here is the current version.

=================================================================================

 

Introduction

=================================================================================

 

This post discusses disk subsystem and process tuning options for running high-volume Zenoss installations. The information is based on 64-bit Red Hat Enterprise Linux, but should apply to most Linux distributions supported by Zenoss.

 

  • General Zenoss Performance Bottlenecks
  • Filesystem tuning for configurations using standard spindle drives.
  • Filesystem tuning for configurations using solid state storage.

 


General Zenoss Performance Bottlenecks
=================================================================================

 

One of the frequently asked questions in the #zenoss IRC channel and the Zenoss forums is 'How big should my zenoss server be?'. Unfortunately no formula exists for calculating server size based on the number of monitored devices. There are several factors that affect the load on your monitoring infrastructure. The type of resource limitation your system will experience (block, CPU, memory) depends on what you are monitoring, how long you keep the data, how frequently you collect the data, responsiveness of the monitored equipment, and the performance of the networks the monitoring traffic will traverse.

 

Many administrators who are new to Zenoss will understandably try to scale their monitoring infrastructure based on the number of devices they intend to monitor. While this is a good start, Zenoss monitoring daemons are focused on RRD datasources, not devices. When determining hardware requirements for scaling Zenoss, you should consider the number of monitored datasources rather than the number of monitored devices. A server providing a small number of services may only require a dozen monitored datasources, whereas a large server or router could require thousands or even tens of thousands of monitored datasources. Additionally, the amount of data recorded for each of those datasources affects the size of the resulting RRD file, which has a significant impact on block IO. Long consolidation periods, additional data (Holt-Winters calculations, etc.), and the number of RRAs in an RRD file all affect the total size of the RRD file and consequently drive changes in the amount of disk IO.

 

For most installations the disk subsystem is the first and most significant bottleneck. Zenoss records a single datasource per RRD file, so the total number of RRD files in your system will equal the total number of datasources being monitored. Each of these RRD files needs to be opened, searched, modified, and written during each polling cycle. If you have an installation monitoring 50,000 datasources and the typical polling cycle of 300 seconds, that is roughly 170 RRD file updates per second. Note that 170 is the number of files touched per second and not the number of transactions, which is likely much larger. The amount of data to be written and the number of disk transactions created during a polling cycle can easily exceed the write and IO speed of a single drive. The solution to overcoming single drive performance limitations is to use RAID storage arrays. Many users find that a high density RAID10 array provides the best balance of cost to performance for medium to large installations. Some very large installations will require dedicated high speed SAN storage or even solid state storage. Below I will suggest some storage configurations for both standard spindle disks and solid state storage on large Zenoss installations.
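
The update-rate arithmetic above can be reproduced with a short Python sketch. The function name is hypothetical and used only for illustration; the 50,000 datasources and 300 second cycle are the figures quoted in the paragraph, and the result counts files touched per second, not disk transactions.

```python
def rrd_updates_per_second(datasources: int, cycle_seconds: int) -> float:
    """Average number of RRD files touched per second.

    Assumes one RRD file per datasource and that every file is
    updated once per polling cycle, as described in the text.
    """
    if cycle_seconds <= 0:
        raise ValueError("cycle_seconds must be positive")
    return datasources / cycle_seconds


# Figures from the paragraph: 50,000 datasources, 300 second polling cycle.
rate = rrd_updates_per_second(50_000, 300)
print(f"{rate:.1f} RRD file updates per second")
```

Plugging in your own datasource count and polling interval gives a first estimate of the file-update load to weigh against what a single drive or array can sustain.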

 


Filesystem tuning for configurations using standard spindle drives.
=================================================================================
* RAID level

 

Due to the high throughput required by RRD updates, RAID10 is typically the most appropriate RAID level for Zenoss performance data. RAID10 provides excellent performance and redundancy at the cost of storage space. RAID5 is not an appropriate RAID level for the $ZENHOME/perf partition because the parity calculations introduce too much overhead into the writing process. It is common for administrators to build out a Zenoss server using a RAID1 or RAID5 array for the OS and related software (mount point '/') and a RAID10 array on dedicated drives for the performance data (mount point '$ZENHOME/perf'). Dedicating a RAID10 array to $ZENHOME/perf helps the operating system and executables access data without being held up by RRD file updates. RAID10 performance increases as you increase the number of drives in the array.


* Filesystem Tuning

 

The second consideration for standard spindle drive storage is the choice of filesystem. There are a number of high performance filesystems available in most Linux distributions. I am going to focus on EXT3 and EXT4 as they are the most common ones in use and come standard with RHEL and CentOS. ReiserFS and XFS may also be excellent choices for the $ZENHOME/perf partition, but I have not tested against them. When using RAID levels 0, 4, 5, and 6, it is beneficial to align filesystem blocks with RAID stripes. The mkfs command accepts arguments specifying the block size, stride, and stripe width of the filesystem. The following is an example calculation of filesystem options based on common settings for a six drive RAID10 array (mirror of stripes). Note that since we are dealing with a 'mirror of striped drives' we only need to be concerned with the RAID0 portion of our RAID10. The RAID1 portion of the array is not affected by filesystem alignment. To determine the correct filesystem arguments you can use the quick calculator here: http://busybox.net/~aldot/mkfs_stride.html or you can use the formula below.

 

Type of RAID:             0
# of data disks:          3 (3 on one side of the mirror, 3 on the other side)
Filesystem block size:    4K
RAID chunk size:          64K

 

stride = (chunk size / fs_block_size)
stripe = stride * #ofDataDisks

 

stride = (64K / 4K) = 16 filesystem blocks
stripe = 16 * 3 = 48 filesystem blocks

 

The appropriate filesystem creation options for the 6 disk RAID10 above are:
  mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/xxx
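If you would rather script the calculation than use the web calculator, the formula above fits in a few lines of Python. This is only a sketch: the function names are invented for illustration, and the defaults are the 64K chunk, 4K block, three data disk values from the example. It prints a command line and does not run mkfs.

```python
def ext_raid_alignment(chunk_kib, block_bytes, data_disks):
    """Return (stride, stripe_width), both counted in filesystem blocks."""
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_bytes // block_bytes      # blocks per RAID chunk
    stripe_width = stride * data_disks       # blocks per full stripe
    return stride, stripe_width


def mkfs_command(device, chunk_kib=64, block_bytes=4096, data_disks=3):
    """Format the mkfs.ext4 command line; nothing is executed."""
    stride, stripe_width = ext_raid_alignment(
        chunk_kib, block_bytes, data_disks)
    return "mkfs.ext4 -b %d -E stride=%d,stripe-width=%d %s" % (
        block_bytes, stride, stripe_width, device)


if __name__ == "__main__":
    # /dev/xxx is a placeholder, exactly as in the text above.
    print(mkfs_command("/dev/xxx"))
```

For other layouts only the arguments change. A hypothetical eight drive RAID10 with four data disks and the same chunk and block sizes gives a stride of 16 and a stripe width of 64.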

 

* Kernel Elevator Tuning

 

The Linux kernel IO subsystem processes disk reads and writes according to scheduling algorithms known as elevators. There is an excellent description of elevators and tuning at http://www.redhat.com/docs/wp/performancetuning/iotuning/index.html. The default scheduler for Red Hat Enterprise Linux 5 is the CFQ (Completely Fair Queuing) elevator. The available schedulers are noop, anticipatory, deadline, and cfq. The default 'cfq' scheduler and the 'deadline' scheduler are the best choices for most Zenoss installations. As each scheduler has different advantages and disadvantages, it's best to try each in your environment. The scheduler can be specified by adding the "elevator=" option to the kernel line in grub.conf as follows:

 

kernel /vmlinuz-2.6.31.12-174.2.3.fc12.x86_64 ro root=/dev/mapper/vg_nergal-lv_root  LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us elevator=deadline
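When comparing elevators it helps to confirm which one a disk is actually using. On 2.6 kernels the file /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets. Below is a minimal sketch that reads it; the device name 'sda' and the function names are placeholders chosen for illustration, not part of Zenoss.

```python
def parse_scheduler_line(line):
    """Split a sysfs scheduler line into (active, available).

    The kernel marks the active scheduler with square brackets,
    for example: noop anticipatory deadline [cfq]
    """
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_scheduler(device="sda"):
    """Read the scheduler line for one block device ('sda' is assumed)."""
    path = "/sys/block/%s/queue/scheduler" % device
    with open(path) as handle:
        return parse_scheduler_line(handle.read())
```

Calling read_scheduler('sdb') on the collector would return the active elevator for that disk along with the list of alternatives the running kernel offers.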

 

* EXT Mount Options

 

Journaling

 

By default, EXT3 and EXT4 filesystems mount with the journal in 'ordered' mode. In 'ordered' mode, data blocks are forced to disk before the associated metadata is committed to the journal. You can increase disk throughput by mounting the filesystem with the journal in 'writeback' mode. In 'writeback' mode this ordering is not enforced: only metadata is journaled, and data writes are flushed by the normal schedulers, so they may reach the disk after the metadata that describes them has been committed. You should see a performance improvement, but it comes with a slight increase in data integrity risk, because files written shortly before a crash can contain stale or missing data after recovery. Mounting with the journal in 'writeback' mode should only be done on systems with a RAID controller that has an internal battery backup.


Timestamping

 

By default, the EXT filesystem mounts with the 'atime' option. 'atime' updates the inode access time each time a file is accessed. This update is typically unnecessary and creates a lot of extra writes. You can disable this with the 'noatime' mount option. The option 'nodiratime' is implied when using the 'noatime' option.

 

The final /etc/fstab line for the options above for a RAID10 partition at /dev/sdb1 is:

 

/dev/sdb1                /opt/zenoss/perf        ext4    noatime,data=writeback         0 0
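On a server with several mount points it is easy to lose track of which ones carry these options. The sketch below parses a single fstab line and reports which of the two options discussed above are missing. The function names are illustrative only, and the check compares text; it does not inspect the mounted filesystem.

```python
def fstab_options(line):
    """Return the mount options of one /etc/fstab line as a dict."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fstab fields")
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value or True     # flags like 'noatime' map to True
    return options


def missing_perf_options(line,
                         wanted=(("noatime", True), ("data", "writeback"))):
    """List the wanted options that the fstab line does not set."""
    options = fstab_options(line)
    return [key for key, value in wanted if options.get(key) != value]


if __name__ == "__main__":
    line = "/dev/sdb1 /opt/zenoss/perf ext4 noatime,data=writeback 0 0"
    print(missing_perf_options(line) or "all tuning options present")
```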

 


Filesystem tuning for configurations using solid state storage.
=================================================================================

 

Very large configurations may find it more cost effective to use solid state storage drives or cards. Solid state drives offer a huge improvement in read/write speeds and total transactions per second.

 

Moving to SSD storage significantly increases the capability of the disk subsystem, allowing you to monitor more datasources per collector, but it also moves your bottleneck to other components in the system. After moving to solid state drives I found the zenperfsnmp daemon itself to be a major bottleneck. Zenperfsnmp is restricted to a single thread on a single core. SSD storage may be capable of updating RRD files much faster than a single core running zenperfsnmp can produce data, leaving most of the CPU cores and the drives idle. Previously, with spindle drives in RAID10, zenperfsnmp running on a single core was able to produce RRD data faster than the disk subsystem could write it, which made block IO the bottleneck. One solution for this single core problem is to run multiple collectors on a single server. From the Zenoss GUI it appears that you have several collectors, when in fact you just have multiple copies of zenperfsnmp (or any other daemon) running in parallel on a single server. Unfortunately, these individual instances of the monitoring daemons will not share a common list of monitored devices. You will need to manually distribute the devices across the collectors, which can be done fairly easily using a Python script in zendmd, or tediously in the GUI.

 

Example of creating multiple collectors on a single collection server:

  1. Create a new hub 'h0' on the master Zen server under Settings > Collectors
  2. Set the number of workers to 3 in h0_zenhub.py
  3. In the GUI, create 3 new collectors 'h0c0','h0c1','h0c2' on the server with solid state storage, with all three collectors reporting back to 'h0'
  4. Edit each of the collectors to use a different port for the "Render URL", then edit the 'httpport' entry in each h0cn_zenrender.conf to match that port
  5. Assign these three new collectors as performance monitors for devices.

 

Now you should have 3 collectors (h0c0, h0c1, and h0c2) running on one of your distributed collectors (hopefully with SSDs). All three collectors will report performance data back to the three hub 'h0' workers on the master server. Each collector will spawn its own performance collection daemons, resulting in higher CPU utilization for the server they run on.
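Spreading devices across the new collectors can be scripted. The sketch below shows only the distribution logic, as a plain round-robin over device ids, so it runs anywhere. The device names are invented, and the zendmd calls in the trailing comment are an untested assumption about the zendmd API that should be checked against your Zenoss version before use.

```python
from itertools import cycle


def assign_round_robin(device_ids, collectors):
    """Map each device id to a collector name, cycling through collectors."""
    if not collectors:
        raise ValueError("at least one collector is required")
    return dict(zip(sorted(device_ids), cycle(collectors)))


if __name__ == "__main__":
    # Hypothetical device ids; the collector names come from the steps above.
    plan = assign_round_robin(
        ["router1", "web1", "web2", "db1"], ["h0c0", "h0c1", "h0c2"])
    for device_id in sorted(plan):
        print("%s -> %s" % (device_id, plan[device_id]))

# Inside zendmd the plan might be applied roughly like this (assumed
# API, not verified here):
#
#   devices = dict((d.id, d) for d in dmd.Devices.getSubDevices())
#   plan = assign_round_robin(devices, ["h0c0", "h0c1", "h0c2"])
#   for dev_id, name in plan.items():
#       devices[dev_id].setPerformanceMonitor(name)
#   commit()
```

Round-robin balances the number of devices, not the number of datasources. Since datasources are what drive load, a collector that ends up with the large routers may still need devices moved off it by hand.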

 

Special Considerations for Solid State Storage

 

Solid state drives rely on flash memory cells for storage. Flash memory cells are reliable for a limited number of writes, after which they cannot be counted on to maintain data. Solid state drive controllers work around this limitation by 'write balancing' (usually called wear leveling), distributing writes across cells to increase the average amount of time each cell is usable. There are two types of cells in use, MLC (Multi-Level Cell) and SLC (Single-Level Cell). It's important to understand the performance and reliability differences between the two types of cells. Based on current numbers, MLC cells are 'reliable' for around 10,000 writes and SLC cells for around 100,000. On top of understanding this limitation in the number of reliable writes, you should also understand the concept of write amplification. Write amplification is an effect where the smallest unit the drive can erase and rewrite is larger than the data being changed, so a small write from the operating system can turn into a much larger write to the flash, significantly reducing the life expectancy of the drive. For an excellent technical introduction to SSD design and function, read and reread Anand's SSD article here: http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=1. Be sure to have a solid understanding of these concepts before attempting to put solid state storage into production for Zenoss. RRD file storage is particularly abusive to solid state storage due to the high number of small random reads and writes needed when updating RRD files. Using the expected number of blocks written per day for your environment, and the write amplification factors for the drives you intend to use, calculate the true life expectancy of the SSDs before making a major purchase. Be wary of vendor predictions for solid state drive life expectancy.
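A first-pass life expectancy estimate can be written down directly: total writable volume (capacity times rated write cycles) divided by the volume actually hitting the flash each day (host writes times write amplification). The sketch below assumes ideal wear leveling, so it is an optimistic upper bound. The function name and every number in the example are hypothetical, not measurements from a Zenoss system.

```python
def ssd_lifetime_days(capacity_gb, rated_cycles, host_gb_per_day,
                      write_amplification):
    """Rough number of days until the rated write cycles are used up.

    Assumes ideal wear leveling, so every cell absorbs an equal share
    of the writes. Real drives will wear out sooner than this.
    """
    if host_gb_per_day <= 0 or write_amplification <= 0:
        raise ValueError("write volume and amplification must be positive")
    total_writable_gb = capacity_gb * rated_cycles
    flash_gb_per_day = host_gb_per_day * write_amplification
    return total_writable_gb / float(flash_gb_per_day)


if __name__ == "__main__":
    # Hypothetical 64 GB drive, 200 GB/day of RRD writes, amplification 10.
    for label, cycles in (("MLC", 10000), ("SLC", 100000)):
        days = ssd_lifetime_days(64, cycles, 200, 10)
        print("%s: about %.0f days" % (label, days))
```

The write amplification factor dominates the result, which is why small random RRD updates are so much harder on a drive than the same volume of sequential writes.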

 

Solid state storage is a fairly new technology for the server market. Consider running the SSD storage in RAID1 pairs to increase reliability. Some of the tuning options for standard spindle disks do not apply to solid state storage. Solid state arrays should be mounted with the 'noatime' fstab option to help reduce the number of writes to the cells. Some high performance solid state storage drivers may bypass the kernel schedulers entirely, so consult the documentation for your storage solution to find out whether kernel tuning is discouraged or recommended. It is not necessary to align filesystem stripes for a RAID1 array. Any other RAID level above 0, whether hardware or software controlled, may reduce the performance of solid state drives.

 

=================================================================================

Thanks again Chris!

Tags: community, performance, tip-of-the-month, filesystems, rrd, storage, tuning

Zenoss is extremely proud to recognize two new "Zenoss Masters", Chris Hubbard and Tom McNicholas. We would like to thank them for their many long-term contributions to the Zenoss Community.

 

Chris has been active in the Zenoss community for nearly 3 years, contributing in the forums as 'guyverix' and with his ZenPacks:

 

 

Tom has also been active in the Zenoss community for nearly 3 years, joining about a week before Chris. He uses the handle 'twm1010' and frequents the forums and IRC.

 

We are proud to recognize the many contributions and generosity of these outstanding members of the Zenoss Community. On behalf of Zenoss, we would like to thank Tom, Chris, and everyone else who has contributed for their continued dedication to improving Zenoss.

Tags: zenoss, community, zenoss-masters, masters, zenmasters

Have you recently downloaded Zenoss Core, or do you have questions about implementing the solution in your environment? If so, please register to attend our bi-weekly Getting Started with Zenoss Core Webinar. The February 23 session is still open for sign-up, and if you can’t make this session, the next ones will be on the schedule soon. You can register here:

 

Tuesday, February 23, 9:00 a.m. EST

 

Here’s what you’ll get out of the session:

  • An introduction to the Zenoss Community
  • Installing the software properly
  • Preparing your environment
  • Logging in to get started
  • Adding, classifying and auto-discovering your devices
  • Getting and staying organized
  • Seeing the “big picture” (dashboard, network map, event console, Google Maps, etc.)
  • Avoiding common mistakes

 

We also have a Zenoss engineer available to answer questions live, and there are usually lots of questions submitted! If you’re interested in seeing past Q&A logs, take a look at some of the previous Getting Started with Zenoss Q&A sessions, where we document and upload all of the questions submitted along with answers.

Tags: zenoss, community, core, zenoss-core, getting-started, intro