No Node Left Behind

Zenoss Blog: No Node Left Behind


Just in time for this week's SC09 High Performance Computing (HPC) conference comes the announcement from the Los Alamos National Laboratory (LANL) HPC Division that they are using Zenoss to monitor their large-scale HPC clusters. LANL currently operates the second-fastest supercomputer in the world, Roadrunner, and monitors it with a modified version of Zenoss. LANL is working to share its customizations. There is a High Performance Computing Development area dedicated to this work, as well as an HPC group for further collaboration on using Zenoss in your own HPC environment.


LANL HPC Deployment of Zenoss

The Los Alamos National Laboratory High Performance Computing Division is currently deploying a modified version of Zenoss to monitor large-scale high-performance clusters. We have created several ZenPacks that extend Zenoss in the areas of issue tracking, asset tracking, and scalability. Attached is a file that gives a high-level description of the enhancements LANL made to Zenoss for our deployment.


The basics are in place, but there are many opportunities for contribution. Here is a partial list of things we think would be of great use to the HPC community but that we won't get to any time soon:

• Direct feed of resource manager job allocation data

• Increased automation of event-to-issue roll-up

• Performance data from the nodes

• End to end I/O subsystem view

• After-the-fact automated event/issue correlation

• Continuing filter/mapping refinement

• Better high-level reporting facilities

• Alternative visualization of data across the event, performance, and environmental data categories

• Appropriate and relevant monitoring data and rates for HPC center networks

We are currently working within our organization to obtain approval to share the enhancements we have made. Our next step will be to integrate our changes with newer versions of Zenoss. We look forward to working with others to add more functionality specific to high performance computing.
