We are considering supplementing our engineering lab with cloud services such as Amazon’s EC2 or Rackspace’s Cloud Servers. Currently, we use a large set of VMware servers to host our QA testing environment, where we install Zenoss servers and set up target systems (Windows, Linux, etc.) to monitor. In addition, these VMware servers are connected to the same private network as various routers, storage devices, and machines running other operating systems, such as HP-UX and Solaris. All of this is done to simulate what Zenoss users have in their IT environments, and for the most part the setup works quite well, except that there’s never enough capacity. The QA engineers are constantly finding new combinations of operating systems, applications, etc. that require more VM guests to be created. Here is a quick brain dump of the comparisons thus far.
| | External cloud | On-premise virtual infrastructure |
|---|---|---|
| Upfront costs | | - $15K-100K for hardware and software. This includes a 1-2 TB SAN, servers, and ESX licenses. |
| Ongoing costs | - Low IT staff involvement.<br>- Incremental costs for running servers.<br>See: Rackspace Cloud pricing, Amazon EC2 pricing | - Moderate IT staff involvement. The QA team handles most maintenance besides ESX software upgrades, etc.<br>- No incremental costs for continually running monitoring servers or targets. (Granted, this isn’t entirely true, since data center fees, electricity, etc. are still required, but CPU usage is not a fee-based factor.) |
| Guest platform flexibility | - Linux flavors are abundant, but Windows OS support is very limited.<br>See: Rackspace Cloud platforms, Amazon EC2 platforms | - Practically all Linux and Windows platforms are fully supported.<br>See: VMware ESX platforms |
| Access to monitoring targets | - Difficult to bridge the gap to internal devices.<br>- Would require creating many target VMs in the cloud.<br>- Having to start and stop target VMs would not be a practical approach. | - All monitoring targets have easy access to shared network resources. This includes many devices that cannot be set up in an external cloud, such as enterprise Unix flavors, networking equipment, specific storage devices, etc. |
| Access to install artifacts | - Artifacts would need to be downloaded over the Internet to guest servers. This is both a time and a cost consideration. | - The artifacts server is hosted on the same high-speed Ethernet network as the guest servers.<br>- Build servers can access fixed targets for unit tests. |
| Other overhead | - Adding an additional environment will require extra cycles for reproducing defects, implementing automation in two locations, etc.<br>- Having additional setups requires more IT resources to deal with user credentials, multiple support organizations, etc. | - All developers and the QA team already know the management interface and how to provision VMs, etc. |
| Community | - Providing community access to external cloud resources could be easier, since they are separated from other corporate resources. | - Access to our internal lab equipment is kept very secure because it could expose corporate resources. |
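
To make the cost rows in the comparison above a little more concrete, here is a minimal break-even sketch in Python. It is not part of our actual analysis: the function names, the 730 hours-per-month figure, and especially the hourly rate and VM count in the example are assumptions made purely for illustration. Only the $15K lower bound on upfront cost comes from the comparison itself.

```python
def monthly_cloud_cost(vm_count, hourly_rate_usd, hours_per_month=730):
    """Cost of keeping vm_count cloud instances running for one month.

    hours_per_month defaults to 730, i.e. instances that are never stopped
    (an assumption that matches always-on monitoring servers and targets).
    """
    return vm_count * hourly_rate_usd * hours_per_month


def break_even_months(upfront_usd, vm_count, hourly_rate_usd,
                      onprem_monthly_usd=0.0, hours_per_month=730):
    """Months until cumulative cloud spend catches up with on-premise spend.

    upfront_usd        -- one-time on-premise hardware and software cost
    onprem_monthly_usd -- recurring on-premise cost (power, data center fees)
    Returns None when the cloud is not more expensive per month than the
    on-premise upkeep, in which case it never catches up.
    """
    cloud = monthly_cloud_cost(vm_count, hourly_rate_usd, hours_per_month)
    extra_per_month = cloud - onprem_monthly_usd
    if extra_per_month <= 0:
        return None
    return upfront_usd / extra_per_month


if __name__ == "__main__":
    # Hypothetical inputs: 20 always-on guests at an assumed $0.10 per hour,
    # compared against the low end ($15K) of the on-premise upfront range.
    months = break_even_months(15_000, vm_count=20, hourly_rate_usd=0.10)
    print(f"Break-even after about {months:.1f} months")
```

With those made-up inputs the cloud bill catches up with a $15K purchase in a bit over ten months. Real provider pricing, bandwidth charges for downloading artifacts, and staff time would all shift that number, so treat it as a way to frame the question rather than an answer.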
Obviously, VM sprawl is something that many IT organizations are dealing with, so our situation is far from unique. While some of these comparisons may be generic to lots of development organizations, the systems management aspect brings many additional requirements due to the variety of targets to be monitored. Even though this analysis is in its infancy, we hope that publishing the thought process will generate ideas from others. Look for further posts as we get closer to making a decision.