Dealing With Virtual Machine Stall

How do you feel when you can’t take full advantage of an opportunity?

Well, it is only human nature to feel disappointed.

But, when it comes to business, it goes beyond mere personal disappointment when an opportunity is missed. Spending may be higher than it needs to be, profitability may be lower, and a competitive position may be compromised.

And, from the IT operations perspective, that is not a place anybody wants to be.

Our extensive and continuing discussions with customers make one thing clear: IT operations teams are missing a critical opportunity to increase the percentage of virtualized systems and to deploy advanced server virtualization features.

But, it doesn’t have to be this way.

VM Stall

Many IT operations teams in large, complex IT environments have felt the need to put the brakes on virtualizing a larger share of their systems, on deploying advanced virtualization features such as high availability, or both. The situation has become so prevalent that it now has a name: virtual machine (VM) stall, or “VM stall” (not to be confused with “VM sprawl,” the uncontrolled proliferation of VMs).

Consistently, the top reason IT operations leaders cite for VM stall is their teams’ lack of sufficient management visibility and insight. These leaders realize that they are missing out on opportunities to improve IT operational efficiency and enhance the overall business.

But they also realize that the alternative, pressing ahead without adequate visibility, poses an even greater risk to the business than the missed opportunities. You can’t safely increase the percentage of systems virtualized or deploy advanced features without a sufficient management view across the data center.

Key factors cited by customers and industry analysts as creating these virtual data center (VDC) management blind spots include:

  • Difficulty keeping track of the shifting infrastructure
  • Added management complexity resulting from virtual abstractions
  • Difficulty linking physical and virtual infrastructure to business services
  • Assignments of incidents and problems to the wrong team
  • Exclusion of operations from decisions about deploying the advanced virtualization capabilities that enable new efficiencies
  • Proliferation of inadequate toolsets that cannot adjust to the rapid acceleration of virtualization

A recent report by Jim Frey, research director at Enterprise Management Associates (EMA), titled “2012 Network Management Megatrends,” highlights the overall IT operations need for greater visibility and insight into the VDC. According to the report, almost three-quarters of respondents (72%) said they could use more insight and visibility into their operations.

The EMA report goes on to describe how IT operations teams want and need to:

  • Reduce the flood of IT system-related events, and have line-of-sight visibility into compute, network, storage, application, and virtualization problems
  • Perform rapid triage and impact analysis across multiple silos of shifting information in both virtual and physical environments
  • Leverage the full value, power, and flexibility of a virtualized environment


So how big is the problem?

Opportunity Cost

Based on our discussions with customers about their environments, as well as industry data, businesses lose an average of $336,000 or more per hour of outage, or $5,600 per minute. With an average reported incident length of 90 minutes, that works out to an average cost per incident of approximately $505,500.

Furthermore, many businesses underestimate the costs associated with downtime, especially in virtualized environments.

For example, let’s say a $2 billion enterprise (a representative composite of the many customers we engage) is 50 percent virtualized with 1,000 VMs deployed.  

It is not uncommon for an enterprise like this to investigate as few as ten symptoms per day, at a cost of $365,000 annually ($100 per symptom x 10 symptoms per day x 365 days per year).

Similarly, seven business-critical outages a year is not unusual, and they can add up to 4.7 hours of downtime annually at a cost of about $2,400,000 (4.7 hours x $505,500 per hour = $2,375,850).
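To make the arithmetic behind these two figures easy to check or adapt, here is a minimal Python sketch. It simply reproduces the calculations above; the constant names and helper functions are hypothetical, and the inputs are the illustrative values quoted in this post, not measured data.

```python
# Illustrative annual-cost baseline for the hypothetical $2 billion enterprise.
# Every input is an example value quoted in this post, not measured data.

COST_PER_SYMPTOM = 100     # dollars to investigate one symptom
SYMPTOMS_PER_DAY = 10
DAYS_PER_YEAR = 365

DOWNTIME_HOURS = 4.7       # yearly downtime from seven business-critical outages
DOWNTIME_RATE = 505_500    # dollar figure applied per hour of downtime in the text


def symptom_cost(cost_each: int, per_day: int, days: int) -> int:
    """Yearly cost of investigating symptoms at a steady daily volume."""
    return cost_each * per_day * days


def downtime_cost(hours: float, rate: int) -> float:
    """Yearly downtime cost: hours of downtime multiplied by the cost rate."""
    return hours * rate


symptoms = symptom_cost(COST_PER_SYMPTOM, SYMPTOMS_PER_DAY, DAYS_PER_YEAR)
downtime = downtime_cost(DOWNTIME_HOURS, DOWNTIME_RATE)
print(f"Symptom investigation: ${symptoms:,}")      # $365,000
print(f"Downtime:              ${downtime:,.0f}")   # $2,375,850
```

Swapping in your own symptom volume, investigation cost, and downtime figures gives a rough baseline for your environment.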

The Potential

Fortunately, for almost all of the customers who have put on the brakes and consciously chosen VM stall, there is light at the end of the tunnel. EMC IT Operations Intelligence 9.0 provides the visibility and insight these customers need to take their foot off the brake and put it back on the accelerator.

Announced in February 2012, this release is proving compelling to everyone who has seen it. Customers are significantly accelerating their deployment and upgrade plans because the new features provide the VDC visibility and insight they so sorely need.

EMC IT Operations Intelligence 9.0 spans operations management from the physical IT infrastructure to the VM. It enables automatic identification of the business impact and root cause of an IT infrastructure problem, whether physical, virtual, or both.

By extending automated discovery, root-cause analysis, and business-impact analysis to the VM level, IT Operations Intelligence unifies physical and virtual network, server, and storage infrastructure management. These capabilities deliver the much-needed visibility, insight, and understanding that allow IT organizations to solve problems and restore impacted IT services faster by putting the right person on the right problem at the right time.
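As a rough illustration of what topology-based impact and root-cause analysis means, the Python sketch below models a tiny dependency map running from physical infrastructure through VMs to business services. This is not how EMC IT Operations Intelligence works internally; the component names, the `DEPENDENTS` map, and the simple heuristic (an alarm is a probable root cause if no other alarming component upstream explains it) are all hypothetical assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical topology (illustrative only): each component maps to the
# components that depend on it, from network to host to VM to service.
DEPENDENTS: dict[str, list[str]] = {
    "switch-a": ["esx-host-01", "esx-host-02"],
    "esx-host-01": ["vm-web-01", "vm-db-01"],
    "esx-host-02": ["vm-web-02"],
    "vm-web-01": ["web-app"],
    "vm-web-02": ["web-app"],
    "vm-db-01": ["order-db"],
    "web-app": ["online-ordering"],
    "order-db": ["online-ordering", "billing"],
}
BUSINESS_SERVICES = {"online-ordering", "billing"}


def downstream(component: str) -> set[str]:
    """Return every component that depends, directly or indirectly, on `component`."""
    seen: set[str] = set()
    queue = deque([component])
    while queue:
        node = queue.popleft()
        for dependent in DEPENDENTS.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def impacted_services(component: str) -> set[str]:
    """Business-impact analysis: which business services a failure reaches."""
    return downstream(component) & BUSINESS_SERVICES


def probable_root_causes(alarming: set[str]) -> set[str]:
    """Keep only alarms not explained by another alarming component upstream."""
    explained: set[str] = set()
    for component in alarming:
        explained |= downstream(component)
    return alarming - explained


alarms = {"esx-host-01", "vm-web-01", "vm-db-01", "web-app"}
print(sorted(probable_root_causes(alarms)))       # ['esx-host-01']
print(sorted(impacted_services("esx-host-01")))   # ['billing', 'online-ordering']
```

In this toy example, four alarms collapse to a single probable cause on the physical host, and the same traversal shows that both business services are affected, which is the kind of result that lets the server team, rather than the application team, pick up the problem.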

By using EMC IT Operations Intelligence 9.0, the $2 billion enterprise described above can save $292,000 per year in symptom-investigation costs ($365,000 per year x 80%, reflecting 80% faster problem identification) by increasing IT efficiency.

Another $1.4 million or so per year ($2,375,850 annual downtime cost x 60% = $1,425,510) can be saved by shrinking downtime for applications and services by 60%.
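The two savings estimates can be combined in the same way. This Python sketch, again using hypothetical names, applies the post's illustrative percentages to the baseline annual costs; it assumes, as the figures above do, that each percentage improvement translates directly into the same percentage cost reduction.

```python
# Illustrative savings estimate for the hypothetical $2 billion enterprise.
# Baselines and percentages are the example figures quoted in this post.

BASELINE_SYMPTOM_COST = 365_000     # dollars per year investigating symptoms
BASELINE_DOWNTIME_COST = 2_375_850  # dollars per year lost to downtime

SYMPTOM_SAVINGS_PCT = 80   # from "80% faster problem identification"
DOWNTIME_SAVINGS_PCT = 60  # from "shrinking downtime ... by 60%"


def savings(baseline: int, percent: int) -> int:
    """Dollars saved when a yearly cost shrinks by `percent` percent.

    Integer arithmetic avoids floating-point rounding noise; any fractional
    dollar would be truncated.
    """
    return baseline * percent // 100


efficiency = savings(BASELINE_SYMPTOM_COST, SYMPTOM_SAVINGS_PCT)
uptime = savings(BASELINE_DOWNTIME_COST, DOWNTIME_SAVINGS_PCT)
print(f"Efficiency savings: ${efficiency:,}")           # $292,000
print(f"Downtime savings:   ${uptime:,}")               # $1,425,510
print(f"Combined:           ${efficiency + uptime:,}")  # $1,717,510
```

Keeping the percentages as separate parameters makes it easy to test more conservative assumptions than the ones quoted here.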

Is this starting to sound transformational?

I think so, but more importantly, others do too.

If you want to find out more, watch Enterprise Management Associates Research Director Jim Frey analyze the new capabilities in EMC IT Operations Intelligence.

You can also listen to Bruce George, a solutions architect with the EMC IT Management & Automation Group, talk about managing availability and performance for large-scale VDI deployments using EMC IT Operations Intelligence and VMware vCenter Operations.

Your Story

But more importantly, let us know what you think about the role management visibility plays in helping or hindering your ability to better leverage virtualization in your data center.

Do the observations cited here match your realities? If not, where and why are they different?

Let’s discuss.

About the Author: Mark Prahl