Reworking data protection for a virtualized environment
When IT professionals at large and midsize organizations are asked to list their organizations’ IT priorities, “improving data backup and recovery” consistently ranks at or near the top.1 But backup and recovery are not alone at the top of people’s minds in terms of strategic importance. The urgency that organizations feel about data protection is amplified by another phenomenon: the steady increase in server virtualization.
Backup and virtualization are linked both tactically and strategically. As virtualization becomes a mainstay of the data center, traditional approaches to backing up an IT environment pose a data protection challenge that escalates in proportion to the pace at which the environment becomes virtualized.
Accordingly, if tasked with managing what has turned into a highly virtualized IT environment, IT professionals must take the time to reassess their organizations’ data protection strategies.
Old backup processes, new virtual machines
Imagine looking at a performance meter for a traditional physical server, which shows lines indicating routine, periodic spikes in processor and storage read/write activity. Now, imagine looking at that meter as the server is being backed up. It would definitely display heightened I/O activity tied to both processor and storage, either rapidly spiking or just pegged to the top of the meter. This relatively heightened activity occurs because a traditional backup application basically orders the physical server to “give me all the data you have, as fast as possible.” Such a command is achievable for physical servers because they typically are underutilized and usually have plenty of excess processing headroom to accommodate resource-intensive backup operations.
The situation is dramatically different when one physical server hosts many virtual machines. Each virtual machine consumes roughly the resources it would have consumed on its own physical server, and the other virtual machines on the host consume the rest. In a well-managed virtualization host, a majority of the resources are in use, as they should be, so the extra headroom that legacy backup applications count on is simply not available.
Ultimately, traditional approaches do not work in a highly virtualized environment. And because inefficient backup is not acceptable, IT professionals who manage highly virtualized environments need to rethink their data protection strategies. The advantages that make server virtualization appealing — the device consolidation and footprint reduction, the near-instant server setup, the power and cooling savings, and the simplified disaster recovery testing, to name a few — also create challenges in protecting what is really important: the information.
Key considerations in reshaping a data protection strategy
The Enterprise Strategy Group (ESG) routinely surveys IT professionals about how easy or difficult it is for them to implement backup and recovery processes for virtualized servers. In one survey, 87 percent of responding IT managers reported that virtual server backup/recovery is among their top 10 challenges, with 9 percent calling it their “most significant data protection challenge.”2 Among specific concerns, ESG found that basic recoverability of data was most commonly mentioned, followed by the ability to validate the success of backup and recovery operations (see figure).
Unreliable and complicated backups, combined with a lack of confidence in protection, recoverability and monitoring, continue to plague backup administration in virtualized environments. As a result, high levels of investment in data protection consistently appear alongside high levels of investment in server virtualization.
To bring their organizations’ data protection strategies in line with the requirements of a virtualized environment, IT professionals should consider four important capabilities when assessing and implementing backup solutions.
1. Embrace source-side deduplication
Virtual machines that use the same or similar operating systems and that host similar applications generate many redundant binaries. Source-side deduplication — through VMware® changed-block tracking, file-system filtering, Microsoft® NT File System journaling and other means — is especially valuable to help eliminate those redundancies.
The key is to move the deduplication process as close to the virtual machines as possible. By contrast, a process that relies only on storage-centric deduplication sends data from all the virtual machines from the host to the backup server, consuming compute, storage and networking resources along the way. The data lands in a deduplication storage device, which then discards much of it because it duplicates data already received during backups of similar applications and other virtual machines.
In contrast, when the deduplication decision is made as close as possible to the production workloads, less redundant data crosses the network only to be rejected, and the load on the overall backup infrastructure shrinks. That is source-side deduplication, and it is a huge win.
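The idea can be illustrated with a minimal sketch (hypothetical code, not any vendor’s implementation): the source chunks its data, fingerprints each chunk, and transmits only chunks whose fingerprints the backup target has not already seen.

```python
import hashlib

def backup_source_side(data: bytes, known_fingerprints: set, chunk_size: int = 4096):
    """Send only chunks the backup target has not seen before (source-side dedup)."""
    sent = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known_fingerprints:
            known_fingerprints.add(digest)
            sent.append(chunk)  # only unique chunks cross the network
        # duplicate chunks are skipped entirely at the source
    return sent

# Two VMs sharing largely identical OS binaries: the second backup sends almost nothing.
fingerprints = set()
vm1 = b"OS" * 2048 + b"app-a"
vm2 = b"OS" * 2048 + b"app-b"
first = backup_source_side(vm1, fingerprints)
second = backup_source_side(vm2, fingerprints)
```

In practice the redundancy is detected through mechanisms such as changed-block tracking or file-system journaling rather than whole-image hashing, but the principle is the same: the duplicate data never leaves the host.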
2. Make sure deduplication is global across hypervisors
Consider an environment with 20 hypervisors from one vendor, each running 20 virtual machines. Many deduplication methods could help reduce those 20 virtual machines per host down to a single set of application binaries. On the other hand, in a situation with 20 hosts on different hypervisors, deduplication may not be possible across those hypervisors. As a result, far too much data may be sent to deduplicated storage, which just discards it — after creating a huge I/O penalty for the IT environment along the way.
In 2012, ESG conducted a study on storage infrastructure spending.3 IT professionals who were buying a large amount of disk were asked how they planned to use all of it. The most frequently mentioned answer was that the disk supported a data protection solution. It may be reasonable to conclude, then, that taking advantage of global source-side deduplication significantly helps reduce storage spending.
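A rough sketch shows why the scope of the fingerprint index matters (the code is illustrative only): if each hypervisor silo keeps its own deduplication index, common guest data is stored once per silo; a single global index stores it exactly once.

```python
import hashlib

def chunks(data: bytes, size: int = 4096):
    return [data[i:i + size] for i in range(0, len(data), size)]

def bytes_stored(machines, shared_index: set) -> int:
    """Bytes actually stored when these machines dedupe against one shared index."""
    stored = 0
    for data in machines:
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in shared_index:
                shared_index.add(digest)
                stored += len(chunk)
    return stored

# The same guest image running under two different hypervisors.
hypervisor_a_vms = [b"guest-os" * 1024] * 3
hypervisor_b_vms = [b"guest-os" * 1024] * 3

# Per-hypervisor indexes: common data is stored once per silo.
per_silo = bytes_stored(hypervisor_a_vms, set()) + bytes_stored(hypervisor_b_vms, set())

# One global index: common data is stored exactly once.
global_total = bytes_stored(hypervisor_a_vms + hypervisor_b_vms, set())
```

With identical guest images, the siloed approach stores twice what the global index does; real savings vary with how much the workloads actually overlap.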
3. Look for robust post-process application handling
Some vendors sell backup applications that are agent-based; others provide agentless technologies that, despite the name, insert a small executable file inside the virtual machine to support certain situations. Regardless of the terminology, the important distinction in backing up virtualized servers is whether the widgets, agents or modules behave like traditional physical backup agents (bad) or simply help with application quiescing in support of virtual machine–centric backup behavior (good). Not many use-case scenarios exist that warrant putting agents inside virtual machines for backing up data the traditional way. So for most scenarios, the important function of agents is to support application management to help ensure a recoverable backup.
With some backup products, the hypervisors’ application programming interfaces enable the backup software to freeze the storage for an adequate backup of the application itself, but a mechanism is still needed to notify applications that they can truncate their backup-transaction logs, reset their checkpoints and go back to doing work. In the end, it does not matter whether the backup vendor calls that activity agent-based or agentless. The important outcome is virtualized applications that are properly groomed for continued, consistent operation.
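The overall flow can be sketched as follows. This is a hypothetical illustration; the class and method names are invented for the example and do not correspond to any hypervisor or backup API.

```python
class Application:
    """Stand-in for an application running inside a virtual machine."""

    def __init__(self):
        self.log = ["txn-1", "txn-2", "txn-3"]  # pending transaction log
        self.frozen = False

    def quiesce(self):
        # Flush in-flight writes so the on-disk state is application-consistent.
        self.frozen = True

    def thaw(self):
        self.frozen = False

    def truncate_logs(self):
        # Safe only after a confirmed, consistent backup exists.
        self.log.clear()

def backup_vm(app: Application) -> dict:
    app.quiesce()                     # 1. briefly freeze the application
    snapshot = {"consistent": True}   # 2. hypervisor snapshots the storage
    app.thaw()                        # 3. application resumes immediately
    # ... backup software copies the snapshot in the background ...
    app.truncate_logs()               # 4. post-process: logs truncated, checkpoints reset
    return snapshot

app = Application()
snap = backup_vm(app)
```

Steps 1–3 are what the hypervisor APIs provide; step 4 is the post-process application handling that distinguishes a merely successful snapshot from a properly groomed one.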
4. Emphasize integrated monitoring and management
Many organizations spend time and money establishing a highly virtualized, easily managed private cloud infrastructure that enables them to provision virtual machines on the fly, only to find themselves stepping completely out of that world to configure backup for those virtual machines. The goal is as much integrated visibility as possible: integrated management at best, and at minimum integrated monitoring, so that administrators can observe how the environment’s many separate but often interrelated data protection processes are functioning. The most efficient management and monitoring solutions integrate either at the hypervisor layer or within the private cloud management interface, minimizing the number of management consoles needed to determine whether the provisioned virtual machines are being protected adequately.
The path to virtualization protection
In general, ESG has been seeing an uptick in organizations preferring to use a unified solution to protect both physical and virtual servers, rather than running a separate solution just for protecting virtual machines. Although backup vendors on both sides of the unified-versus-separate argument are still actively innovating, the real battleground of virtualization protection is not centered on the unified-versus-separate issue or whether one can back up a virtual machine. It is centered on how agile IT can be in recovering the data, the whole virtual machine or a set of virtual machines. For example, can IT restore a whole virtual machine without needing to put it back on the original host? Or accomplish item-level, file-level and even message-level recovery from within a virtual machine?
Four key data protection capabilities — source-side deduplication, global deduplication across hypervisors, robust post-process application handling, and integrated monitoring and management — offer an indication of where virtualization protection is today: in the midst of continuing advances that IT administrators soon won’t want to live without. And they even provide a glimpse into how it is going to keep evolving, with multihypervisor strategies becoming pervasive and with the unified-versus-separate physical and virtual server protection debate continuing to grow.
Jason Buffington is senior analyst at the Enterprise Strategy Group, focusing primarily on data protection, Microsoft® Windows Server® infrastructure, management and virtualization. Follow Jason on Twitter @JBuff.
Enterprise Strategy Group:
Insights on keeping data safe
Visit the Technical Optimist blog to get fresh ideas on what IT professionals should be looking for to protect their virtualized environments.
1 “Research Report: 2013 IT Spending Intentions Survey,” by Jennifer Gahm, Bill Lundell and John McKnight, Enterprise Strategy Group, January 2013, qrs.ly/ki3gwq5; “Research Report: 2012 IT Spending Intentions Survey,” by Jennifer Gahm, Kristine Kao, Bill Lundell and John McKnight, Enterprise Strategy Group, January 2012, qrs.ly/nf3gwq6; “Research Report: 2011 IT Spending Intentions Survey,” by Jennifer Gahm, Bill Lundell and John McKnight, Enterprise Strategy Group, January 2011, qrs.ly/3x3gwq7; “Research Report: 2010 IT Spending Intentions Survey,” by Jennifer Gahm, Bill Lundell and John McKnight, Enterprise Strategy Group, January 2010, qrs.ly/9d3gwqb.
2 “Research Report: Virtual Server Data Protection,” Enterprise Strategy Group, September 2011.
3 Research Brief, “2012 Storage Infrastructure Spending Trends,” by Bill Lundell, Terri McClure and Mark Peters, Enterprise Strategy Group, March 2012, qrs.ly/512z6az.