
May 10th, 2007 13:00

Proactive Maintenance and Power Management with Dell OpenManage and VMware Virtualization

Administrators can combine the Dell OpenManage™ systems management suite with the cluster resources of VMware® Infrastructure 3 to achieve proactive maintenance that enhances service continuity along with adaptive power utilization that helps reduce data center power and cooling costs.

By Balasubramanian Chandrasekaran and Puneet Dhawan

Cluster virtualization enables enterprises to consolidate existing workloads, reduce data center power and cooling costs, and respond to changing business needs. Administrators can take advantage of the Dell OpenManage suite and VMware Infrastructure 3 to manage virtualized cluster environments on Dell™ PowerEdge™ servers. Combining the tools available with Dell OpenManage and VMware Infrastructure 3 can provide proactive maintenance that enhances service continuity along with adaptive power utilization that helps reduce data center power and cooling costs.

Topics in this Article

Dell OpenManage and VMware Infrastructure 3 tools
Dell OpenManage
Dell OpenManage Server Administrator
Dell OpenManage IT Assistant
VMware Infrastructure 3
VMware Distributed Resource Scheduler
VMware Infrastructure Software Development Kit
Dell OpenManage and VMware Infrastructure 3 integration
Creating a proactive response to hardware faults
Configuring adaptive power management
Integrated management of virtualized clusters
Quick Links

Dell OpenManage and VMware Infrastructure 3 tools

Dell OpenManage and VMware Infrastructure 3 software are designed to simplify the management of virtualized environments running on Dell PowerEdge servers. Key tools for managing virtualized clusters include Dell OpenManage Server Administrator (OMSA), Dell OpenManage IT Assistant, VMware Distributed Resource Scheduler (DRS), and the VMware Infrastructure Software Development Kit (SDK).

Dell OpenManage

The Dell OpenManage suite consists of systems management applications for managing Dell PowerEdge servers, offering a comprehensive set of standards-based and interoperable tools for server deployment, monitoring, and change management. The two Dell OpenManage applications that are most relevant for the virtualization management described in this article are OMSA and IT Assistant.

Dell OpenManage Server Administrator

OMSA enables administrators to easily manage individual servers and internal storage arrays by performing tasks such as reviewing server status and inventory, configuring BIOS and RAID, setting actions based on events, and powering servers up and down. OMSA is fully qualified to run within a VMware Infrastructure 3 environment.

Dell OpenManage IT Assistant

IT Assistant provides a comprehensive, standards-based, one-view console for managing Dell servers, storage, tape libraries, network switches, printers, and client systems. Among other features, IT Assistant enables administrators to capture events and alerts generated by Dell servers running OMSA, configure actions based on these events and alerts, and monitor server performance statistics.

VMware Infrastructure 3

The VMware Infrastructure 3 suite includes enterprise-class virtualization software to enable consolidation, management, resource optimization, and high availability for IT data centers. The software supports Internet SCSI (iSCSI), network attached storage, 64-bit virtual machines (VMs), four-way symmetric multiprocessing for VMs, and up to 16 GB of VM memory, along with features like VMware High Availability (VMware HA) and DRS. VMware Infrastructure 3 is fully qualified to run on Dell PowerEdge servers, including those that offer enhanced processor technologies such as quad-core processors, Intel® Virtualization Technology, and AMD™ Virtualization.

Two key components of VMware Infrastructure 3 relevant to enabling proactive maintenance and adaptive power management are DRS and the VMware Infrastructure SDK. DRS provides dynamic resource scheduling and physical resource optimization; the SDK enables third-party applications to manage and control ESX Server hosts and VMs.

VMware Distributed Resource Scheduler

VMware Infrastructure 3 introduced the concept of an ESX Server cluster, a group of loosely tied ESX Server hosts that can be managed as a single entity. As the name suggests, DRS groups distributed computing resources on physical servers into a single pool, and schedules VMs on servers that can best serve the resource requirements of these VMs. It is built on VMware VMotion™ technology. DRS provides the following major features:

• Automatic initial placement of VMs on a “best-fit” cluster host
• Automatic resource optimization and relocation based on changes in a cluster’s computing resources, such as the addition or removal of a host
• Automatic relocation of VMs based on resource requirements

VMware Infrastructure 3 also introduced the maintenance-mode host status, which migrates all VMs from a particular ESX Server host to other hosts. DRS automatically places the relocated VMs among the remaining hosts in a way that helps optimize resource utilization.

VMware Infrastructure Software Development Kit

The VMware Infrastructure SDK allows developers to build custom Simple Object Access Protocol (SOAP)–based applications to manage ESX Server hosts and VMs. It also allows administrators to integrate existing management applications with VMware Infrastructure 3 and automate cloning and configuration of VMs, performance reporting, and other tasks.

Dell OpenManage and VMware Infrastructure 3 integration

Administrators can integrate different components of the Dell OpenManage suite and VMware Infrastructure 3 to enable comprehensive physical and virtual infrastructure management and task automation. Two examples illustrate this process: creating a proactive response to hardware faults and configuring adaptive power management.
The scripts and program files to implement these examples are available at www.dell.com/downloads/global/solutions/prctv.zip.¹ These scripts and programs provide a framework to integrate systems management with VMware VirtualCenter by using the VMware Infrastructure SDK. Administrators can modify the code to fit their environment.

Creating a proactive response to hardware faults

Administrators can integrate Dell OpenManage systems monitoring with VMware VirtualCenter to help improve cluster fault tolerance and enable proactive maintenance. When an event such as a server hardware fault occurs, the VMs on the faulty server can be proactively migrated to other healthy servers in the DRS cluster. This response helps prevent VM downtime if additional hardware faults subsequently occur on that server.
The steps described in this section are based on a Dell white paper² about using Dell OpenManage with ESX Server 2 and VirtualCenter 1. That paper discusses algorithms for choosing migration target servers based on the processor load of each candidate server in the migration pool. However, such an approach is complex, because developers must build an optimal algorithm to choose target servers and make decisions for each migrating VM.
This article extends the same concepts to DRS clusters and takes advantage of fully automated VMotion capabilities for load balancing and managing VM resource requirements. An action at the IT Assistant server layer triggered by a hardware fault, such as a loss of power redundancy reported by an OMSA agent, can automatically put the affected ESX Server host into maintenance mode and migrate its VMs to other healthy hosts.

Figure 1 illustrates the following sequence of actions that take place when a hardware fault occurs:

1. OMSA sends the IT Assistant server a Simple Network Management Protocol (SNMP) trap containing information about the server and the alert.
2. The IT Assistant server filters the traps as configured by administrators, and invokes a Java program by passing the server name and severity as arguments.
3. For any alerts selected by administrators, the Java program connects to the VirtualCenter server using the VMware Infrastructure SDK and issues a command to put the faulty ESX Server host in maintenance mode.
4. All VMs from the faulty host are migrated to other hosts in the cluster, with placement decided by the DRS algorithm.
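Steps 2 and 3 can be sketched as a small Java entry point. It receives the server name and severity as arguments and decides whether to request maintenance mode. This is a minimal sketch and not the code in prctv.zip. The `MaintenanceController` interface is a hypothetical stand-in for the VirtualCenter request issued through the VMware Infrastructure SDK. The severity strings and the set of actionable severities are assumptions that would follow the administrator's IT Assistant filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Minimal sketch of the fault-alert action (hypothetical names; not the prctv.zip code).
class FaultResponder {

    // Hypothetical abstraction over the VirtualCenter request that the real program
    // issues through the VMware Infrastructure SDK.
    interface MaintenanceController {
        void enterMaintenanceMode(String esxHost);
    }

    // Assumed severity strings passed by IT Assistant for alerts the administrator selected.
    static final Set<String> ACTIONABLE = Set.of("CRITICAL", "WARNING");

    // Returns true if a maintenance-mode request was issued for the host.
    static boolean handleAlert(String serverName, String severity, MaintenanceController vc) {
        String level = severity.trim().toUpperCase(Locale.ROOT);
        if (!ACTIONABLE.contains(level)) {
            return false; // e.g. Normal alerts follow the recovery path instead
        }
        // Once the host is in maintenance mode, DRS migrates its VMs to other hosts.
        vc.enterMaintenanceMode(serverName);
        return true;
    }

    public static void main(String[] args) {
        // Invoked by IT Assistant as: <serverName> <severity>
        String server = args.length > 0 ? args[0] : "esx01.example.com";
        String severity = args.length > 1 ? args[1] : "Critical";
        List<String> requested = new ArrayList<>();
        boolean acted = handleAlert(server, severity, requested::add);
        System.out.println(acted ? "Maintenance mode requested for " + requested : "No action taken");
    }
}
```

Keeping the SDK call behind an interface lets the filtering logic be tested without a VirtualCenter server. In a real deployment the interface would be implemented with the SDK's maintenance-mode request.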



Figure 1. Steps for a proactive response to hardware faults

Administrators can now examine the faulty server and perform maintenance while its VMs continue running on other hosts without downtime. Once server health is restored, the following sequence of actions occurs:

1. OMSA sends an SNMP normal alert (an SNMP trap with the severity level set to Normal) to the IT Assistant server.
2. The IT Assistant server filters the traps as configured by administrators, and invokes the Java program by passing the server name and severity as arguments.
3. For normal alerts, the Java program sends an SNMP query to the server for global health information, to help ensure that other server subsystems are also healthy. If the global status of the server is healthy, the Java program connects to the VirtualCenter server using the VMware Infrastructure SDK and issues a command to remove the server from maintenance mode.
4. The DRS service detects that the server is available in the cluster again and redistributes the VMs to balance the cluster load.
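The recovery path differs from the fault path in one respect: it confirms overall server health before returning the host to the cluster. In the following sketch, the global-status check sits behind a hypothetical `HealthProbe` interface; the real program issues an SNMP query to the server. A hypothetical `exitMaintenanceMode` method stands in for the corresponding VMware Infrastructure SDK request. Both names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the Normal-alert (recovery) action; interface names are hypothetical.
class RecoveryHandler {

    // Hypothetical stand-in for the SNMP query of the server's global health status.
    interface HealthProbe {
        boolean isGloballyHealthy(String server);
    }

    // Hypothetical stand-in for the VirtualCenter request made through the SDK.
    interface MaintenanceController {
        void exitMaintenanceMode(String esxHost);
    }

    // Leaves maintenance mode only if every server subsystem reports healthy.
    static boolean handleNormalAlert(String server, HealthProbe probe, MaintenanceController vc) {
        if (!probe.isGloballyHealthy(server)) {
            return false; // another subsystem still reports a problem; keep the host out of service
        }
        vc.exitMaintenanceMode(server); // DRS then rebalances VMs onto the host
        return true;
    }

    public static void main(String[] args) {
        List<String> released = new ArrayList<>();
        boolean ok = handleNormalAlert("esx01.example.com", s -> true, released::add);
        System.out.println(ok ? "Released " + released : "Host kept in maintenance mode");
    }
}
```

The health check guards against a case where one subsystem (for example, a power supply) clears its alert while another component is still degraded.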

Configuring adaptive power management

Using power efficiently and containing infrastructure costs are important elements of effective data center management. According to a Gartner report, most large enterprise IT organizations spend approximately 5 percent of their total IT budgets on energy, and this could double or triple within five years. The report also estimates that most enterprise data centers waste more than 60 percent of the energy used to cool their equipment.³
These expenditures put large data centers under constant pressure to tackle increasing power and cooling demands. Server virtualization enables enterprises to consolidate a large number of underutilized servers, helping mitigate these costs.
Because typical data center workloads have characteristic patterns of utilization peaks and troughs, keeping all cluster servers powered on at all times is rarely optimal. Automatically consolidating workloads onto fewer servers during off-peak hours helps avoid running servers unnecessarily and reduces associated costs. For example, resource utilization for workloads such as internal data shares, e-mail, printing, and Web servers may peak during office hours but decrease during nights and weekends.
Administrators can combine the Dell OpenManage systems management suite with VMware Infrastructure 3 DRS clusters to achieve adaptive power management. The DRS cluster's resource utilization is monitored continuously using the VMware Infrastructure SDK. When average cluster utilization falls below a set threshold, VMs are automatically consolidated onto fewer servers, and the unneeded servers are automatically powered down using Dell Remote Access Controllers (DRACs) to save power. When the resource utilization of the VMs increases, servers are automatically powered on to meet the increased demand (see Figure 2).



Figure 2. Adaptive power management model
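The "average cluster utilization" that drives this model can be computed by pooling the cluster's resources rather than averaging per-host percentages, so larger hosts carry proportionally more weight. The sketch below is an assumption about how such a figure might be derived from per-host usage and capacity values; the real program would obtain those values from VirtualCenter through the SDK. The `HostStats` record and its units are hypothetical.

```java
import java.util.List;

// Sketch: pooled (capacity-weighted) cluster utilization; names and units are hypothetical.
class ClusterUtilization {

    // Hypothetical per-host sample for a powered-on host.
    record HostStats(long cpuUsedMhz, long cpuTotalMhz, long memUsedMb, long memTotalMb) {}

    // Cluster CPU utilization in percent: sum of usage divided by sum of capacity.
    static double cpuPercent(List<HostStats> hosts) {
        long used = 0;
        long total = 0;
        for (HostStats h : hosts) {
            used += h.cpuUsedMhz();
            total += h.cpuTotalMhz();
        }
        return total == 0 ? 0.0 : 100.0 * used / total;
    }

    // Cluster memory utilization in percent, computed the same way.
    static double memPercent(List<HostStats> hosts) {
        long used = 0;
        long total = 0;
        for (HostStats h : hosts) {
            used += h.memUsedMb();
            total += h.memTotalMb();
        }
        return total == 0 ? 0.0 : 100.0 * used / total;
    }

    public static void main(String[] args) {
        // Example values only: a small host at 50% CPU and a large host at 12.5% CPU.
        List<HostStats> hosts = List.of(
                new HostStats(1000, 2000, 1024, 4096),
                new HostStats(1000, 8000, 3072, 12288));
        System.out.printf("CPU %.1f%%, memory %.1f%%%n", cpuPercent(hosts), memPercent(hosts));
    }
}
```

For the sample hosts, the pooled CPU figure is 20 percent. A simple average of the per-host percentages (50 and 12.5) would report 31.25 percent and overstate how much of the cluster's capacity is in use.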

Adaptive power management algorithm. Adaptive power management aims to minimize the number of powered-up physical servers supporting the virtual infrastructure workload so that average cluster utilization remains within specified limits. The steps carried out by the adaptive power management algorithm are as follows:

1. The Java program polls the VirtualCenter server and measures cluster-level processor and memory utilization.
2. If both the average processor utilization and the average memory utilization fall below their administrator-configured minimum thresholds, one server at a time is put into maintenance mode, and its VMs are automatically migrated to other servers by the DRS service. Once the migration is complete, the server is powered down. This process is repeated until cluster processor or memory utilization rises above its minimum threshold.
3. If the average processor or memory utilization increases above an administrator-configured maximum threshold, one server at a time is powered on. The DRS service discovers the addition of the new server into the cluster and redistributes the VMs to balance the cluster load. This process is repeated until the cluster utilization falls below the maximum threshold.
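One pass of the loop above comes down to comparing the measured utilization with the four thresholds. The Java sketch below assumes the rule stated for the cpuMin/memMin and cpuMax/memMax parameters in PowerSave.xml: power down only when both processor and memory utilization are below their minimums, and power up when either exceeds its maximum. The `Action` enum, `Thresholds` record, and `decide` method are hypothetical and are not taken from the downloadable program.

```java
// Sketch of one adaptive power management decision (illustrative names, not the prctv.zip code).
class PowerDecision {

    enum Action { POWER_DOWN_ONE, POWER_UP_ONE, NONE }

    // cpuMin, memMin, cpuMax, memMax as percentages of total cluster resources.
    record Thresholds(double cpuMin, double memMin, double cpuMax, double memMax) {}

    // Decides what to do for one poll; the caller acts on one server at a time
    // and waits for DRS migrations to finish before polling again.
    static Action decide(double cpuPct, double memPct, Thresholds t) {
        if (cpuPct > t.cpuMax() || memPct > t.memMax()) {
            return Action.POWER_UP_ONE;   // either resource is above its maximum
        }
        if (cpuPct < t.cpuMin() && memPct < t.memMin()) {
            return Action.POWER_DOWN_ONE; // both resources are below their minimums
        }
        return Action.NONE;               // within limits: leave the cluster as is
    }

    public static void main(String[] args) {
        Thresholds t = new Thresholds(30, 30, 80, 80); // example values only
        System.out.println(decide(20, 25, t)); // POWER_DOWN_ONE
        System.out.println(decide(20, 85, t)); // POWER_UP_ONE
        System.out.println(decide(50, 50, t)); // NONE
    }
}
```

With sensible settings (each minimum below its maximum), the two conditions cannot both hold. The order of the checks therefore matters only if the thresholds are misconfigured. Checking the maximums first errs on the side of keeping capacity.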

Configurable parameters. Administrators can define a set of configurable parameters in the PowerSave.xml configuration file available at www.dell.com/downloads/global/solutions/prctv.zip:

• Minimum utilization: The cpuMin and memMin parameters define minimum limits on cluster processor and memory utilization as a percentage of total available resources. Servers can be automatically powered down when average cluster utilization falls below both of these values.
• Maximum utilization: The cpuMax and memMax parameters define maximum limits on cluster processor and memory utilization as a percentage of total available resources. Servers can be automatically powered up when average cluster utilization rises above either of these values.
• Number of active VMs per server: The VMMin and VMMax parameters define the limits on the number of VMs that can run on each physical server at one time. Administrators can use these parameters, for example, to prevent additional servers from being powered up when the average number of active VMs per server is below the minimum threshold, or to prevent servers from being powered down when the average number of VMs per server is above the maximum threshold. Administrators can effectively disable these limits by setting VMMin to 1 and VMMax to an extremely high value.
• Timing: The opTimeout parameter defines how often to poll for utilization information. The steadyState parameter defines the delay between the moment a threshold is crossed and the start of the corresponding action, which helps ensure that random workload spikes and troughs do not cause servers to be powered up or down unnecessarily.
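The VMMin/VMMax guards and the steadyState delay can be layered on top of the raw threshold decision. The following Java sketch shows one possible approach. The `PowerGuards` class, the millisecond units, and passing the current time in explicitly (so the logic can be exercised without waiting) are assumptions. The real program's polling loop, driven by opTimeout, is not shown.

```java
// Sketch of VMMin/VMMax guards plus a steadyState delay (hypothetical class; times in ms).
class PowerGuards {

    enum Action { POWER_DOWN_ONE, POWER_UP_ONE, NONE }

    private final int vmMin;          // VMMin
    private final int vmMax;          // VMMax
    private final long steadyStateMs; // steadyState, assumed to be in milliseconds

    private Action pending = Action.NONE; // action whose condition is currently holding
    private long pendingSinceMs;          // when that condition was first observed

    PowerGuards(int vmMin, int vmMax, long steadyStateMs) {
        this.vmMin = vmMin;
        this.vmMax = vmMax;
        this.steadyStateMs = steadyStateMs;
    }

    // Called once per poll (every opTimeout) with the raw threshold decision.
    Action filter(Action proposed, int activeVms, int poweredOnServers, long nowMs) {
        if (poweredOnServers <= 0) {
            throw new IllegalArgumentException("at least one server must be powered on");
        }
        double vmsPerServer = (double) activeVms / poweredOnServers;
        Action allowed = proposed;
        if (proposed == Action.POWER_UP_ONE && vmsPerServer < vmMin) {
            allowed = Action.NONE; // servers are already sparsely loaded with VMs
        } else if (proposed == Action.POWER_DOWN_ONE && vmsPerServer > vmMax) {
            allowed = Action.NONE; // servers already carry too many VMs to shrink further
        }
        if (allowed != pending) {
            pending = allowed; // condition changed: restart the steady-state timer
            pendingSinceMs = nowMs;
        }
        if (pending != Action.NONE && nowMs - pendingSinceMs >= steadyStateMs) {
            Action act = pending;
            pending = Action.NONE; // after acting, require a fresh steady period
            return act;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        PowerGuards g = new PowerGuards(1, 20, 600_000); // example: 10-minute steady state
        System.out.println(g.filter(Action.POWER_DOWN_ONE, 12, 4, 0));       // NONE
        System.out.println(g.filter(Action.POWER_DOWN_ONE, 12, 4, 300_000)); // NONE
        System.out.println(g.filter(Action.POWER_DOWN_ONE, 12, 4, 600_000)); // POWER_DOWN_ONE
    }
}
```

With a 10-minute steady state, a power-down condition first observed at time 0 is acted on at the first poll at or after 10 minutes. Any poll at which the condition no longer holds restarts the timer.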

Integrated management of virtualized clusters

Integrating the powerful systems management capabilities of the Dell OpenManage and VMware Infrastructure 3 suites can provide enhanced control over important cluster functionality such as proactive responses to hardware faults and adaptive power management. The processes described in this article are intended to help simplify management and optimize physical resource utilization, which can help reduce total data center costs.

Balasubramanian Chandrasekaran is a systems engineer in the Dell Virtualization Solutions Engineering group. His research interests include data center virtualization, high-speed interconnects, and high-performance computing. Balasubramanian has an M.S. in Computer Science from the Ohio State University.

Puneet Dhawan is a systems engineer in the Dell Virtualization Solutions Engineering group. Puneet has a bachelor’s degree in Electrical Engineering from Punjab Engineering College (PEC) and a master’s degree in Computer Engineering from Texas A&M University.

Quick Links

Dell and VMware:
www.dell.com/vmware

Supporting scripts and programs:
www.dell.com/downloads/global/solutions/prctv.zip

Footnotes
1 These scripts and programs are provided as is, without any implied support or warranty.
2 “Implementing Fault-Tolerance Through Dell OpenManage and the VMware Software Development Kit,” by Dave Jaffe and Todd Muirhead, Dell Enterprise Product Group, September 2005, www.dell.com/downloads/global/solutions/OM-VMware-Integration.pdf.
3 “Why ‘Going Green’ Will Become Essential for Data Centers,” by Rakesh Kumar, Gartner, Inc., October 10, 2006.