Thermal design insights into the Dell M910 blade server

As the lead thermal engineer for the PowerEdge M910 program, I wanted to share some insights into the thermal design and fan control algorithms of the M910 blade. The M910 is built around the Intel 7500 series Nehalem-EX processor architecture with 32 DDR3 DIMMs. Supported CPU configurations are 2 or 4 of the 95W or 105W CPUs, or 2 of the 130W CPUs.


Our design goal is to minimize the power required to cool the hardware while still maintaining hardware component temperatures within specification. For blade server thermal design, this means minimizing the speed of the 9 system fans within the M1000e chassis. For a particular blade installed in the M1000e chassis, as many as 6 fans provide cooling to that blade. When it comes to minimizing fan speed, the critical information needed is component temperature and hardware configuration. In a full configuration, the M910 has more than three dozen temperature sensors that are used as input to the fan control algorithms. These sensors are used to determine the thermal margin of the components within the blade. When the thermal margin of the components becomes too small, the fan control algorithms automatically increase fan speeds to keep the components below their maximum temperature limit.
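To make the thermal margin idea concrete, here is a minimal Python sketch. It is not the M910 firmware: the `Sensor` class, the 10 °C margin threshold, and the 5% step size are all hypothetical values chosen for illustration. It only shows the general pattern of finding the smallest margin across the sensors and raising fan speed when that margin gets too small.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    name: str
    reading_c: float  # current temperature in degrees C
    limit_c: float    # maximum allowed temperature (hypothetical value)


def thermal_margin(sensor: Sensor) -> float:
    """Margin is how far the component is below its temperature limit."""
    return sensor.limit_c - sensor.reading_c


def next_fan_speed(current_pct: float, sensors: list[Sensor],
                   margin_threshold_c: float = 10.0,
                   step_pct: float = 5.0) -> float:
    """Raise fan speed one step if the worst-case margin is too small.

    The threshold and step are illustrative assumptions, not Dell values.
    """
    worst = min(thermal_margin(s) for s in sensors)
    if worst < margin_threshold_c:
        return min(100.0, current_pct + step_pct)
    return current_pct


sensors = [
    Sensor("cpu1", reading_c=80.0, limit_c=85.0),   # 5 C of margin
    Sensor("dimm_a1", reading_c=60.0, limit_c=85.0),  # 25 C of margin
]
raised = next_fan_speed(40.0, sensors)
```

In this example the CPU sensor has only 5 °C of margin, so the requested fan speed steps up from 40% to 45%. A single sensor with too little margin is enough to trigger the increase.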

The M910 fan speed control algorithms use the features found in Dell’s 11G portfolio thermal algorithms, including ambient temperature sensing for fan speed control based on the external environment, a PID control algorithm for the CPUs, and an IOH algorithm for chipset cooling, among others. For the M910, a PID control algorithm was added for the 32 DIMMs within the blade to provide a more granular approach to fan speed adjustments for DIMM cooling. This allows smaller incremental changes to the fan speed compared with previous DIMM algorithms.
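For readers who have not worked with PID control, the sketch below shows a generic textbook PID loop in Python that turns a temperature error into a fan speed request. It is an illustration only: the class name, gains, and setpoint are assumptions and do not reflect the tuning or structure of the actual M910 algorithms.

```python
class PIDFanController:
    """Textbook PID loop; output is a fan speed request in percent."""

    def __init__(self, kp: float, ki: float, kd: float,
                 out_min: float = 0.0, out_max: float = 100.0):
        self.kp, self.ki, self.kd = kp, ki, kd  # hypothetical gains
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint_c: float, measured_c: float, dt_s: float) -> float:
        # Error is positive when the component is hotter than its target.
        error = measured_c - setpoint_c
        self.integral += error * dt_s
        if self.prev_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt_s
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to the valid fan speed range.
        return max(self.out_min, min(self.out_max, output))


dimm_pid = PIDFanController(kp=1.0, ki=0.5, kd=0.0)
first = dimm_pid.update(setpoint_c=70.0, measured_c=72.0, dt_s=1.0)
second = dimm_pid.update(setpoint_c=70.0, measured_c=72.0, dt_s=1.0)
```

Because the output scales with the size of the error, a small temperature overshoot produces a small fan speed change rather than a jump to a fixed higher step. That is the sense in which a PID loop is more granular. A production controller would also need safeguards such as integral anti-windup, which this sketch leaves out.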

Dell’s 11G servers use configuration curves to set the minimum (idle) fan speeds that the system will operate at for a given hardware configuration and ambient inlet temperature. The fan speeds will increase, however, if the temperature sensors within the blade indicate that additional cooling is needed. Like Dell’s 11G monolithic servers, the M910 uses configuration-based fan curves to set the fan speed based on the hardware installed in the system. For example, a configuration of 32 of the 16GB DIMMs requires more cooling than 8 of the 4GB DIMMs. Accordingly, the fan speed settings are higher for the 16GB DIMM configurations.
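One way to picture a configuration curve is as a lookup table of inlet temperature against minimum fan speed, with one table per hardware configuration. The Python sketch below assumes that shape. The curve names, the data points, and the use of linear interpolation are all invented for illustration and are not taken from the M910.

```python
from bisect import bisect_right

# Hypothetical curves: (inlet temperature in C, minimum fan speed in %).
CURVES = {
    "8x4GB": [(20.0, 30.0), (25.0, 35.0), (30.0, 45.0), (35.0, 60.0)],
    "32x16GB": [(20.0, 40.0), (25.0, 50.0), (30.0, 65.0), (35.0, 80.0)],
}


def min_fan_speed(config: str, inlet_c: float) -> float:
    """Interpolate the minimum (idle) fan speed for a configuration."""
    points = CURVES[config]
    temps = [t for t, _ in points]
    # Hold the end values outside the range covered by the table.
    if inlet_c <= temps[0]:
        return points[0][1]
    if inlet_c >= temps[-1]:
        return points[-1][1]
    i = bisect_right(temps, inlet_c)
    (t0, s0), (t1, s1) = points[i - 1], points[i]
    return s0 + (s1 - s0) * (inlet_c - t0) / (t1 - t0)


light = min_fan_speed("8x4GB", 27.5)
heavy = min_fan_speed("32x16GB", 27.5)
```

At the same inlet temperature, the heavier DIMM configuration gets the higher minimum fan speed. The value from the curve acts as a floor, and sensor-driven algorithms can still request more cooling above it.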

In addition to configuration-based cooling, there are three settings within the Power Management section of the BIOS settings that play into fan speed control: CPU Power and Performance Management, Fan Power and Performance Management, and Memory Power and Performance Management. Each of these settings offers Maximum Performance and Minimum Power, among other options. Selecting Minimum Power imposes fan speed caps on some of the algorithms. This allows the system to draw less power by being more conservative with fan speed changes while still maintaining component specs.
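The effect of a fan speed cap can be sketched as follows. This Python example assumes that each algorithm produces its own fan speed request, that the highest request wins, and that a profile may cap individual requests. The algorithm names, the cap values, and the "highest request wins" rule are assumptions made for the example, not a description of the BIOS implementation.

```python
# Hypothetical per-algorithm caps (in %) for each BIOS profile.
CAPS = {
    "Maximum Performance": {},
    "Minimum Power": {"dimm_pid": 70.0, "ambient": 60.0},
}


def resolve_fan_speed(requests: dict[str, float], profile: str,
                      floor_pct: float) -> float:
    """Apply profile caps to each request, then take the highest value.

    floor_pct stands in for the minimum speed from a configuration curve.
    Algorithms without a cap in the profile are left untouched.
    """
    caps = CAPS[profile]
    capped = [min(pct, caps.get(name, 100.0)) for name, pct in requests.items()]
    return max([floor_pct] + capped)


requests = {"cpu_pid": 55.0, "dimm_pid": 85.0}
perf = resolve_fan_speed(requests, "Maximum Performance", floor_pct=30.0)
saver = resolve_fan_speed(requests, "Minimum Power", floor_pct=30.0)
```

With these made-up numbers, the capped DIMM request lowers the resolved speed from 85% to 70% under Minimum Power. An uncapped algorithm, such as the CPU loop in this sketch, can still drive the fans higher when it needs to.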

This is just a quick overview of a few of the M910 thermal algorithms. I hope this has given you a basic understanding of some of the features in the M910 and of the detail and complexity that we put into our servers to optimize cooling and minimize fan power.

Here are a few related links you might also find interesting.

YouTube Video: Dell PowerEdge M910 Blade Server with Intel Xeon 7500 Processors

White Paper: Thermal Design of the Dell PowerEdge M-Series

Power-to-Cool Blog Post: Power To Cool – Waste Heat Management with Minimum Waste!

About the Author: Robert Curtis