Dell DSS 8440: A Dynamic Machine Learning Server

We are taking advantage of this year’s Dell Technologies World gathering to introduce Dell’s latest machine learning offering to our customers. Our Extreme Scale Infrastructure (ESI) team, by design, is constantly pushing boundaries to solve the most pressing problems in today’s large-scale data centers. With the increasing demand for machine learning solutions, we are excited to announce the DSS 8440 accelerator-optimized server, specifically designed for high-performance machine learning training.

The Challenge

Data center workloads continue to evolve in challenging ways as the computing landscape responds to the rapid advancement of new technologies. The availability of massive amounts of data – both structured and unstructured – and the emergence of cloud native applications – with their demands for higher throughput and parallel computing – are driving data centers to look for more advanced processing solutions to incorporate into their existing infrastructures. In particular, they are looking for accelerator solutions that deliver more computing horsepower than the general-purpose CPUs that are becoming a bottleneck for overall processing.

The DSS 8440 is a 4U 2-socket accelerator-optimized server designed to deliver exceptionally high compute performance. Its open architecture maximizes customer choice for machine learning infrastructure while also delivering best-of-breed technology (from the #1 server provider and the #1 GPU provider). It lets you tailor your machine learning infrastructure to your specific needs – without lock-in.

The server offers a choice of 4, 8 or 10 of the industry-leading NVIDIA® Tesla® V100 Tensor Core GPUs, combined with 2 Intel CPUs for system functions, a high-performance switched PCIe fabric for rapid IO and up to 10 local NVMe and SATA drives for optimized access to data. This gives it both the performance and the flexibility to be an ideal solution for machine learning training – as well as other compute-intensive workloads like simulation, modeling and predictive analysis in engineering and scientific environments.

The DSS 8440 and Machine Learning

Machine learning encompasses two distinctly different workloads: training and inference. While each benefits from accelerator use, they do so in different ways and rely on different characteristics that may vary from accelerator to accelerator. The initial release of the DSS 8440 is specifically targeted at complex training workloads. It provides more of the raw compute horsepower needed to quickly process the increasingly complicated models that are being developed for complex workloads like image recognition, facial recognition and natural language translation.

Machine learning training flow

At the simplest level, machine learning training involves “training” a model by iteratively running massive amounts of data through a weighted, multi-layered algorithm (thousands of times!), comparing each output to a specifically targeted outcome and adjusting the model’s weights after each pass, ultimately producing a “trained” model that can make fast and accurate predictions. Inference is the production or real-time use of that trained model to make relevant predictions based on new data.
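The iterate, compare and adjust cycle described above can be sketched in a few lines. This is a minimal illustration only, assuming a one-weight linear model fitted by gradient descent; the function name `train`, the sample data, the learning rate and the epoch count are all hypothetical choices for the example, not anything specific to the DSS 8440 or to a particular framework.

```python
# Minimal sketch of iterative training: fit y = w*x + b by gradient descent.
# All names and numbers here are illustrative assumptions.

def train(samples, epochs=2000, lr=0.05):
    w, b = 0.0, 0.0                      # start from untrained weights
    n = len(samples)
    for _ in range(epochs):              # run the data through many times
        grad_w = grad_b = 0.0
        for x, target in samples:
            prediction = w * x + b       # forward pass through the model
            error = prediction - target  # compare to the targeted outcome
            grad_w += 2 * error * x / n  # gradient of mean squared error
            grad_b += 2 * error / n
        w -= lr * grad_w                 # adjust the weights
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print(f"w={w:.3f}, b={b:.3f}")
```

Real training workloads follow the same pattern, but with millions of weights and matrix operations in place of the two scalars here, which is where accelerators earn their keep.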

Training workloads demand extremely high-performance compute capability. Training a model for a typical image recognition workload requires accelerators that can rapidly process multiple layers of matrices in a highly iterative way – accelerators that can scale to match the need. NVIDIA® Tesla® V100 GPUs are such accelerators. The DSS 8440 with NVIDIA GPUs and a PCIe fabric interconnect has demonstrated scaling to within 5% of the performance of the industry-leading DGX-1 server when using the most common machine learning frameworks (e.g., TensorFlow) and popular convolutional neural network (CNN) models (e.g., image recognition).

Note that Dell is also partnering with the start-up accelerator company Graphcore to achieve new levels of training performance. Graphcore is developing machine-learning-specific, graph-based technology to enable even higher performance for training workloads. Graphcore accelerators will be available with the DSS 8440 in a future release. See the Graphcore sidebar for more details.

Inference workloads, while still requiring acceleration, do not demand as high a level of performance, because they only need one pass through the trained model to determine the result.

Machine learning inference flow

However, inference workloads demand the fastest possible response time, so they require accelerators that provide lower overall latency. While this release of the DSS 8440 is not targeted at inference usage, note that the accelerator card that Graphcore is developing can support both training and inference. (It lowers overall latency by loading the full machine learning model into accelerator memory.)
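To make the contrast with training concrete, here is a minimal sketch of inference, assuming the same kind of one-weight linear model. The constants `TRAINED_W` and `TRAINED_B` and the function `predict` are hypothetical stand-ins for a model that has already been trained; the point is that the weights are only read, a single pass produces the result, and the quantity worth measuring is latency.

```python
import time

# Hypothetical parameters standing in for an already-trained model.
TRAINED_W, TRAINED_B = 2.0, 1.0

def predict(x, w=TRAINED_W, b=TRAINED_B):
    """One forward pass; the weights are read but never updated."""
    return w * x + b

start = time.perf_counter()
prediction = predict(4.0)                  # a single pass on new data
latency_s = time.perf_counter() - start    # response time for this request

print(f"prediction={prediction}, latency={latency_s * 1e6:.1f} microseconds")
```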

Exceptional throughput performance

With the ability to scale up to 10 accelerators, the DSS 8440 can deliver higher performance for today’s increasingly complex computing challenges. Its low-latency switched PCIe fabric for GPU-to-GPU communication enables it to deliver near-equivalent performance to competitive systems based on the more expensive SXM2 interconnect. In fact, for the most common types of training workloads, not only is the DSS 8440’s throughput performance exceptional, it also provides better power efficiency (performance/watt).

Most of the competitive accelerator-optimized systems in the marketplace today are 8-way systems. An obvious advantage of the DSS 8440’s 10-GPU scaling capability is that it can provide more raw horsepower for compute-hungry workloads. That horsepower can be concentrated on increasingly complex machine learning tasks or, conversely, distributed across a wider range of workloads – whether machine learning or other compute-intensive tasks. This type of distributed, departmental sharing of accelerated resources is a common practice in scientific and academic environments where those resources are at a premium and typically need to be reassigned as needed among dynamic projects.
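As a rough illustration of the sharing model described above, the sketch below splits a 10-GPU system among several jobs. The helper `partition_gpus` and the job names are hypothetical; the only real-world detail assumed is that NVIDIA's CUDA runtime honors the `CUDA_VISIBLE_DEVICES` environment variable, which takes a comma-separated list of GPU indices. A production scheduler would do considerably more than this.

```python
def partition_gpus(total_gpus, requests):
    """Assign contiguous GPU indices to jobs in request order.

    requests maps a (hypothetical) job name to the number of GPUs it wants.
    Returns a mapping of job name to a comma-separated index string.
    """
    assignments = {}
    next_free = 0
    for job, count in requests.items():
        if next_free + count > total_gpus:
            raise ValueError(f"not enough GPUs left for {job!r}")
        indices = range(next_free, next_free + count)
        assignments[job] = ",".join(str(i) for i in indices)
        next_free += count
    return assignments

# Hypothetical departmental split of one 10-GPU server.
assignments = partition_gpus(10, {"training": 6, "simulation": 3, "notebook": 1})
for job, devices in assignments.items():
    # Each string could be exported as CUDA_VISIBLE_DEVICES for that job.
    print(f"{job}: CUDA_VISIBLE_DEVICES={devices}")
```

When projects change, rerunning the same function with a different request mapping gives the new assignment.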

Better performance per watt

One of the challenges of increasing accelerator capacity is the additional energy required to drive the larger number of accelerators. Large-scale data centers understand the importance of energy savings at scale. The DSS 8440 configured with 8 GPUs has proven to be more efficient on a performance-per-watt basis than a similarly configured competitive SXM2-based server – up to 13.5% more efficient.

That is, when performing convolutional neural network (CNN) training for image recognition it processes more images than the competitive system, while using the same amount of energy. This testing was done using the most common machine learning frameworks – TensorFlow, PyTorch and MXNet – and in all three cases the DSS 8440 bested the competition. Over time, and at data center scale, this advantage can result in significant operational savings.
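For readers who want to run this kind of comparison on their own systems, the arithmetic is simple. The sketch below assumes throughput is measured in images per second and power in watts; the function names and every number in it are made-up placeholders, not the measured results behind the figures quoted above.

```python
def perf_per_watt(images_per_sec, watts):
    """Training throughput delivered per watt of power drawn."""
    return images_per_sec / watts

def relative_gain_pct(candidate, baseline):
    """How much more efficient the candidate is than the baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0

# Placeholder inputs only: both systems draw the same power,
# but the candidate processes more images per second.
baseline = perf_per_watt(images_per_sec=1000.0, watts=3000.0)
candidate = perf_per_watt(images_per_sec=1100.0, watts=3000.0)

gain = relative_gain_pct(candidate, baseline)
print(f"efficiency gain: {gain:.1f}%")
```

Because the power draw is equal in this example, the efficiency gain is simply the throughput gain; with unequal power draw the same two functions still apply.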

Accelerated development with NVIDIA GPU Cloud (NGC)

When the DSS 8440 is configured with NVIDIA V100 GPUs, you get the best of both worlds – working with the world’s #1 server provider (Dell) and the industry’s #1 provider of GPU accelerators (NVIDIA). In addition, you can take advantage of the work NVIDIA has done with NVIDIA GPU Cloud (NGC), a program that offers a registry of pre-validated, pre-optimized containers for a wide range of machine learning frameworks, including TensorFlow, PyTorch, and MXNet. Along with the performance-tuned NVIDIA AI stack, these pre-integrated containers include the NVIDIA® CUDA® Toolkit, NVIDIA deep learning libraries, and the top AI software. They help data scientists and researchers rapidly build, train, and deploy AI models to meet continually evolving demands.

More power, more efficiency – the DSS 8440

Solve tougher challenges faster. Reduce the time it takes to train machine learning models with the scalable acceleration provided by the DSS 8440. Whether detecting patterns in online retail, diagnosing symptoms in the medical arena, or analyzing deep space data, more computing horsepower allows you to get better results sooner – improving service to customers, creating healthier patients, advancing the progress of research. And you can meet those challenges while simultaneously gaining greater energy efficiency for your data center. The DSS 8440 is the ideal machine learning solution for data centers that are scaling to meet the demands of today’s applications and want to contain the cost and inefficiencies that typically come with scale.

Contact the Dell Extreme Scale Infrastructure team for more information about the DSS 8440 accelerator-optimized server.

* Based on internal test benchmarks compared to published scores from NVIDIA

About the Author: James Mouton

James Mouton is senior vice president and general manager for the Extreme Scale Infrastructure (ESI) division at Dell EMC. This division has dedicated resources to support the unique infrastructure needs for our largest customers. James and his team are responsible for ensuring the tools, resources, processes, relationships and pipeline are in place to grow the business while exceeding customer requirements for scale, flexibility and performance. Prior to joining Dell EMC, James was a 25-year veteran of HP where he held a variety of key leadership positions. Most recently, James was the senior vice president and general manager of the Business PC Solutions organization in the Printing and Personal Systems (PPS) group where he was responsible for product development and customer experience for business client solutions worldwide. James also oversaw the Commercial PC, Consumer PC and Premium PC businesses in previous roles. Before joining the HP PC organization, Mouton was the chief technologist for the Technology Solutions Group (TSG) and he also served as the senior vice president and general manager of HP’s Industry Standard Server business where he drove server growth in worldwide market share. Before joining HP, James worked with Texas Instruments in their Electronic Defense Systems Business. Based in Houston, Texas, James has a bachelor’s degree in electrical engineering from Texas A&M University.