PowerEdge: HPC and AI Performance on DSS8440 with V100S GPUs

Summary: This blog presents the results of a study evaluating 8x V100S GPUs on the DSS8440 for different HPC and deep learning applications, including HPL, LAMMPS, and the MLPerf-v0.6 suite.


Instructions

Authors: Frank Han, Rengan Xu, Quy Ta
Dell EMC HPC and AI Innovation Lab, May 2020

 

 

Executive Summary

This blog presents the results of a study evaluating 8x V100S GPUs on the DSS8440 for different HPC and deep learning applications, including HPL, LAMMPS, and the MLPerf-v0.6 suite. In summary:

  • Applications limited by GPU memory bandwidth, like LAMMPS, can take advantage of the new V100S GPUs and see higher performance on both single and multiple GPUs.
  • Deep learning applications, like those tested in MLPerf, benefit from the higher boost clock and higher bandwidth of V100S.
  • GPU compute-bound applications, like the HPC benchmark HPL, get about the same performance as on V100-PCIe.

The rest of this blog lays out the details of this testing. In future work, the same applications will be run on the DSS8440 with RTX GPUs (in place of the V100S), and other tests, such as V100S performance on the AMD platform, will also be run.

 


 

Overview of the Testbed

The Dell DSS8440 server is an accelerator-optimized server designed for high-performance computing and deep learning workloads. The NVIDIA V100S is the latest member of the Tesla Volta series, and it is a double-width, 32 GB, PCIe-based GPU card.

The hardware and software details of the tested DSS 8440 server are listed in Table 1, and the specification differences between V100S and V100-PCIe are listed in Table 2.

 


Table 1: The hardware and software details

 


Table 2: V100S and V100-PCIe difference in specification

 

 

HPC Application Performance

Figure 1: V100S and V100-PCIe HPL results on DSS8440

 

Figure 1 shows the HPL performance numbers. There is not much difference between V100S and V100-PCIe because HPL is an extreme stress test. It leaves little thermal headroom for the GPU Boost feature, so the GPU frequency quickly falls back to the base clock rate. Because V100S and V100-PCIe have almost the same base clock rate, V100S delivers about the same level of performance as V100-PCIe for GPU compute-bound applications like HPL.
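The link between sustained clock rate and compute-bound throughput can be illustrated with a short calculation. The sketch below is not taken from the study: it assumes, based on NVIDIA's public specifications, that both cards have 80 SMs with 32 FP64 units each and that one fused multiply-add counts as two floating-point operations. The function name and the 1200 MHz example clock are hypothetical. Under these assumptions, theoretical peak FP64 throughput depends only on the clock the GPU actually sustains, so two cards held at the same clock have the same peak.

```python
# Assumed from NVIDIA public specifications, not from this article's tables.
SM_COUNT = 80            # streaming multiprocessors per GPU (assumption)
FP64_UNITS_PER_SM = 32   # FP64 units per SM (assumption)
FLOPS_PER_FMA = 2        # one fused multiply-add = 2 floating-point operations


def peak_fp64_tflops(clock_mhz, sm_count=SM_COUNT,
                     fp64_units_per_sm=FP64_UNITS_PER_SM):
    """Theoretical peak FP64 throughput in TFLOPS at a given SM clock."""
    flops_per_cycle = sm_count * fp64_units_per_sm * FLOPS_PER_FMA
    return flops_per_cycle * clock_mhz * 1e6 / 1e12


# 1200 MHz is an illustrative sustained clock, not a measured value.
# 1380 MHz and 1597 MHz are assumed datasheet boost clocks.
for clock_mhz in (1200, 1380, 1597):
    print(f"{clock_mhz} MHz -> {peak_fp64_tflops(clock_mhz):.2f} TFLOPS peak FP64")
```

In this simple model, any advantage from a higher boost clock disappears once both GPUs settle at a similar clock under sustained load, which matches the behavior described for HPL.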

Figure 2: V100S and V100-PCIe LAMMPS results on DSS8440

 

Figure 2 shows the LAMMPS results, in timesteps/s, with the Lennard-Jones dataset. LAMMPS is a molecular dynamics code that is known to be bound by GPU memory bandwidth. V100S delivers 27% more performance than V100-PCIe in this testing. The speedup comes not only from the 15% higher boost frequency and 26% more bandwidth but also from the newer software version. The V100-PCIe numbers were obtained using the older KOKKOS package in the LAMMPS 8Feb2019 version, whereas the newer 24Jan2020 version added support for using cuFFT on the GPU with KOKKOS. More details can be found in the LAMMPS 24Jan2020 release notes.
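The percentages quoted above can be checked with simple arithmetic. The sketch below assumes boost clocks of 1597 MHz and 1380 MHz and memory bandwidths of 1134 GB/s and 900 GB/s for V100S and V100-PCIe respectively; these values are taken from NVIDIA's public datasheets as we understand them, not from Table 2 of this article, and the function name is hypothetical.

```python
def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Assumed datasheet values (not taken from this article's Table 2).
V100S_BOOST_MHZ, V100_PCIE_BOOST_MHZ = 1597, 1380
V100S_BW_GBS, V100_PCIE_BW_GBS = 1134, 900

boost_gain = percent_gain(V100S_BOOST_MHZ, V100_PCIE_BOOST_MHZ)
bandwidth_gain = percent_gain(V100S_BW_GBS, V100_PCIE_BW_GBS)

print(f"Boost clock gain: {boost_gain:.1f}%")       # prints 15.7%
print(f"Bandwidth gain:   {bandwidth_gain:.1f}%")   # prints 26.0%
```

Under these assumptions the bandwidth ratio alone is close to the observed 27% gain, which is in line with LAMMPS being bandwidth-bound. The software version change means the hardware contribution cannot be isolated from these numbers.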

 

Deep Learning Application Performance

Figure 3: V100S and V100-PCIe MLPerf results on DSS8440

 

The closed division of MLPerf Training v0.6 has six subtests covering a wide range of deep learning domains: image classification (ResNet-50), object detection (Mask R-CNN and SSD), translation (NMT and Transformer), and reinforcement learning (MiniGo). Figure 3 compares the results for both GPU cards. Performance gains of around 1-5% were observed across the MLPerf suite for V100S, which is consistent with the 1-5% higher throughput in the result log files. The GPU clock rate was monitored in real time during the runs, and the V100S GPUs ran at 1-5% higher clock rates in all of those tests, so the performance benefits came from the higher boost frequency of V100S.
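The article does not say which tool was used to monitor the GPU clocks. One common approach is to poll `nvidia-smi` during a run; the sketch below assumes that approach, using the `index`, `clocks.sm`, and `temperature.gpu` query fields with CSV output. The function names and the sample text are hypothetical, and the sample values are made up for illustration rather than measured.

```python
import subprocess
from collections import defaultdict
from statistics import mean

# Assumed monitoring command: one CSV row per GPU, "index, SM clock (MHz), temp (C)".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,clocks.sm,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def query_gpus():
    """Run nvidia-smi once and return its CSV text (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def parse_samples(csv_text):
    """Group SM clock readings (MHz) by GPU index from 'index, clock, temp' rows."""
    clocks = defaultdict(list)
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        index, sm_clock, _temperature = fields
        clocks[int(index)].append(int(sm_clock))
    return dict(clocks)


def mean_clock_mhz(clocks):
    """Average SM clock per GPU over all collected samples."""
    return {gpu: mean(values) for gpu, values in clocks.items()}


# Made-up sample text for illustration only; these are not measured values.
EXAMPLE = """0, 1500, 60
1, 1400, 62
0, 1300, 70
1, 1200, 71
"""
averages = mean_clock_mhz(parse_samples(EXAMPLE))
print(averages)
```

On a real system, the output of `query_gpus()` would be collected at a fixed interval during each benchmark and concatenated before parsing, so that the per-GPU averages for the two cards can be compared against the throughput difference.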

 

 

Conclusions and Future Work

In this blog, HPC application performance with HPL and LAMMPS, and deep learning performance with MLPerf, were compared between V100S and V100-PCIe GPU cards on the same DSS8440 server. Applications limited by GPU memory bandwidth, like LAMMPS, can take advantage of the new V100S GPUs and see higher performance on both single and multiple GPUs. The deep learning applications tested in MLPerf also benefit from the higher boost clock and higher bandwidth of V100S. The GPU compute-bound HPC benchmark HPL gets about the same performance as on V100-PCIe. In the future, the same applications will be run on the DSS8440 with RTX GPUs, and other tests, such as V100S performance on the AMD platform, will be explored.

Affected Products

DSS 8440, High Performance Computing Solution Resources
Article Properties
Article Number: 000133353
Article Type: How To
Last Modified: 16 Jul 2025
Version:  4