Article Number: 000134624

Deep Learning Performance on V100 GPUs with MLPerf Training v0.6 Benchmarks

Article Content

Symptoms

Article written by Rengan Xu, Frank Han and Quy Ta of the HPC and AI Innovation Lab in November 2019.

Resolution

Abstract

Dell EMC Ready Solutions for AI – Deep Learning with NVIDIA v1.1 and the corresponding reference architecture guide were released in February 2019. This blog quantifies the deep learning training performance of this reference architecture using the imaging benchmarks in the MLPerf suite. The evaluation was performed on up to eight nodes. The results show that Dell EMC’s scale-out solution achieves performance comparable to other vendors’ scale-up solutions for imaging models.

Overview

After the initial version 1.0 of Dell EMC Ready Solutions for AI – Deep Learning with NVIDIA was released, the solution was updated to version 1.1 in February 2019. Detailed information about the solution and the infrastructure can be found in the architecture guide "Dell EMC Ready Solution for AI – Deep Learning with NVIDIA". In brief, the major differences in solution v1.1 are that the GPU server was updated from configuration K to configuration M, and the GPU memory was increased from 16 GB to 32 GB. The MLPerf v0.6 benchmark suite was chosen to evaluate the performance of the solution. All the available MLPerf v0.6 training benchmarks are listed in Table 1, but this blog focuses only on the ResNet-50, SSD and Mask-R-CNN models.
Table 1: MLPerf v0.6 training benchmarks
The hardware and software details used for this evaluation are summarized in Table 2.
Table 2: Hardware and software details used for the evaluation

Performance Evaluation

Figures 1 to 3 show the training time in minutes for the C4140-M-32GB in ready solution v1.1 on the different MLPerf benchmarks. The testing was scaled from one node (4 V100 GPUs) to eight nodes (32 V100 GPUs). The Dell EMC Ready Solution for AI – Deep Learning with NVIDIA is a scale-out solution, which can use more resources as more nodes are added to it. Other vendors offer an alternative, the scale-up solution, which places more GPUs within one server. The figures also compare our scale-out solution with other vendors’ scale-up solutions*. The following conclusions can be drawn from these figures:

The performance of our scale-out solution scales well as the number of nodes or GPUs increases. With one EDR InfiniBand card, the speedup of 8 nodes over 1 node for ResNet-50, SSD and Mask-R-CNN is 6.83x, 5.57x and 5.68x, respectively.
With two EDR InfiniBand cards, the speedup of 8 nodes over 1 node for ResNet-50, SSD and Mask-R-CNN is 6.83x, 5.78x and 5.74x, respectively. This shows that the additional InfiniBand card does not have a big impact on performance for these imaging models.
With the same number of GPUs, the performance of the scale-out solution is close to that of the scale-up solution for these imaging models.
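
The speedups above can also be read as scaling efficiency, that is, the fraction of ideal linear speedup achieved at 8 nodes. The following is a minimal sketch of that arithmetic using only the speedup figures quoted in this blog; the names NODES, SPEEDUPS, scaling_efficiency and efficiency_table are illustrative and are not part of MLPerf or of the original evaluation scripts.

```python
# Speedups of 8 nodes over 1 node, copied from the conclusions in this blog.
# All names here are illustrative; they do not come from the MLPerf tooling.
NODES = 8
SPEEDUPS = {
    "1x EDR InfiniBand": {"ResNet-50": 6.83, "SSD": 5.57, "Mask-R-CNN": 5.68},
    "2x EDR InfiniBand": {"ResNet-50": 6.83, "SSD": 5.78, "Mask-R-CNN": 5.74},
}


def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Return the fraction of ideal linear speedup (1.0 is perfect scaling)."""
    return speedup / nodes


def efficiency_table(speedups: dict, nodes: int) -> dict:
    """Map each interconnect configuration and model to its scaling efficiency."""
    return {
        config: {
            model: scaling_efficiency(speedup, nodes)
            for model, speedup in models.items()
        }
        for config, models in speedups.items()
    }


if __name__ == "__main__":
    for config, models in efficiency_table(SPEEDUPS, NODES).items():
        for model, efficiency in models.items():
            print(f"{config:<18} | {model:<10} | {efficiency:.1%}")
```

By this measure ResNet-50 reaches roughly 85% of linear scaling at 8 nodes with either interconnect configuration, while SSD and Mask-R-CNN land at roughly 70% to 72%. The second InfiniBand card moves SSD and Mask-R-CNN by only a few percentage points.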

Figure 1: The performance of ResNet-50 v1.5

Figure 2: The performance of SSD

Figure 3: The performance of Mask-R-CNN

Conclusions

In this blog, we quantified the performance of the Dell EMC Ready Solution for Artificial Intelligence – Deep Learning with NVIDIA v1.1 using the MLPerf v0.6 benchmarks. The results show that the scale-out solution achieves performance comparable to other vendors’ scale-up solutions for imaging models, and that the additional EDR InfiniBand card does not provide a significant performance benefit for these models.

*The data for the scale-up systems is publicly available on the MLPerf v0.6 results web page.

Article Properties

Affected Product

High Performance Computing Solution Resources

Last Published Date

21 Feb 2021

Version

Article Type

Solution
