As organizations strive to gain a competitive edge in an increasingly digital global economy, Artificial Intelligence (AI) is garnering a lot of attention. Not surprisingly, AI initiatives are springing up in various business domains, such as manufacturing, customer support, marketing and sales. In fact, Gartner forecasts that AI-derived global business value will reach $3.9 trillion by 2022[1]. As companies scramble to determine how to turn the promise of AI into reality, they face a multitude of complex choices about software stacks, neural networks and infrastructure components, with significant implications for the time-to-value of these initiatives.
In such a complex environment, it is critical that organizations be able to rely on vendors that they trust. Over the last few years, Dell and NVIDIA have established a strong partnership to help organizations accelerate their AI initiatives. For organizations that prefer to build their own solution, we offer Dell’s ultra-dense PowerEdge C-series servers with NVIDIA Tesla V100 Tensor Core GPUs, which allow scale-out AI solutions from four up to hundreds of GPUs per cluster. For customers looking to leverage a pre-validated hardware and software stack for their deep learning initiatives, we offer Dell Ready Solutions for AI: Deep Learning with NVIDIA, which also features Dell Isilon All-Flash storage. Our partnership is built on the philosophy of offering flexibility and informed choice across a broad portfolio.
“We’re now in the age of AI, where every industry will be enabled and powered by AI,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Working together, NVIDIA and Dell are bringing some of the most powerful GPU-accelerated solutions to enterprises so they can quickly integrate AI into their business.”
Today our companies are expanding our collaboration by announcing a new reference architecture for AI that pairs NVIDIA DGX-1 servers with high-performance Dell Isilon All-Flash scale-out network-attached storage (NAS). The NVIDIA DGX-1 is purpose-built for the unique demands of AI and deep learning and is powered by eight advanced data center accelerators, the NVIDIA Tesla V100 Tensor Core GPUs. This combination is designed to give customers more flexibility in how they deploy AI, and it delivers breakthrough performance for large-scale deep learning. Because Dell often takes the role of trusted IT advisor with a broad solution portfolio, this reference architecture lets us support customers who want to deploy NVIDIA DGX-1 with Dell’s scale-out storage, and to do so more simply and with less risk.
Key components of this offering include:
- NVIDIA DGX-1 servers, which integrate up to eight NVIDIA Tesla V100 Tensor Core GPUs fully interconnected in a hybrid cube-mesh topology. Each DGX-1 server can deliver 1 petaFLOPS of AI performance and is powered by the DGX software stack, which includes NVIDIA-optimized versions of the most popular deep learning frameworks for maximized training performance.
- Dell Isilon All-Flash scale-out NAS storage, which delivers the scale (up to 33 PB), performance (up to 540 GB/s), and concurrency (up to millions of connections) to eliminate the storage I/O bottleneck and accelerate AI workloads at scale.
To validate the new reference architecture, we ran multiple industry-standard image classification benchmarks using 22 TB datasets to simulate real-world training and inference workloads (check out the details in this joint white paper). This testing was done on systems ranging from one DGX-1 server (8 GPUs) all the way to nine DGX-1 servers (72 GPUs) connected to eight Isilon F800 nodes. The results from two of the key benchmarks are shown below in Figures 1 and 2. Just as we proved out with the Ready Solution for Deep Learning with C4140 servers, Isilon was able to keep pace and scale linearly to achieve the maximum performance of the compute layer.
Figure 1: ResNet-50 Model Training Benchmark
Figure 2: Inception-v4 Large Image Classification
Here are some of the key findings from our testing of the Isilon and NVIDIA DGX-1 reference architecture:
- Compelling performance results across industry-standard AI benchmarks from 8 to 72 GPUs, with no degradation in throughput or performance.
- Linear scalability from 8 to 72 GPUs, delivering up to 19.9 GB/s while keeping the GPUs pegged at >97% utilization.
- The Isilon F800 system can deliver up to 96% of the throughput of local memory, bringing it extremely close to the maximum theoretical performance limit an NVIDIA DGX-1 system can achieve.
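To put these scaling figures in per-device terms, here is a rough back-of-the-envelope sketch in Python. It uses only numbers quoted in this post and assumes, as a simplification of ours, that the 19.9 GB/s peak corresponds to the full 72-GPU configuration and is spread evenly across GPUs and Isilon F800 nodes. The constant and function names are illustrative and do not come from the white paper.

```python
# Back-of-the-envelope arithmetic using figures quoted in this post.
# Assumption: the 19.9 GB/s peak applies to the 72-GPU configuration
# and is shared evenly across devices.
GPUS_PER_DGX1 = 8             # each DGX-1 integrates up to eight Tesla V100 GPUs
TOTAL_GPUS = 72               # largest configuration in the key findings
ISILON_F800_NODES = 8         # Isilon F800 nodes in the test setup
PEAK_THROUGHPUT_GB_S = 19.9   # aggregate throughput in GB/s


def dgx1_servers_needed(total_gpus: int, gpus_per_server: int = GPUS_PER_DGX1) -> int:
    """Return how many servers are needed to host total_gpus (ceiling division)."""
    return -(-total_gpus // gpus_per_server)


def per_unit_throughput(aggregate_gb_s: float, units: int) -> float:
    """Return the average share of the aggregate throughput per unit, in GB/s."""
    return aggregate_gb_s / units


servers = dgx1_servers_needed(TOTAL_GPUS)
per_gpu = per_unit_throughput(PEAK_THROUGHPUT_GB_S, TOTAL_GPUS)
per_node = per_unit_throughput(PEAK_THROUGHPUT_GB_S, ISILON_F800_NODES)

print(f"DGX-1 servers for {TOTAL_GPUS} GPUs: {servers}")
print(f"Average throughput per GPU: {per_gpu * 1000:.0f} MB/s")
print(f"Average throughput per F800 node: {per_node:.2f} GB/s")
```

Under these assumptions, each GPU consumes on the order of a few hundred MB/s and each F800 node serves a few GB/s, which gives a sense of the headroom left against the per-cluster limits listed earlier. Actual per-device rates depend on the model, batch size, and data pipeline, so treat these as averages only.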
With superior capabilities for low latency, high throughput and massively parallel I/O, Dell Isilon and NVIDIA DGX-1 servers complement each other extremely well to address deep learning workloads, effectively compressing the time needed for training and testing analytical models for multi-petabyte data sets on AI platforms.
This new reference architecture extends the commitment of both Dell and NVIDIA to make AI simpler and more accessible to organizations wishing to apply AI and deep learning techniques to develop new applications or solve complex data-centric problems. It adds to our unmatched set of joint offerings across Dell Precision workstations, Dell PowerEdge servers, Dell Ready Solutions and Dell Networking.
Stay tuned for more details on the performance benchmarks of this reference architecture and for upcoming announcements as our server, storage, and software innovations for AI and deep learning use cases continue to evolve.
[1] Gartner press release, April 2018: https://www.gartner.com/newsroom/id/3872933