NAMD Performance on PowerEdge R740 with Volta GPUs
Summary: PowerEdge R740, NAMD, NVIDIA V100-PCIe GPUs, HPC, High Performance Computing, HPC and AI Innovation Lab, Performance
Resolution
This article discusses NAMD performance with NVIDIA's latest Volta GPU, the V100-32GB, in a single Dell EMC PowerEdge R740 server.
The PowerEdge R740 is a 2U dual-socket server with Intel Skylake processors. It can be configured with up to three double-wide GPUs and a high-speed networking adapter.
Table 1 shows the hardware configuration and application details used for the tests. A nightly build of NAMD dated 08-17-2018 was compiled for this test.
| Server | PowerEdge R740 |
| Processor | 2 * Intel Xeon 6148 – 20 core processor @ 2.4GHz |
| Memory | 192GB @ 2666MT/s |
| GPU | 3 * NVIDIA Volta V100-PCIe(32GB) |
| Power Supply | 2*1600W |
| Operating System | RHEL 7.4 Kernel: 3.10.0-693.el7.x86_64 |
| BIOS Options | System Profile – Performance; Logical processor – Disabled; Turbo mode – Enabled |
| CUDA Version and Driver | CUDA 9.2 (396.26) |
| NAMD | Git-2018-08-17_Source (Nightly Build version), multi-core build |
| Compiler | Intel 2018 u3 |
Performance Results
NAMD was tested with three datasets: ApoA1, F1ATPase and STMV, which consist of 92K, 327K and 1066K atoms respectively. ApoA1 is a smaller dataset compared to F1ATPase and STMV. The performance metric is nanoseconds of simulated time per day of wall-clock time ("ns/day"), so higher is better. The data shown in this section is the average of 10 tests.
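As a rough illustration of the ns/day metric, the sketch below converts a wall-clock time per MD step into simulated nanoseconds per day and averages several runs. The timestep and the timing values are hypothetical placeholders, not measurements from these tests; the article does not state the timestep used.

```python
from statistics import mean

SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def ns_per_day(seconds_per_step: float, timestep_fs: float) -> float:
    """Convert wall-clock seconds per MD step into simulated ns per day."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


def average_ns_per_day(seconds_per_step_runs, timestep_fs: float) -> float:
    """Average the ns/day figure over repeated runs of the same benchmark."""
    return mean(ns_per_day(s, timestep_fs) for s in seconds_per_step_runs)


if __name__ == "__main__":
    # Hypothetical s/step timings from repeated runs (assumed values).
    runs = [0.0864, 0.0870, 0.0859]
    # Assumed 2 fs timestep; substitute the value from your NAMD config.
    print(f"{average_ns_per_day(runs, timestep_fs=2.0):.3f} ns/day")
```

NAMD's log reports benchmark timings in s/step and days/ns, so ns/day can also be obtained as the reciprocal of the days/ns figure.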
Figure 1 shows the NAMD performance with CPU and multiple GPUs on three datasets. Instructions from Intel’s website were followed to compile NAMD for Intel Xeon processors. Performance improvement from CPU to GPUs is noted in the graph below.
Figure 1 NAMD performance on CPU and GPUs
- Performance is relative to the Dual-CPU results across the different datasets.
- Across the three datasets, the GPU runs deliver up to an 11.2x speedup over the CPU-only runs.
- Adding a second GPU yielded a further 23%-33% performance increase. Going from two to three GPUs gave a more modest 6%-9% gain.
NAMD was also tested with different numbers of CPU cores across one, two and three GPUs to identify the minimum number of CPU cores needed for best performance.
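The percentage gains above can be turned into relative figures with simple arithmetic. The sketch below is only an illustration using the ranges quoted in the bullets; the function names are made up for this example, and pairing the low and high ends of each range gives bounds only, since the per-dataset values are not listed here.

```python
def fraction_of_larger_config(gain_percent: float) -> float:
    """If adding a GPU raises performance by gain_percent, return the
    fraction of the larger configuration's performance that the smaller
    configuration already delivers."""
    return 1.0 / (1.0 + gain_percent / 100.0)


def compound_gain_percent(*gains_percent: float) -> float:
    """Combine successive percentage gains into one overall percentage."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


if __name__ == "__main__":
    # 2 GPUs relative to 3 GPUs, using the 6%-9% range quoted above.
    for gain in (6.0, 9.0):
        print(f"+{gain:.0f}% -> {fraction_of_larger_config(gain):.3f}")
    # Bounds on the overall 1 GPU -> 3 GPU gain (low ends, then high ends).
    print(f"{compound_gain_percent(23.0, 6.0):.1f}%")
    print(f"{compound_gain_percent(33.0, 9.0):.1f}%")
```

With a 9% gain from two to three GPUs, the two-GPU configuration delivers roughly 0.917 of the three-GPU performance, which agrees with the 91% figure quoted in the summary.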
Figure 2 STMV performance with different CPU core counts and GPUs
- As seen in Figure 2, using too few CPU cores has a negative impact on performance, and performance improves as the number of cores increases.
- With one V100, 20 or more CPU cores achieved good performance. Going beyond 20 cores added a further 2-3%.
- This test was also performed on the other two datasets, ApoA1 and F1ATPase, and similar behavior was observed.
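A core-count sweep like the one described above could be scripted along the following lines. This is a sketch under stated assumptions, not the command lines used for these tests: the binary name `namd2`, the config file name `stmv.namd`, and the list of core counts are placeholders. The `+p`, `+setcpuaffinity` and `+devices` options are standard for multicore CUDA builds of NAMD.

```python
def namd_command(config, cores, gpu_ids, binary="namd2"):
    """Build the argument list for one multicore CUDA NAMD run.

    binary and config are assumed names; adjust them to your installation.
    """
    devices = ",".join(str(g) for g in gpu_ids)
    return [binary, f"+p{cores}", "+setcpuaffinity",
            "+devices", devices, config]


def sweep(config, core_counts, max_gpus):
    """Yield one command per (GPU count, core count) combination."""
    for n_gpus in range(1, max_gpus + 1):
        for cores in core_counts:
            yield namd_command(config, cores, list(range(n_gpus)))


if __name__ == "__main__":
    # Hypothetical core counts; the server has 40 physical cores in total.
    for cmd in sweep("stmv.namd", core_counts=(10, 20, 40), max_gpus=3):
        print(" ".join(cmd))
```

The script only prints the commands; passing each list to `subprocess.run` would execute it, with the ns/day figure then taken from the resulting NAMD log.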
Summary
This article presented NAMD performance results on a Dell EMC PowerEdge R740 server with NVIDIA Volta V100-32GB GPUs. Using GPUs gives a speedup of roughly 6.6x-11.2x over the CPU-only tests. An R740 server with 3 V100 cards provides the best performance, while an R740 with 2 V100 cards is within 91% of the 3 GPU configuration. Where budget is limited, one GPU per R740 is also a good choice, as it gives up to 8x speedup compared to CPU. Tests with varied numbers of CPU cores and GPUs were also conducted, and using too few CPU cores (fewer than 20 in our tests) reduced NAMD performance in the GPU tests.