Molecular Dynamics Simulation with GROMACS on AMD EPYC – ROME


Savitha Pareek, HPC and AI Innovation Lab, November 2019

AMD recently announced its 2nd generation EPYC processors (codenamed "Rome"), which support up to 64 cores per socket, and Dell EMC has just released High Performance Computing (HPC) servers designed from the ground up to take full advantage of these new processors. We have been evaluating applications on these servers in our HPC and AI Innovation Lab, including the GROningen MAchine for Chemical Simulations (GROMACS) molecular dynamics package, and we report our findings for GROMACS in this blog.



GROMACS is a free and open-source parallel molecular dynamics package designed for simulations of biochemical molecules such as proteins, lipids, and nucleic acids. It is used by a wide variety of researchers, particularly for biomolecular and chemistry simulations, and supports all the usual algorithms expected from a modern molecular dynamics implementation. The latest versions are available under the GNU Lesser General Public License (LGPL). The code is mainly written in C and makes use of both MPI and OpenMP parallelism.

This blog describes the performance of GROMACS on two-socket PowerEdge servers using the AMD EPYC Rome processors listed in Table 1(a). For this study, we carried out all benchmarks on a single server equipped with two processors, running only one job at a time on the server. We compared the 2nd generation AMD EPYC Rome (7xx2 series) based PowerEdge servers against a previous generation Dell EMC PowerEdge server equipped with the 1st generation AMD EPYC Naples (7xx1 series) processor listed in Table 1(b).

Table 1(a) - Rome CPU models evaluated for single node study

CPU     Cores/Socket    Config        Base frequency    TDP
7742    64c             4c per CCX    2.25 GHz          225W
7702    64c             4c per CCX    2.0 GHz           200W
7502    32c             4c per CCX    2.5 GHz           180W
7452    32c             4c per CCX    2.35 GHz          155W
7402    24c             3c per CCX    2.8 GHz           180W


Table 1(b) - Naples CPU model evaluated for comparison

CPU     Cores/Socket    Config        Base frequency    TDP
7601    32c             4c per CCX    2.2 GHz           180W

Server configurations are included in Table 2(a), with the list of the benchmark data sets given in Table 2(b).

Table 2(a) - Testbed

Component           Rome Platform                       Naples Platform
Processor           As shown in Table 1(a)              As shown in Table 1(b)
Memory              256 GB, 16x16GB 3200 MT/s DDR4      256 GB, 16x16GB 2400 MT/s DDR4
Operating System    Red Hat Enterprise Linux 7.6        Red Hat Enterprise Linux 7.5
Kernel              3.10.0-957.27.2.el7.x86_64          3.10.0-862.el7.x86_64
Application         GROMACS 2019.2                      GROMACS 2019.2

Table 2(b) - Benchmark datasets used for GROMACS performance evaluation on Rome

Dataset                   Details
Water Molecule            1536K and 3072K
HecBioSim                 1400K and 3000K
PRACE - Lignocellulose    3M

For this single node study, we compiled GROMACS version 2019.3 with the latest Open MPI and FFTW, testing several different compilers, associated high-level compiler options, and electrostatic load-balancing schemes (e.g., PME). We carried out two studies for this blog. The first focused on the performance of the Rome based systems with Hyperthreading enabled versus disabled; the second investigated the performance advantage of Rome over Naples. For the Hyperthreading study, we enabled Hyperthreading in the BIOS and ran each benchmark with twice as many threads as its non-Hyperthreaded counterpart. For example, on the 24-core 7402, the non-Hyperthreaded single node runs used 48 threads (dual-processor server) and the Hyperthreaded runs used 96 threads. Our results are presented in Figure 1.
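The thread-count arithmetic above can be sketched as follows. The core counts come from Table 1(a); the `topol.tpr` input name and the exact `gmx mdrun` flags are illustrative assumptions, not the precise command lines we used:

```python
# Illustrative sketch: total thread counts for a dual-socket server,
# with and without Hyperthreading (2 hardware threads per core).
SOCKETS = 2

def total_threads(cores_per_socket: int, hyperthreading: bool) -> int:
    threads_per_core = 2 if hyperthreading else 1
    return SOCKETS * cores_per_socket * threads_per_core

def mdrun_command(cores_per_socket: int, hyperthreading: bool) -> str:
    # Hypothetical invocation: one job using every hardware thread on the node.
    nt = total_threads(cores_per_socket, hyperthreading)
    return f"gmx mdrun -nt {nt} -pin on -s topol.tpr"

# 24-core 7402 example from the text: 48 threads without HT, 96 with HT.
print(total_threads(24, hyperthreading=False))  # 48
print(total_threads(24, hyperthreading=True))   # 96
print(mdrun_command(64, hyperthreading=True))
```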


Figure 1. GROMACS performance evaluation with hyper-threading disabled vs hyper-threading enabled on ROME

For these benchmarks, Particle Mesh Ewald (PME) electrostatics were used for the Water-1536K, Water-3072K, and HecBioSim (1.4M and 3M) datasets. We used reaction field (RF) electrostatics for the Lignocellulose_3M case.

While the performance gains from enabling Hyperthreading (higher is better) varied across processors and datasets, they were consistently above the non-Hyperthreaded baseline (1.0). GROMACS shows a clear performance boost with Hyperthreading enabled across the Rome SKUs.

In the second study, we compared the Rome based servers to the Naples based server, with Hyperthreading enabled for all tests based on the results of the first study. We measured the performance of the Rome SKUs relative to the Naples 7601 as the baseline (1.0). These results are shown in Figure 2.
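The relative-performance metric used in Figures 1 and 2 is a simple normalization against the baseline's throughput. A minimal sketch, with placeholder numbers purely for illustration (the measured ns/day values are not reproduced here):

```python
# Relative performance: each SKU's throughput (e.g., ns/day) divided by the
# baseline's throughput, so the baseline is 1.0 and higher is better.
def relative_performance(throughput: dict, baseline: str) -> dict:
    base = throughput[baseline]
    return {sku: perf / base for sku, perf in throughput.items()}

# Placeholder numbers for illustration only -- NOT measured results.
example = {"Naples 7601": 10.0, "Rome 7402": 13.0, "Rome 7742": 35.0}
print(relative_performance(example, "Naples 7601"))
# {'Naples 7601': 1.0, 'Rome 7402': 1.3, 'Rome 7742': 3.5}
```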

Figure 2. Performance evaluation across different AMD EPYC Generation Processors

Comparing the 32-core servers (7551, 7601, 7452, 7502), we observed a generational performance improvement of about 50%. The 24-core Rome based 7402, despite having fewer cores than the Naples systems, still outperformed them by about 20-40%, depending on the benchmark. The 64-core systems (7702, 7742) delivered close to a 250% increase in overall performance over the 32-core Naples server. Overall, the Rome results, particularly with Hyperthreading enabled, demonstrated a substantial performance improvement for GROMACS over Naples.

Conclusion

Dell EMC PowerEdge servers equipped with AMD Rome processors offer significant single node performance gains over their previous generation Naples counterparts for applications such as GROMACS. We found a strong positive correlation between overall system performance and processor core count, and a weak correlation with processor frequency. The 64-core Rome processors delivered a sizable performance advantage over the 24-core and 32-core processors. We are in the process of exploring how these single node performance gains (with and without Hyperthreading) translate into multi-node performance gains for Molecular Dynamics applications on our new Minerva cluster at the HPC and AI Innovation Lab. Watch this blog site for updates.




Article ID: SLN319583

Last Date Modified: 11/20/2019 04:17 AM

