Moonfall with Gravity Waves at Durham University

Dell Technologies HPC & AI Center of Excellence goes to the moon.

While the recent movie Moonfall suggests (spoiler alert!) that the moon was created by aliens, the Institute for Computational Cosmology (ICC) at Durham University has long been making and sharing discoveries about how the moon actually formed. Both agree that the Earth’s moon is integral to life on Earth: it helps stabilize the tilt of the Earth’s axis, which keeps the climate stable, without any of the movie’s gravity waves.

Outside of the movies, scientists believe the Earth’s moon formed 4.5 billion years ago in a collision between the Earth and another body roughly the size of Mars. Durham University researchers continue working to confirm our moon’s origins. Large computer simulations help narrow down the competing theories of how the moon formed, and also help researchers understand atmospheric evolution and the formation of the universe.

Hosted by Durham University, the COSMA high performance computing cluster provides the Memory Intensive service of the DiRAC HPC facility, a Tier-1 national supercomputing facility in the UK. COSMA, short for Cosmology Machine, is designed specifically to support the largest cosmological simulations and is used by researchers across the UK and their collaborators around the world.

The ICC stays on the cutting edge of technology to enable next-generation scientific discoveries. Researchers recently amped up the compute power of the DiRAC COSMA8 system to expand and accelerate their research. COSMA8’s new capabilities enable the exploration of big, complex questions about the universe, such as dark matter and dark energy, galaxies, planetary collisions and other planetary simulations, and black hole mergers.

These large-scale simulations can start from the Big Bang, 13.8 billion years ago, and run through to the present day. Because a simulation steps through almost 14 billion years of history, a lot of data must be written to storage along the way. Large simulations can take a few months to run; afterwards, data sets are extracted for analysis, for comparison and validation against telescope observations, and for statistical evaluation. Data sets are segmented by research-specific criteria, for example by distinct time steps, by narrower portions of the universe or by specific telescope observations.

Behind the scenes with COSMA8

The team doesn’t just push the boundaries of the universe, it also pushes the boundaries of information technology. COSMA8 has AMD EPYC™ processors inside Dell PowerEdge C6525 servers connected with NVIDIA Quantum 200Gb/s InfiniBand. The InfiniBand network brings in-network computing engines, the world’s only fully offloaded remote direct memory access (RDMA), smart routing, and self-healing capabilities. COSMA8 has more than four times as many cores per server node as the previous system.

“We chose these processors to get a large number of cores per node and to cut down the amount of inter-node communication while achieving high clock rates so that all parts of the code stood up well,” states Dr. Alastair Basden, Head of COSMA HPC Services at Durham University and DiRAC Technical Manager. “We also look forward to putting new AMD EPYC processors with AMD 3D V-Cache™ technology to the test.”

The current server processors have three times as much cache as those in the previous system, which brings a two-fold benefit. First, more cache sharing between cores limits data movement between the cores and RAM, improving per-core performance; although the hardware moves this data automatically, each trip to RAM still slows processing. Second, the larger cache makes life simpler for cache-aware codes, since more of their working data fits in cache at once.
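To make "cache-aware" a little more concrete: such codes often use loop tiling (blocking), working through the data in small tiles sized to fit in cache so that each value fetched from RAM is reused several times before it is evicted. What follows is a generic tiled matrix multiply in Python, not code from COSMA8 or any ICC application. The tile size `block` is a hypothetical tuning parameter, and in pure Python the interpreter overhead hides the cache effect, so the sketch illustrates only the loop structure. With a larger cache, a code like this can use bigger tiles, or be less sensitive to choosing exactly the right size.

```python
def naive_matmul(a, b):
    """Reference triple loop: C = A @ B for lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def blocked_matmul(a, b, block=2):
    """Tiled C = A @ B: visit block x block tiles so each tile is reused while cached.

    `block` is a hypothetical tile size; real codes tune it to the target cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):          # tile the rows of A and C
        for kk in range(0, k, block):      # tile the shared (inner) dimension
            for jj in range(0, m, block):  # tile the columns of B and C
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for p in range(kk, min(kk + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(jj, min(jj + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    # The tiled and naive versions compute the same product.
    print(blocked_matmul(a, b) == naive_matmul(a, b))
```

The result does not depend on the tile size; only the order in which memory is touched changes, which is exactly what a cache-aware code tunes.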

A tip from Dr. Basden: it is worth taking the time to investigate which systems will run your code fastest. When research teams apply for time on the COSMA8 system, they can also apply for research software engineer time to get help optimizing scientific codes.

In addition to running simulations on the liquid-cooled COSMA8 system, teams can:

    • Access Dell PowerEdge servers with the AMD Instinct™ MI100 accelerator, and soon the AMD Instinct MI210, using HIPIFY to convert CUDA codes to the ROCm™ open software platform.
    • Tap into PowerEdge servers with NVIDIA A100 Tensor Core GPUs for artificial intelligence work such as searching for gravitational lensing. With Liqid Matrix, the IT team can compose these servers on demand and reallocate resources as needed, all via software.
    • Try Rockport®, a distributed, high-performance direct interconnect that is self-discovering, self-configuring and self-healing.

They can also test their code on the Durham Intelligent NIC Environment (DINE) system, which has multiple generations of NVIDIA BlueField DPUs. These SmartNICs give servers direct access to remote memory, improving the performance of massively parallel codes in preparation for future exascale systems. They also enable investigations into in-network computing, pointing toward a novel HPC environment.

More storage equals more insights

DiRAC systems at Durham University are in high demand, with nearly every node in use almost all the time. One reason for the quick adoption may be how easy it is for researchers to move codes from previous systems to COSMA8. “With technology transitions, code transfer and adjustments are always big concerns. With the COSMA8 technology upgrades, it literally just works out of the box,” shares Dr. Basden.

Because large simulations need a lot of server memory, COSMA8 uses two types of storage to keep memory free for processing. Generic “bulk” storage holds data long-term. Temporary “scratch” storage is used while large simulations are running: the simulation dumps data to this fast storage to free up memory, which keeps compute time from degrading. It also writes checkpoints there, preserving the simulation-generated data so that a job can be restarted if needed.
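The scratch-and-restart pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the I/O of COSMA8 or of any real simulation code: a temporary directory stands in for the fast scratch filesystem, the "simulation" just moves three particles, and JSON replaces the large binary snapshot formats real codes write. Each checkpoint is written to a temporary file and then renamed over the previous one, so a job killed mid-write still leaves the last complete checkpoint to restart from.

```python
import json
import os
import tempfile

# Assumption: a temporary directory stands in for the cluster's fast "scratch" filesystem.
SCRATCH_DIR = tempfile.mkdtemp(prefix="scratch_")
CHECKPOINT = os.path.join(SCRATCH_DIR, "checkpoint.json")


def save_checkpoint(step, state):
    """Write the state to a temp file, then rename it over the previous checkpoint."""
    tmp_path = CHECKPOINT + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, CHECKPOINT)  # rename is atomic within one POSIX filesystem


def load_checkpoint():
    """Return (step, state) from the last checkpoint, or initial conditions if none exists."""
    if not os.path.exists(CHECKPOINT):
        return 0, {"positions": [0.0, 1.0, 2.0]}  # toy initial conditions
    with open(CHECKPOINT) as f:
        data = json.load(f)
    return data["step"], data["state"]


def advance(state):
    """Toy stand-in for one simulation time step: move every particle by 0.5."""
    return {"positions": [x + 0.5 for x in state["positions"]]}


def run(total_steps, checkpoint_every, stop_after=None):
    """Advance from the last checkpoint; `stop_after` simulates the job being killed."""
    step, state = load_checkpoint()
    while step < total_steps:
        state = advance(state)
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(step, state)
        if stop_after is not None and step >= stop_after:
            break
    return step, state
```

Calling `run(10, 2, stop_after=5)` mimics a job that dies after step 5, when the last checkpoint on disk is from step 4. A second call, `run(10, 2)`, picks up from that step-4 checkpoint rather than from step 0, repeating only one step of work.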

When more server memory is available for large-scale simulations, exploration of the universe can scale up. Researchers can increase the number of particles in a simulation, run larger simulations, and run more detailed simulations that cover a larger fraction of the universe.

Earth-shattering science

COSMA8 delivers the powerful high-performance compute and storage capabilities that researchers need to unlock insights about a complex and largely unknown universe. “About 75% of the universe is made up of dark matter that we don’t understand,” Dr. Basden explains. “By running large-scale simulations, we are able to find out more about dark matter and ultimately, we begin to understand more about the universe and how it impacts our world.”

World-leading technology in the DiRAC Memory Intensive service at Durham University is fueling cutting-edge scientific exploration. Each simulation must process massive amounts of data at high speed, and a single simulation can produce hundreds of terabytes. With higher core density, more and faster memory, faster processing and more storage, COSMA8 is opening the door to advanced discoveries.

“COSMA8 uses a rich mix of breakthrough technologies which, in turn, allows immensely detailed simulations at greater scale,” explains Dr. Basden. “Finding the answers to questions about the universe, dark matter and planetary impacts relies on leading-edge technology that enables our researchers to unlock difficult-to-generate insights about the world in which we live.”


John Lockman

About the Author: John Lockman

As a developer and evangelist for containerization and orchestration technologies, John develops future concept architectures in the Dell Chief Technology Innovation Office. He specializes in nature-inspired programming, artificial intelligence, and high-performance computing. John brings a passion for exploring novel compute paradigms and crafting tools that make advanced computing accessible to a larger audience.