
Dell EMC Ready Solution for HPC Life Sciences: BWA-GATK Pipeline throughput tests with Cascade Lake CPU and Lustre ME4 Refresh





The 64-compute-node configuration of Dell EMC Ready Solutions for HPC Life Sciences can process 194 genomes per day (50x depth of coverage).


Variant calling is the process of identifying variants from sequence data. It helps determine whether there are single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and/or structural variants (SVs) at a given position in an individual genome or transcriptome. The main goal of identifying genomic variations is to link them to human diseases. Although not all human diseases are associated with genetic variations, variant calling can provide a valuable guideline for geneticists working on a particular disease caused by genetic variations. BWA-GATK is one of the Next Generation Sequencing (NGS) computational pipelines designed to identify germline and somatic mutations from human NGS data. There are a handful of variant identification tools, and we understand that no single tool performs perfectly (1). However, we chose GATK, one of the most popular tools, as our benchmarking tool to demonstrate how well the Dell EMC Ready Solutions for HPC Life Sciences can process complex and massive NGS workloads.
The purpose of this blog is to provide performance information about the Intel® Xeon® Gold 6248 processor for the BWA-GATK pipeline benchmark with Dell EMC Ready Solutions for HPC Lustre Storage (ME4 series refresh) (2). The Xeon® Gold 6248 CPU features 20 physical cores, or 40 logical cores when hyper-threading is enabled. The test cluster configuration is summarized in Table 1.

Table 1 Tested compute node configuration
Server model: Dell EMC PowerEdge C6420
CPU: 2x Intel® Xeon® Gold 6248, 20 cores, 2.5 GHz (Cascade Lake)
RAM: 12x 16 GB at 2,933 MT/s
Interconnect: Intel® Omni-Path
BIOS System Profile: Performance Optimized
Logical Processor: Disabled
Virtualization Technology: Disabled
BWA: 0.7.15-r1140
Samtools: 1.6
GATK: 3.6-0-g89b7209
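For reference, the tool versions in Table 1 map onto the core stages of the standard germline workflow: read alignment with `bwa mem`, coordinate sorting with `samtools`, and variant calling with GATK 3.x HaplotypeCaller. The sketch below prints the per-sample commands in dry-run form; the sample name, reference path, and file names are illustrative placeholders, not values taken from the benchmark, and additional stages (such as duplicate marking and base recalibration) are omitted for brevity:

```shell
#!/bin/sh
# Dry-run sketch of the core per-sample BWA-GATK steps (GATK 3.x CLI).
# Sample name, reference path, and file names are placeholders; commands
# are printed, and only executed when RUN=1 is set in the environment.
SAMPLE=NA12878
REF=ref/human_g1k_v37.fasta          # placeholder reference path
THREADS=7                            # cores allocated per sample in the tests

run() {
    printf '+ %s\n' "$*"             # show the command
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# 1. Align paired-end reads and coordinate-sort in one pipe.
run sh -c "bwa mem -t $THREADS -R '@RG\tID:$SAMPLE\tSM:$SAMPLE' \
  $REF ${SAMPLE}_1.fastq.gz ${SAMPLE}_2.fastq.gz \
  | samtools sort -@ $THREADS -o $SAMPLE.sorted.bam -"

# 2. Index the sorted BAM.
run samtools index "$SAMPLE.sorted.bam"

# 3. Call variants with GATK 3.x HaplotypeCaller in GVCF mode.
run java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
  -R "$REF" -I "$SAMPLE.sorted.bam" -ERC GVCF -o "$SAMPLE.g.vcf"
```

The full pipeline definition used in the tests follows the Broad Institute documentation referenced later in this article (3).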

The tested compute nodes were connected to Dell EMC Ready Solutions for HPC Lustre Storage via Intel® Omni-Path. The summary configuration of the storage is listed in Table 2.
Table 2 Solution hardware and software specifications
Dell EMC Ready Solution for Lustre Storage
Number of nodes: 1x Dell EMC PowerEdge R640 as Integrated Manager for Lustre (IML); 2x Dell EMC PowerEdge R740 as Metadata Servers (MDS); 2x Dell EMC PowerEdge R740 as Object Storage Servers (OSS)
Processors: IML server: dual Intel Xeon Gold 5118 @ 2.3 GHz; MDS and OSS servers: dual Intel Xeon Gold 6136 @ 3.0 GHz
Memory: IML server: 12x 8 GB 2,666 MT/s DDR4 RDIMMs; MDS and OSS servers: 24x 16 GB 2,666 MT/s DDR4 RDIMMs
External storage: 2x Dell 12 Gb/s SAS HBAs (on each MDS); 4x Dell 12 Gb/s SAS HBAs (on each OSS)
Object storage: 4x ME4084 with a total of 336x 8 TB NL 7.2K rpm SAS HDDs
Metadata storage: 1x ME4024 with 24x 960 GB SAS SSDs; supports up to 4.68 B inodes
RAID controllers: duplex SAS RAID controllers in the ME4084 and ME4024 enclosures
Operating system: CentOS 7.5 x86_64; Red Hat Enterprise Linux (RHEL) 7.5 x86_64
BIOS version: 1.4.5
Intel Omni-Path IFS version:
Lustre file system version:
IML version:

The test data was chosen from one of Illumina’s Platinum Genomes. ERR194161 was sequenced on an Illumina HiSeq 2000, submitted by Illumina, and can be obtained from EMBL-EBI. The DNA identifier for this individual is NA12878. The description of the data on the linked website shows that this sample has a >30x depth of coverage.

Performance Evaluation

Single Sample Multiple Nodes Performance

Figure 1 summarizes the runtimes for various numbers of samples and compute nodes with 50x Whole Genome Sequencing (WGS) data. The tests performed here are designed to demonstrate performance at the server level, not to compare individual components. The data points in Figure 1 are calculated based on the total number of samples, one sample per compute node (x axis in the figure), that are processed concurrently. Details of the BWA-GATK pipeline can be obtained from the Broad Institute web site (3). The maximum number of compute nodes used in the tests was 64 C6420s. C6420s with Lustre ME4 show better scaling behavior than with Lustre MD3.

Figure 1 Performance comparisons between Lustre MD3 and Lustre ME4

Multiple Sample Multiple Nodes Performance

A typical way of running an NGS pipeline is to run multiple samples per compute node and use multiple compute nodes to maximize the throughput of NGS data processing. These tests used 64 C6420 compute nodes with five samples per node. Up to 320 samples were processed concurrently to estimate the maximum number of genomes per day without a job failure.
As shown in Figure 2, a single C6420 compute node can process 3.24 50x whole human genomes per day when 5 samples are processed concurrently. For each sample, 7 cores and 30 GB of memory are allocated.
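As a quick check on this packing (an illustrative calculation, not output from the benchmark), five samples at 7 cores and 30 GB each fit comfortably inside a C6420 node from Table 1, leaving a few cores and some memory for the operating system and I/O:

```shell
# Sanity-check the per-node resource packing used in the throughput tests:
# five concurrent samples, each with 7 cores and 30 GB of memory, on a
# C6420 with 40 physical cores and 12x 16 GB = 192 GB of RAM (Table 1).
CORES_PER_NODE=40
MEM_PER_NODE_GB=192
SAMPLES_PER_NODE=5
CORES_PER_SAMPLE=7
MEM_PER_SAMPLE_GB=30

cores_used=$((SAMPLES_PER_NODE * CORES_PER_SAMPLE))     # 35 of 40 cores
mem_used_gb=$((SAMPLES_PER_NODE * MEM_PER_SAMPLE_GB))   # 150 of 192 GB
echo "cores used: $cores_used/$CORES_PER_NODE"
echo "memory used: ${mem_used_gb}/${MEM_PER_NODE_GB} GB"
```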

Figure 2 Throughput Tests with up to 64 C6420s and the Lustre ME4

320 50x whole human genomes can be processed with 64 C6420 compute nodes in 40 hours. In other words, the performance of the test configuration can be summarized as 194 genomes per day for whole human genomes with 50x depth of coverage.


The data size of WGS has been growing constantly; the current average depth of coverage is 50x, five times larger than a typical WGS sample four years ago, when we started benchmarking the BWA-GATK pipeline. The increased data size does not strain storage capacity much, since most applications in the pipeline are bound by CPU clock speed. Hence, with growing data size, the pipeline runs longer rather than generating proportionally more writes.
However, a greater number of temporary files is generated during processing, because more data must be parallelized, and the increased number of temporary files open at the same time can exhaust the open-file limit in a Linux operating system. One of the applications silently fails to complete after hitting the limit on the number of open files. A simple solution is to increase the limit to more than 150K.
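The open-file limit can be inspected and raised with standard Linux tooling. The snippet below is a generic illustration; the 160000 value is simply an example above the >150K threshold mentioned, not a setting taken from the original configuration:

```shell
# Inspect the current per-process open-file (nofile) limits.
soft_limit=$(ulimit -Sn)   # soft limit; can be raised up to the hard limit
hard_limit=$(ulimit -Hn)   # hard limit
echo "nofile soft=$soft_limit hard=$hard_limit"

# Raise the soft limit for the current shell session (example value):
#   ulimit -n 160000
#
# To make a >150K limit persistent on CentOS/RHEL 7, add entries to
# /etc/security/limits.conf (example values):
#   *  soft  nofile  160000
#   *  hard  nofile  160000
```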
Nonetheless, the Ready Solution with Lustre ME4 as scratch space has better throughput capacity than the previous version: a 64-node Ready Solution now delivers 194 genomes per day of processing power for 50x WGS.


1. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014 Mar;15(2). doi:10.1093/bib/bbs086.
2. Dell EMC Ready Solution for HPC Lustre Storage.  (Article no longer available for reference, pulled by HPC team)
3. Genome Analysis Toolkit, Broad Institute.

Article Properties

Affected Product: ME Series, Dell EMC Ready Solution Resources, PowerEdge C6420, Dell EMC PowerVault ME4024, Dell EMC PowerVault ME4084, Red Hat Enterprise Linux Version 7

Last Published Date: 11 Jan 2024