Dell EMC Ready Solution for HPC Life Sciences: BWA-GATK Pipeline throughput tests with Cascade Lake CPU and Lustre/ME4 Refresh


A 64-compute-node configuration of Dell EMC Ready Solutions for HPC Life Sciences can process 194 genomes per day (at 50x depth of coverage).

Overview

Variant calling is a process by which we identify variants from sequence data. This process helps determine whether there are single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and/or structural variants (SVs) at a given position in an individual genome or transcriptome. The main goal of identifying genomic variations is to link them to human diseases. Although not all human diseases are associated with genetic variations, variant calling can provide a valuable guideline for geneticists working on a particular disease caused by genetic variations. BWA-GATK is a Next Generation Sequencing (NGS) computational pipeline designed to identify germline and somatic mutations from human NGS data. There are a handful of variant identification tools, and we understand that no single tool performs perfectly (1). However, we chose GATK, one of the most popular tools, as our benchmarking tool to demonstrate how well the Dell EMC Ready Solutions for HPC Life Sciences can process complex and massive NGS workloads.
The purpose of this blog is to provide performance information on the Intel® Xeon® Gold 6248 processor for the BWA-GATK pipeline benchmark with Dell EMC Ready Solutions for HPC Lustre Storage (ME4 series refresh) (2). The Xeon® Gold 6248 CPU features 20 physical cores, or 40 logical cores when hyper-threading is enabled. The test cluster configurations are summarized in Table 1.



Table 1 Tested compute node configuration

Server: Dell EMC PowerEdge C6420
CPU: 2x Intel® Xeon® Gold 6248, 20 cores, 2.5 GHz (Cascade Lake)
RAM: 12x 16 GB at 2,933 MT/s
OS: RHEL 7.6
Interconnect: Intel® Omni-Path
BIOS System Profile: Performance Optimized
Logical Processor: Disabled
Virtualization Technology: Disabled
BWA: 0.7.15-r1140
Samtools: 1.6
GATK: 3.6-0-g89b7209

The tested compute nodes were connected to Dell EMC Ready Solutions for HPC Lustre Storage via Intel® Omni-Path. The summary configuration of the storage is listed in Table 2.

Table 2 Solution hardware and software specifications

Dell EMC Ready Solution for Lustre Storage
Number of nodes: 1x Dell EMC PowerEdge R640 as Integrated Manager for Lustre (IML); 2x Dell EMC PowerEdge R740 as Metadata Server (MDS); 2x Dell EMC PowerEdge R740 as Object Storage Server (OSS)
Processors: IML server: Dual Intel Xeon Gold 5118 @ 2.3 GHz; MDS and OSS servers: Dual Intel Xeon Gold 6136 @ 3.0 GHz
Memory: IML server: 12x 8 GB 2,666 MT/s DDR4 RDIMMs; MDS and OSS servers: 24x 16 GiB 2,666 MT/s DDR4 RDIMMs
External storage controllers: 2x Dell 12 Gb/s SAS HBAs (on each MDS); 4x Dell 12 Gb/s SAS HBAs (on each OSS)
Object storage enclosures: 4x ME4084 with a total of 336x 8 TB NL 7.2K rpm SAS HDDs
Metadata storage enclosure: 1x ME4024 with 24x 960 GB SAS SSDs; supports up to 4.68 billion inodes
RAID controllers: Duplex SAS RAID controllers in the ME4084 and ME4024 enclosures
Operating system: CentOS 7.5 x86_64; Red Hat Enterprise Linux (RHEL) 7.5 x86_64
BIOS version: 1.4.5
Intel Omni-Path IFS version: 10.8.0.0
Lustre file system version: 2.10.4
IML version: 4.0.7.0

The test data was chosen from one of Illumina's Platinum Genomes. ERR194161 was sequenced on an Illumina HiSeq 2000, submitted by Illumina, and can be obtained from EMBL-EBI. The DNA identifier for this individual is NA12878. The description of the data from the linked website shows that this sample has a >30x depth of coverage.



Performance Evaluation

Single Sample/Multiple Nodes Performance

Figure 1 summarizes the runtimes for various numbers of samples and compute nodes with 50x Whole Genome Sequencing (WGS) data. The tests performed here are designed to demonstrate performance at the server level, not to compare individual components. The data points in Figure 1 are calculated from the total number of samples processed concurrently, with one sample per compute node (the X axis in the figure). The details of the BWA-GATK pipeline can be obtained from the Broad Institute web site (3). The maximum number of compute nodes used for the tests was 64 C6420s. C6420s with Lustre/ME4 show better scaling behavior than with Lustre/MD3.
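For reference, the major stages of a typical BWA-GATK germline pipeline with the tool versions listed in Table 1 can be sketched as below. File names, the read-group string, and the use of Picard for duplicate marking are illustrative assumptions, not the exact benchmark scripts; the authoritative pipeline description is on the Broad Institute site (3).

```shell
# Sketch of a BWA-GATK (GATK 3.x syntax) germline pipeline for one sample.
# ref.fa, sample_R1/R2.fastq.gz, and known_sites.vcf are placeholder inputs.

# 1. Align reads and sort the output BAM (7 threads per sample, as in the tests)
bwa mem -t 7 -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA' \
    ref.fa sample_R1.fastq.gz sample_R2.fastq.gz |
    samtools sort -@ 7 -o sample.sorted.bam -
samtools index sample.sorted.bam

# 2. Mark PCR duplicates (Picard, assumed here)
java -jar picard.jar MarkDuplicates I=sample.sorted.bam \
    O=sample.dedup.bam M=sample.metrics.txt
samtools index sample.dedup.bam

# 3. Base quality score recalibration
java -Xmx30g -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
    -R ref.fa -I sample.dedup.bam -knownSites known_sites.vcf -o recal.table
java -Xmx30g -jar GenomeAnalysisTK.jar -T PrintReads \
    -R ref.fa -I sample.dedup.bam -BQSR recal.table -o sample.recal.bam

# 4. Call variants per sample
java -Xmx30g -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R ref.fa -I sample.recal.bam --emitRefConfidence GVCF -o sample.g.vcf
```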


Figure 1 Performance comparisons between Lustre/MD3 and Lustre/ME4



Multiple Sample/Multiple Nodes Performance

A typical way of running an NGS pipeline is to run multiple samples on each compute node and to use multiple compute nodes to maximize the throughput of NGS data processing. These tests used 64 C6420 compute nodes with five samples per node. Up to 320 samples are processed concurrently to estimate the maximum number of genomes per day without a job failure.
As shown in Figure 2, a single C6420 compute node can process 3.24 50x whole human genomes per day when five samples are processed concurrently. Each sample is allocated 7 cores and 30 GB of memory.
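The five-samples-per-node figure follows from the per-sample resource allocation: a C6420 under test has 40 physical cores and 192 GB of RAM (12x 16 GB), so at 7 cores and 30 GB per sample, cores are the binding constraint (memory alone would allow six). A minimal sketch of that arithmetic:

```python
# Estimate how many concurrent samples fit on one compute node, given
# the per-sample allocation used in the throughput tests (7 cores, 30 GB).
def samples_per_node(node_cores, node_mem_gb,
                     cores_per_sample=7, mem_per_sample_gb=30):
    by_cores = node_cores // cores_per_sample   # 40 // 7 = 5
    by_mem = node_mem_gb // mem_per_sample_gb   # 192 // 30 = 6
    return min(by_cores, by_mem)                # cores are the bottleneck

# C6420 under test: 2x 20-core Xeon Gold 6248, 12x 16 GB DIMMs = 192 GB
print(samples_per_node(40, 192))  # -> 5
```

With 64 nodes, this per-node limit gives the 320 concurrently processed samples used in the throughput tests.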


Figure 2 Throughput Tests with up-to 64 C6420s and the Lustre/ME4

320 50x whole human genomes can be processed by 64 C6420 compute nodes in 40 hours. In other words, the test configuration delivers a throughput of 194 genomes per day for whole human genomes with 50x depth of coverage.



Conclusion

The data size of WGS has been growing constantly. The current average depth of coverage is 50x, five times larger than a typical WGS four years ago when we started benchmarking the BWA-GATK pipeline. The growing data does not strain the storage side severely, since most applications in the pipeline are bound by CPU clock speed. Hence, with growing data size, the pipeline runs longer rather than generating proportionally more writes.
However, a greater number of temporary files is generated during processing because more data needs to be parallelized, and this increased number of temporary files open at the same time can exhaust the open file limit in the Linux operating system. One of the applications silently fails to complete when it hits this limit. A simple solution is to increase the limit to >150K.
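One way to raise the open file limit is sketched below; the specific value of 163,840 is an illustrative choice that satisfies the >150K guideline above, not a value taken from the benchmark configuration.

```shell
# Check the current per-process soft and hard open file limits
ulimit -Sn
ulimit -Hn

# Raise the limit for the current shell session (raising the soft limit
# above the hard limit requires root privileges)
ulimit -n 163840

# To make the change persistent across logins, add entries to
# /etc/security/limits.conf (any value >150K satisfies the guideline):
#   *  soft  nofile  163840
#   *  hard  nofile  163840
```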
Nonetheless, the Ready Solution with Lustre/ME4 as a scratch space has a better throughput capacity than the previous version. Now, 64 nodes Ready Solution marks 194 genomes per day processing power for 50x WGS.



Resources

1. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2014 Mar;15(2). doi:10.1093/bib/bbs086.
2. Dell EMC Ready Solution for HPC Lustre Storage. [Online] https://www.dellemc.com/resources/en-us/asset/white-papers/solutions/h17632_ready_hpc_lustre_wp.pdf
3. Genome Analysis Toolkit. [Online] https://software.broadinstitute.org/gatk/





Article ID: SLN319560

Last Date Modified: 11/19/2019 10:00 AM

