A 64-compute-node configuration of Dell EMC Ready Solutions for HPC Life Sciences can process 194 genomes per day (50x depth of coverage).
Overview
Variant calling is the process of identifying variants from sequence data. It determines whether there are single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and/or structural variants (SVs) at a given position in an individual genome or transcriptome. The main goal of identifying genomic variations is to link them to human diseases. Although not all human diseases are associated with genetic variations, variant calling can provide valuable guidance for geneticists working on a disease that is caused by genetic variations. BWA-GATK is one of the Next Generation Sequencing (NGS) computational tools designed to identify germline and somatic mutations from human NGS data. There are a handful of variant identification tools, and we understand that no single tool performs perfectly (1). However, we chose GATK, one of the most popular tools, as our benchmarking tool to demonstrate how well the Dell EMC Ready Solutions for HPC Life Sciences can process complex and massive NGS workloads.
The purpose of this blog is to provide performance information about the Intel® Xeon® Gold 6248 processor for the BWA-GATK pipeline benchmark, running with Dell EMC Ready Solutions for HPC Lustre Storage (ME4 series refresh) (2). The Xeon® Gold 6248 CPU features 20 physical cores, or 40 logical cores with Hyper-Threading enabled. The test cluster configurations are summarized in Table 1; note that Logical Processor was disabled on the compute nodes.
Dell EMC PowerEdge C6420 | |
---|---|
CPU | 2x Xeon® Gold 6248 20 cores 2.5 GHz (Cascade Lake) |
RAM | 12x 16GB at 2933 MT/s |
OS | RHEL 7.6 |
Interconnect | Intel® Omni-Path |
BIOS System Profile | Performance Optimized |
Logical Processor | Disabled |
Virtualization Technology | Disabled |
BWA | 0.7.15-r1140 |
Samtools | 1.6 |
GATK | 3.6-0-g89b7209 |
Dell EMC Ready Solution for Lustre Storage | |
---|---|
Number of nodes | 1x Dell EMC PowerEdge R640 as Integrated Manager for Lustre (IML); 2x Dell EMC PowerEdge R740 as Metadata Server (MDS); 2x Dell EMC PowerEdge R740 as Object Storage Server (OSS) |
Processors | IML server: Dual Intel Xeon Gold 5118 @ 2.3 GHz; MDS and OSS servers: Dual Intel Xeon Gold 6136 @ 3.00 GHz |
Memory | IML server: 12 x 8 GB 2,666 MT/s DDR4 RDIMMs; MDS and OSS servers: 24 x 16 GiB 2,666 MT/s DDR4 RDIMMs |
External storage controllers | 2 x Dell 12 Gb/s SAS HBAs (on each MDS); 4 x Dell 12 Gb/s SAS HBAs (on each OSS) |
Object storage enclosures | 4x ME4084 with a total of 336 x 8TB NL 7.2K rpm SAS HDDs |
Metadata storage enclosure | 1x ME4024 with 24x 960GB SAS SSDs. Supports up to 4.68 B inodes |
RAID controllers | Duplex SAS RAID controllers in the ME4084 and ME4024 enclosures |
Operating system | CentOS 7.5 x86_64; Red Hat Enterprise Linux (RHEL) 7.5 x86_64 |
BIOS version | 1.4.5 |
Intel Omni-Path IFS version | 10.8.0.0 |
Lustre file system version | 2.10.4 |
IML version | 4.0.7.0 |
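
To put the headline number and the Table 1 figures in perspective, the following minimal sketch derives a few aggregate values from them. It assumes that all 64 compute nodes in the headline configuration match the PowerEdge C6420 specification listed above; the variable names are illustrative only, and the storage figure is raw capacity before any RAID or file system overhead.

```python
# Figures copied from the summary line and Table 1.
NODES = 64                  # compute nodes in the headline configuration
GENOMES_PER_DAY = 194       # 50x genomes processed per day on those nodes
SOCKETS_PER_NODE = 2        # 2x Xeon Gold 6248 per C6420
CORES_PER_SOCKET = 20       # physical cores (Logical Processor disabled)
DIMMS_PER_NODE = 12
DIMM_GB = 16
OBJECT_DRIVES = 336         # NL-SAS HDDs across the 4x ME4084 enclosures
DRIVE_TB = 8

# Assumption: every compute node is configured as in Table 1.
cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node
ram_per_node_gb = DIMMS_PER_NODE * DIMM_GB
genomes_per_node_per_day = GENOMES_PER_DAY / NODES

# Raw object storage capacity, before RAID and file system overhead.
raw_capacity_tb = OBJECT_DRIVES * DRIVE_TB

print(f"Cores per node:           {cores_per_node}")
print(f"Total physical cores:     {total_cores}")
print(f"RAM per node (GB):        {ram_per_node_gb}")
print(f"Genomes per node per day: {genomes_per_node_per_day:.2f}")
print(f"Raw object capacity (TB): {raw_capacity_tb}")
```

Under these assumptions, each node averages roughly three 50x genomes per day. This is an average across the cluster, not a measured per-node result.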