Article Number: 000147919


SPAdes assembler test with Intel® Optane™ DC P4800X and Intel® Memory Drive Technology

Summary: HPC High Performance Computing, HPC & AI Innovation Lab, Genomics, De Novo Assembly, Next Generation Sequencing, SPAdes, Intel Memory Drive Technology, Optane DC P4800X

Article Content


Symptoms

Article written by Kihoon Yoon of HPC and AI Innovation Lab in March 2019
Resolution

Overview


We have been searching for a new technology that makes De Novo Assembly more affordable. Although we characterized two De Novo Assembly applications, SOAPdenovo2 and SPAdes[1], on the Dell EMC PowerEdge R940, that system can be quite expensive for ultra-deep sequencing data analysis because of its large memory requirement, and until now there was no alternative with acceptable performance. Intel® Memory Drive Technology (IMDT) is one of the new technologies that allows us to build a more cost-effective solution for problems demanding large memory. IMDT integrates Intel® Optane™ Solid State Drives (SSDs) into the memory subsystem transparently and leverages the economic benefit of SSDs, as shown in Figure 1. It allows system memory to be assembled from DRAM and the low-latency PCIe-based Intel® SSD.
Figure 1 IMDT
  
IMDT is designed for high-concurrency and in-memory analytics workloads and is optimized for up to 8x system memory expansion over installed DRAM capacity. In other words, a system needs only 382GB of installed RAM to map 3TB of Optane memory. Further, IMDT offers ultra-low latency and close-to-DRAM performance, and it consistently provides reliable software-defined memory (SDM) quality of service.
Table 1 shows the configurations of the two systems we used in a performance benchmark study. The 4-processor R940 is set up with 1.5TB DRAM, while the 2-processor R740xd has 382GB DRAM and 3TB of Optane memory with IMDT. The rest of the components are similarly configured.
 
Table 1 Dell EMC PowerEdge R940/DRAM and R740xd/Optane Configurations

Component | Dell EMC PowerEdge R940 | Dell EMC PowerEdge R740xd
CPU | 4x Intel® Xeon® Platinum 8168 CPU, 24c @ 2.70GHz | 2x Intel® Xeon® Platinum 8168 CPU, 24c @ 2.70GHz
RAM | 48x 32GB @ 2666 MHz | 24x 16GB @ 2666 MHz with 3TB SDM
OS | RHEL 7.4 | RHEL 7.4
Kernel | 3.10.0-693.el7.x86_64 | 3.10.0-693.21.1.el7_lustre.x86_64
BIOS System Profile | Performance Optimized | SDM/Default
Logical Processor | Enabled | Enabled
Virtualization Technology | Enabled | Enabled
SPAdes Version | 3.10.1 | 3.10.1
Python Version | 2.7.13 | 2.7.13
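
The memory figures in Table 1 can be cross-checked with a little arithmetic. The sketch below is only an illustration: the DIMM counts and sizes come from Table 1, the 8x ratio is the maximum expansion quoted above, and the function names are hypothetical. It is written for Python 3 and is unrelated to the Python 2.7.13 used by SPAdes. Note that the nominal DIMM total for the R740xd comes out at 384GB, slightly above the 382GB quoted in the text; the article does not state the reason for the difference.

```python
# Nominal memory arithmetic for the two configurations in Table 1.
# DIMM counts and sizes are from Table 1; the 8x ratio is the maximum
# IMDT expansion quoted in the text. Function names are illustrative.

def installed_dram_gb(dimm_count, dimm_size_gb):
    """Nominal installed DRAM in GB (DIMM count times DIMM size)."""
    return dimm_count * dimm_size_gb


def max_imdt_memory_gb(dram_gb, expansion_ratio=8):
    """Upper bound on IMDT-backed memory for a given DRAM size."""
    return dram_gb * expansion_ratio


r940_dram = installed_dram_gb(48, 32)               # 1536 GB = 1.5 TB
r740xd_dram = installed_dram_gb(24, 16)             # 384 GB nominal
r740xd_imdt_max = max_imdt_memory_gb(r740xd_dram)   # 3072 GB = 3 TB

print(r940_dram, r740xd_dram, r740xd_imdt_max)
```

Under these assumptions the 3TB of Optane-backed memory on the R740xd sits exactly at the 8x limit for its installed DRAM.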

The data used for the tests is a paired-end read set, ERR318658, which can be downloaded from the European Nucleotide Archive (ENA). The reads were generated from a blood sample used as a control to identify somatic alterations in primary and metastatic colorectal tumors. The data contains 3.2 Billion Reads (BR) with a read length of 101 nucleotides.
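
To give a feel for the size of this input, the read count and read length above can be turned into a total base count. The sketch below is a rough estimate only: it assumes the sample is human and uses an approximate genome size of 3.1 billion base pairs, which is not stated in the article, and it treats the 3.2 billion figure as the number of individual reads.

```python
# Rough size estimate for the ERR318658 input described above.
READS = 3_200_000_000            # 3.2 billion reads, from the text
READ_LENGTH = 101                # nucleotides per read, from the text
HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate human genome size

total_bases = READS * READ_LENGTH
approx_coverage = total_bases / HUMAN_GENOME_BP

print(f"{total_bases / 1e9:.1f} Gbases, ~{approx_coverage:.0f}x coverage")
```

Under these assumptions the input amounts to a few hundred gigabases, which is consistent with the large memory requirement discussed in the overview.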

Performance Evaluation


In the benchmark comparison presented here, SPAdes builds three sets of de Bruijn graphs consecutively, with 21-mers, 33-mers, and 55-mers. Hyperthreading is enabled on both the R940 and the R740xd, since IMDT is optimized for use with hyperthreading turned on. The tests were run with 28, 46, and 92 cores. Although both systems have hyperthreading turned on, 92 cores is fewer than the 96 physical cores in the R940, yet almost double the 48 physical cores in the R740xd. This widens the runtime gap between R940/DRAM and R740xd/Optane in the 92-core test, as shown in Figure 2, although the added runtime is surprisingly small.
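
The article does not list the exact command lines used, so the following is only a sketch of how such runs might be launched. It builds a `spades.py` invocation using the documented SPAdes options `-1`/`-2` (paired-end input files), `-k` (k-mer sizes), `-t` (threads), `-m` (memory limit in GB), and `-o` (output directory). The input file names, output directory names, and the 1500GB memory limit are hypothetical placeholders, and the helper `spades_command` is an illustrative name. The script itself assumes Python 3.8 or later.

```python
import shlex


def spades_command(reads_1, reads_2, out_dir, threads, memory_gb,
                   kmers=(21, 33, 55)):
    """Build an argv list for a paired-end SPAdes run (not executed here)."""
    return [
        "spades.py",
        "-1", reads_1,                           # forward reads
        "-2", reads_2,                           # reverse reads
        "-k", ",".join(str(k) for k in kmers),   # k-mer sizes, e.g. 21,33,55
        "-t", str(threads),                      # number of threads
        "-m", str(memory_gb),                    # memory limit in GB
        "-o", out_dir,                           # output directory
    ]


# Hypothetical file names and memory limit; thread counts are from the text.
for threads in (28, 46, 92):
    cmd = spades_command("ERR318658_1.fastq.gz", "ERR318658_2.fastq.gz",
                         f"spades_t{threads}", threads, memory_gb=1500)
    print(shlex.join(cmd))
```

Building the argument list separately from execution makes it easy to inspect each command first; it could then be passed to `subprocess.run(cmd, check=True)` on a system where SPAdes is installed.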

Figure 2 SPAdes runtime on R940/DRAM and R740xd/Optane
  
Overall, the R740xd/Optane system scales well even when hyperthreading is utilized. The runtime differences are marginal when the number of physical cores used is similar and a high degree of parallelism is used, as shown in the 28-core and 46-core tests.
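
The relationship between the tested core counts and the physical cores in each system can be made explicit with the CPU figures from Table 1. This is a minimal sketch with illustrative names; it assumes, as a simplification, that threads occupy one physical core each before any hyperthread is used, which the operating system scheduler does not guarantee.

```python
# Socket and core counts are from Table 1; tested counts are from the text.
SYSTEMS = {
    "R940": {"sockets": 4, "cores_per_socket": 24},
    "R740xd": {"sockets": 2, "cores_per_socket": 24},
}
TESTED_CORE_COUNTS = (28, 46, 92)


def physical_cores(spec):
    """Total physical cores across all sockets."""
    return spec["sockets"] * spec["cores_per_socket"]


def needs_hyperthreads(threads, spec):
    """True when the thread count exceeds the physical core count."""
    return threads > physical_cores(spec)


for name, spec in SYSTEMS.items():
    phys = physical_cores(spec)
    for threads in TESTED_CORE_COUNTS:
        ratio = threads / phys
        note = ("needs hyperthreads" if needs_hyperthreads(threads, spec)
                else "fits in physical cores")
        print(f"{name}: {threads} threads on {phys} cores "
              f"({ratio:.2f} per core), {note}")
```

Under this simplification only the 92-core test on the R740xd has to rely on hyperthreads, at roughly 1.9 threads per physical core, which matches the case where the runtime gap grows.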

Conclusion


An R940 with 1.5TB DRAM costs more than twice as much as the R740xd/Optane system with 382GB DRAM and 3TB of memory available through Optane and IMDT. With the R740xd and Optane/IMDT, cost is therefore reduced significantly while performance stays in a reasonable range. Indeed, less than 10 days of runtime for 3.2 billion reads is quite impressive. This is a good alternative De Novo Assembly system for customers with a limited budget.
 
[1] SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines. The de novo assembly methods used by SPAdes may cause the software to require very large system memory in order to handle certain genomes. http://cab.spbu.ru/software/spades/

Article Properties

Last Published Date: 21 Feb 2021
Version: 3
Article Type: Solution