SPAdes assembler test with Intel® Optane™ DC P4800X and Intel® Memory Drive Technology




Article written by Kihoon Yoon of the HPC and AI Innovation Lab in March 2019

Overview


We have been searching for a new technology that makes de novo assembly more affordable. We previously characterized two de novo assembly applications, SOAPdenovo2 and SPAdes[1], on the Dell EMC PowerEdge R940. However, an R940 system for ultra-deep sequencing data analysis can be quite expensive because of its large memory requirement, and until now there was no alternative with acceptable performance. Intel® Memory Drive Technology (IMDT) is one of the new technologies that allow us to build a more cost-effective solution for problems that demand large memory. IMDT integrates the Intel® Optane™ Solid State Drive (SSD) into the memory subsystem transparently and leverages the economic benefit of SSDs, as shown in Figure 1. It allows system memory to be assembled from DRAM and the low-latency PCIe-based Intel® SSD.


IMDT is designed for high-concurrency and in-memory analytics workloads and is optimized for up to 8x system memory expansion over installed DRAM capacity. In other words, a system needs only 382GB of installed RAM to map 3TB of Optane memory. Further, IMDT offers ultra-low latency and close-to-DRAM performance, and it consistently provides a reliable quality of service for software-defined memory (SDM).
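
The expansion arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the 8x ratio and the 3TB target come from the text above, the DIMM population comes from Table 1, and the function name min_dram_gb is hypothetical. Note that 24 x 16GB DIMMs give a nominal 384GB; the 382GB figure quoted in this article is presumably the capacity the system reports, which is an assumption on our part.

```python
# Sketch of the IMDT sizing arithmetic; names are illustrative, not an Intel API.
MAX_EXPANSION = 8  # "up to 8x" expansion over installed DRAM, from the article


def min_dram_gb(target_memory_gb, expansion=MAX_EXPANSION):
    """Smallest DRAM capacity (GB) that keeps the target within the ratio."""
    return target_memory_gb / expansion


target_gb = 3 * 1024      # 3TB of Optane-backed memory, expressed in GB
installed_gb = 24 * 16    # nominal R740xd DIMM population from Table 1

print(min_dram_gb(target_gb))     # DRAM needed to map 3TB at 8x
print(target_gb / installed_gb)   # expansion ratio actually used on the R740xd
```

With the nominal DIMM capacity, the R740xd configuration sits exactly at the 8x limit.
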
Table 1 shows an overview of the configurations of the two systems used in our performance benchmark study. The 4-processor R940 is set up with 1.5TB DRAM, while the 2-processor R740xd has 382GB DRAM and 3TB of Optane memory with IMDT. The rest of the components are similarly configured.

Table 1 Dell EMC PowerEdge R940/DRAM and R740xd/Optane Configurations

Component | Dell EMC PowerEdge R940 | Dell EMC PowerEdge R740xd
CPU | 4x Intel® Xeon® Platinum 8168 CPU, 24c @ 2.70GHz | 2x Intel® Xeon® Platinum 8168 CPU, 24c @ 2.70GHz
RAM | 48x 32GB @ 2666 MHz | 24x 16GB @ 2666 MHz with 3TB SDM
OS | RHEL 7.4 | RHEL 7.4
Kernel | 3.10.0-693.el7.x86_64 | 3.10.0-693.21.1.el7_lustre.x86_64
BIOS System Profile | Performance Optimized | Performance Optimized SDM/Default
Logical Processor | Enabled | Enabled
Virtualization Technology | Enabled | Enabled
SPAdes Version | 3.10.1 | 3.10.1
Python Version | 2.7.13 | 2.7.13

The data used for the tests is a paired-end read set, ERR318658, which can be downloaded from the European Nucleotide Archive (ENA). The reads were generated from a blood sample used as a control to identify somatic alterations in primary and metastatic colorectal tumors. This data set contains 3.2 billion reads (BR) with a read length of 101 nucleotides.
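
To give a sense of scale, the read count and read length above translate into a total base count as sketched below. The approximate human genome size of 3.1 billion bases is our assumption and is not stated in this article, so the resulting depth is a rough estimate only.

```python
# Rough size estimate for ERR318658 from the figures quoted in the article.
READS = 3_200_000_000     # 3.2 billion reads (from the article)
READ_LENGTH = 101         # nucleotides per read (from the article)
HUMAN_GENOME_BASES = 3.1e9  # ASSUMPTION: approximate genome size, not from the article

total_bases = READS * READ_LENGTH
approx_depth = total_bases / HUMAN_GENOME_BASES

print(f"Total bases: {total_bases / 1e9:.1f} Gbases")
print(f"Approximate sequencing depth: {approx_depth:.0f}x")
```
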

Performance Evaluation


In the benchmark comparison presented here, SPAdes builds three sets of de Bruijn graphs with 21-mers, 33-mers, and 55-mers consecutively. Hyperthreading is enabled on both the R940 and the R740xd, since IMDT is optimized for use with hyperthreading turned on. The tests were run with 28, 46, and 92 cores. Although both systems have hyperthreading turned on, 92 cores is fewer than the 96 physical cores in the R940, yet almost double the 48 physical cores in the R740xd. This widens the runtime gap between R940/DRAM and R740xd/Optane in the 92-core test, as shown in Figure 2, although the additional runtime is surprisingly small.
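
A run of this kind could be launched roughly as sketched below. The exact command lines used in the study are not given in this article, so this is an illustration only: -1, -2, -k, -t, -m, and -o are standard spades.py options, while the input file names, the output directory names, and the 3000GB memory limit are hypothetical. The sketch is written for Python 3.8 or later and only prints the commands rather than running them.

```python
import shlex

K_MERS = (21, 33, 55)          # k-mer sizes used in the article
THREAD_COUNTS = (28, 46, 92)   # core counts tested in the article


def spades_command(threads, reads_1, reads_2, out_dir, memory_gb):
    """Build a spades.py argument list for one paired-end run."""
    return [
        "spades.py",
        "-1", reads_1,                            # forward reads
        "-2", reads_2,                            # reverse reads
        "-k", ",".join(str(k) for k in K_MERS),   # comma-separated k-mer list
        "-t", str(threads),                       # number of threads
        "-m", str(memory_gb),                     # memory limit in GB
        "-o", out_dir,                            # output directory
    ]


for t in THREAD_COUNTS:
    cmd = spades_command(
        t,
        "ERR318658_1.fastq.gz",   # hypothetical file name
        "ERR318658_2.fastq.gz",   # hypothetical file name
        f"spades_t{t}",           # hypothetical output directory
        memory_gb=3000,           # ASSUMPTION: limit chosen to suit the 3TB system
    )
    print(shlex.join(cmd))
```

Passing the argument list to subprocess.run would execute the assembly; building it as a list avoids shell quoting problems with file paths.
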



Overall, the R740xd/Optane system scales well even when hyperthreading is utilized. The runtime differences are marginal when the number of physical cores used is similar and a high degree of parallelism is used, as shown in the 28-core and 46-core tests.
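
The relationship between the tested core counts and the physical cores of each system can be sketched as follows. The socket and core counts come from Table 1; the function name and the labels are hypothetical, and the sketch says nothing about how the operating system actually places threads.

```python
CORES_PER_CPU = 24                    # Xeon Platinum 8168, from Table 1
SOCKETS = {"R940": 4, "R740xd": 2}    # CPUs per system, from Table 1
THREAD_COUNTS = (28, 46, 92)          # core counts tested in the article


def threads_per_physical_core(threads, sockets, cores_per_cpu=CORES_PER_CPU):
    """Ratio of requested threads to physical cores in the system."""
    return threads / (sockets * cores_per_cpu)


for name, sockets in SOCKETS.items():
    for t in THREAD_COUNTS:
        ratio = threads_per_physical_core(t, sockets)
        note = "relies on hyperthreading" if ratio > 1 else "fits on physical cores"
        print(f"{name}: {t} threads, {ratio:.2f} per physical core ({note})")
```

Only the 92-core test on the R740xd exceeds one thread per physical core, which matches the case where the runtime gap grows.
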

Conclusion


The R940 with 1.5TB DRAM costs more than twice as much as the R740xd/Optane with 382GB DRAM and 3TB of memory available via Optane and IMDT. Thus, the R740xd with Optane/IMDT reduces cost significantly while performance remains in a reasonable range. Indeed, a runtime of less than 10 days for 3.2 billion reads is quite impressive. This is a good alternative de novo assembly system for customers with a limited budget.

[1] SPAdes (St. Petersburg genome assembler) is an assembly toolkit containing various assembly pipelines. The de novo assembly methods used by SPAdes may cause the software to require very large system memory in order to handle certain genomes. http://cab.spbu.ru/software/spades/



Article ID: SLN316587

Last Date Modified: 03/19/2019 02:02 PM

