Boost Genomic Sequencing with Falcon Accelerated Genomics Pipeline (FAGP) on Intel FPGA PAC



With a single Intel FPGA Programmable Acceleration Card, the Falcon Accelerated Genomics Pipeline can process a 50x whole human genome in less than 3 hours using the Alternative Variant Calling Pipeline.



Overview, Market Challenge (need), Falcon solution answers the need

Precision medicine, genomics, and epigenetics are using genomic sequencing to conduct research, improve diagnosis, develop pharmaceuticals, increase the quality of care for healthcare providers, and optimize crop production. For life sciences, genome analysis is now a key application, due in part to the large cost reduction of data collection from advances in next-generation sequencing (NGS). In addition to increased data collection, there has also been significant growth in the range of genomic applications used across universities, genomic research centers, pharmaceutical companies, and healthcare organizations.
Every seven months the amount of genome data is doubling (1). Consequently, data processing in an efficient and cost-effective manner has become critical. The computational power of processor-only solutions is not scaling fast enough to keep up with genomic data growth. This has led to the need for hardware acceleration. Accelerators such as FPGAs are becoming pivotal in matching the computational demands of this genomic data explosion. Compared to other hardware-accelerated solutions, the Falcon Accelerated Genomics Pipeline (FAGP) offers flexibility, high throughput, and a lower cost per sample.
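To make the doubling rate concrete, the short sketch below computes the growth factor implied by a seven-month doubling period. The function name and the example time horizons are illustrative assumptions, and the calculation assumes the doubling period stays constant.

```python
# Sketch: growth implied by a fixed doubling period (assumed constant over time).
DOUBLING_PERIOD_MONTHS = 7.0  # doubling period cited in the text


def growth_factor(months: float,
                  doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Return how many times larger the data volume is after `months`."""
    return 2.0 ** (months / doubling_period)


if __name__ == "__main__":
    # Example horizons are arbitrary and chosen only for illustration.
    for months in (7, 12, 24, 36):
        print(f"{months:>2} months: {growth_factor(months):.1f}x")
```

Under this assumption, the data volume grows by a factor of about 3.3 in one year.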



What is FPGA, Intel PAC offering & Advantage

FPGAs are silicon devices that can be dynamically reprogrammed with a data path that exactly matches your workloads, such as Genomic Sequencing, Data Analytics, or Compression as illustrated in Figure 1. This versatility enables the provisioning of faster processing, more power-efficient computation, and lower latency service – lowering your total cost of ownership and maximizing compute capacity within the power, space, and cooling constraints of your data centers.
Traditionally, FPGAs require deep domain expertise to program. To simplify the development flow and enable rapid deployment across the data center, Intel offers an Acceleration Platform that includes PCI Express* (PCIe*)-based Intel FPGA Programmable Acceleration Cards (Intel FPGA PAC) and the Intel® Acceleration Stack for Intel Xeon® CPU with FPGAs. These Intel platforms are qualified, validated, and deployed through Dell EMC. Together with ecosystem partners like Falcon Computing, the Intel Acceleration Platform offers a reliable, ready-to-go solution in which the underlying hardware is transparent to the user.



Figure 1 Improved accuracy and speed on standard GATK pipeline



Falcon Solution Details:

The Genome Analysis Toolkit (GATK) is the gold standard for genomic data processing accepted by the genomics community (2). Its Best Practice Workflow (BPW), however, is known to be slow when generating results for large samples such as whole-genome sequencing (WGS) data. To address this issue, Falcon Computing Solutions has developed a flexible software package of tools that follows the BPW and can be easily implemented on multiple platforms and architectures. It is substantially faster than CPU-based GATK pipelines.
FAGP provides an end-to-end solution to cost-effectively analyze genomic data using the GATK pipeline with high performance, accuracy, and reproducibility. The solution delivers up to 15x speedup with the same accuracy as GATK (3). This means an analysis that typically takes 50 to 60 hours can be conducted in under 4 hours (3). FAGP provides exceptional levels of acceleration and accuracy in conjunction with high-performance, reliable Intel Arria 10 FPGAs and Intel® Xeon® processors.
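The relationship between the quoted runtimes and the quoted speedup can be checked with simple arithmetic. The sketch below is illustrative only; the function name is an assumption, and the inputs are the 50 to 60 hour baseline and the 4 hour upper bound stated above.

```python
# Sketch: speedup implied by the runtimes quoted in the text.
def speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """Return the ratio of baseline runtime to accelerated runtime."""
    if accelerated_hours <= 0:
        raise ValueError("accelerated_hours must be positive")
    return baseline_hours / accelerated_hours


if __name__ == "__main__":
    # 4 hours is an upper bound ("under 4 hours"), so these ratios are lower bounds.
    for baseline in (50.0, 60.0):
        print(f"{baseline:.0f} h -> 4 h: at least {speedup(baseline, 4.0):.1f}x")
```

A 50 to 60 hour analysis finishing in under 4 hours corresponds to a speedup of at least 12.5x to 15x, which is consistent with the "up to 15x" figure.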
FAGP follows the GATK BPW. It accelerates many components of the pipeline, from alignment (BWA) to variant calling (HaplotypeCaller) (4). In addition to the accelerated BWA, it includes an accelerated version of the Minimap2 aligner, which is part of the Alternate Genomic Pipeline from Falcon (5). The alternate pipeline is faster still: it can complete the analysis of a 50x whole genome within 3 hours. Both aligners can produce sorted reads with duplicates marked, without the need for additional tools.
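For orientation, the stages named above correspond to the following open-source, CPU-only commands. This is a minimal sketch that only builds the command lines; it is not FAGP's own command-line interface, which is not described here. The file names, thread count, and read-group values are hypothetical, and other Best Practices steps such as base quality score recalibration are omitted.

```python
# Sketch: standard CPU-only tools for the stages that FAGP accelerates.
# File names, thread count, and read-group fields are hypothetical.
import shlex
from typing import List


def cpu_pipeline_commands(reference: str, fastq1: str, fastq2: str,
                          sample: str, threads: int = 8) -> List[List[str]]:
    """Build argument lists for alignment, sorting, duplicate marking, and variant calling."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # bwa mem writes SAM to stdout; the caller must redirect it to `sam`.
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         reference, fastq1, fastq2],
        # Coordinate-sort the alignments.
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, sam],
        # Mark duplicate reads and index the result.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.metrics.txt", "--CREATE_INDEX", "true"],
        # Call germline variants.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam,
         "-O", f"{sample}.vcf.gz"],
    ]


if __name__ == "__main__":
    for cmd in cpu_pipeline_commands("ref.fa", "r1.fq.gz", "r2.fq.gz", "sample1"):
        print(shlex.join(cmd))
```

In the CPU-only flow, sorting and duplicate marking are separate steps. According to the text above, the accelerated aligners produce sorted, duplicate-marked reads directly.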
FAGP achieves high performance and throughput by accelerating the compute-intensive parts of the GATK pipeline on Intel FPGA PAC platforms. This differs from scale-out solutions, which achieve high throughput by adding more CPU resources and therefore have limited ability to reduce cost or per-sample latency.
Another advantage of the Falcon solution is that, like GATK, it is an open pipeline. Users can control individual steps in the pipeline, and intermediate data are saved and remain accessible.


Table 1 Advantages of Falcon Accelerated Genomics Pipeline

Falcon Accelerated Genomics Pipeline (FAGP) Advantages
True GATK            Support for multiple GATK versions, including 4.0
Industry-scale       Run five whole genomes or 24 whole exomes in one day
Alternative variant  < 3-hour turnaround time on-prem for WGS (50x)
Speed                Execute GATK best practices pipeline up to 15x faster
Leverage existing    No need to rewrite working algorithms



Dell Hardware Configuration

Table 2 Dell EMC PowerEdge R740xd as a testbed

Dell EMC PowerEdge R740xd
Processor       2x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Memory          384GB (24x 16GB RDIMM, 2666MT/s, Dual Rank)
Storage         4x 1.2TB 10K RPM SAS 12Gbps 512n 2.5in Hot-plug Hard Drive in RAID 0
                2x INTEL SSDPEDMD020T4 DC P3700 1.8T in software RAID 0
FPGA            Intel Programmable Acceleration Card with Intel Arria® 10 GX FPGA (Intel Acceleration Stack 1.1)
System Profile  Performance
BIOS version    2.1.3
Hyperthreading  Enabled
OS              Red Hat Enterprise Linux Server release 7.4 (Maipo) (3.10.0-693.el7.x86_64)



Performance Evaluation

In our benchmark testing, we used whole human genome sequencing data at 10x, 30x, and 50x depth of coverage.


Table 3 Tested whole-genome sequencing data



Results:

Table 4 summarizes the time (in minutes) taken to complete the GATK 4.0 Best Practices Pipeline over three test cycles using FAGP and the Intel FPGA PAC housed in the Dell EMC PowerEdge R740xd server.


Table 4 Total runtimes from Best Practice Pipeline version 2.1.1
Sample      Depth of Coverage  Test 1 (minutes)  Test 2 (minutes)  Test 3 (minutes)
ERR091571   10x                75.63             76.67             76.38
SRR3124837  30x                160.00            162.77            161.38
ERR194161   50x                242.97            250.65            247.18
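The runtimes in Table 4 can be condensed into a mean runtime and an approximate daily throughput per sample. The sketch below is illustrative; the function and variable names are assumptions, and the throughput figure assumes back-to-back runs on a single server with no idle time between samples.

```python
# Sketch: mean runtime and approximate daily throughput from Table 4.
from statistics import mean
from typing import Dict, List, Tuple

# Runtimes in minutes, copied from Table 4 (Test 1, Test 2, Test 3).
BPW_RUNTIMES: Dict[str, List[float]] = {
    "ERR091571 (10x)": [75.63, 76.67, 76.38],
    "SRR3124837 (30x)": [160.00, 162.77, 161.38],
    "ERR194161 (50x)": [242.97, 250.65, 247.18],
}

MINUTES_PER_DAY = 24 * 60


def summarize(runtimes: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return {sample: (mean runtime in minutes, genomes per day)}."""
    summary = {}
    for sample, minutes in runtimes.items():
        avg = mean(minutes)
        # Assumes samples run back to back with no gaps.
        summary[sample] = (avg, MINUTES_PER_DAY / avg)
    return summary


if __name__ == "__main__":
    for sample, (avg, per_day) in summarize(BPW_RUNTIMES).items():
        print(f"{sample}: mean {avg:.2f} min, about {per_day:.1f} genomes/day")
```

For the 50x sample, the mean is roughly 247 minutes, which corresponds to between five and six genomes per day under the stated assumption.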

Table 5 summarizes the time (in minutes) taken to complete the alternative pipeline, Falcon Germline, over three test cycles using FAGP and the Intel FPGA PAC housed in the Dell EMC PowerEdge R740xd server.


Table 5 Total runtimes from Alternative Variant Calling Pipeline
Sample      Depth of Coverage  Test 1 (minutes)  Test 2 (minutes)  Test 3 (minutes)
ERR091571   10x                62.70             58.21             59.80
SRR3124837  30x                130.38            129.90            129.95
ERR194161   50x                171.52            171.87            171.37
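The two pipelines can be compared directly from the measurements in Tables 4 and 5. The sketch below is illustrative; the names are assumptions, and the values are copied from the two tables.

```python
# Sketch: compare the Best Practices and alternative pipelines (Tables 4 and 5).
from statistics import mean
from typing import Dict, List

# Runtimes in minutes (Test 1, Test 2, Test 3), keyed by depth of coverage.
BPW: Dict[str, List[float]] = {
    "10x": [75.63, 76.67, 76.38],
    "30x": [160.00, 162.77, 161.38],
    "50x": [242.97, 250.65, 247.18],
}
ALT: Dict[str, List[float]] = {
    "10x": [62.70, 58.21, 59.80],
    "30x": [130.38, 129.90, 129.95],
    "50x": [171.52, 171.87, 171.37],
}


def compare(bpw: Dict[str, List[float]],
            alt: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Return mean runtimes and the BPW-to-alternative runtime ratio per depth."""
    rows = {}
    for depth in bpw:
        bpw_avg = mean(bpw[depth])
        alt_avg = mean(alt[depth])
        rows[depth] = {
            "bpw_min": bpw_avg,
            "alt_min": alt_avg,
            "alt_hours": alt_avg / 60.0,
            "ratio": bpw_avg / alt_avg,
        }
    return rows


if __name__ == "__main__":
    for depth, row in compare(BPW, ALT).items():
        print(f"{depth}: BPW {row['bpw_min']:.1f} min, "
              f"alternative {row['alt_min']:.1f} min "
              f"({row['alt_hours']:.2f} h), ratio {row['ratio']:.2f}x")
```

All three alternative-pipeline runs on the 50x sample finished in under 180 minutes, which is consistent with the turnaround of less than 3 hours stated earlier.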



Summary of Falcon Genomic Solution

The Falcon Accelerated Genomics Pipeline offers high throughput and a low cost per sample. Together with the Intel FPGA Programmable Acceleration Card and a certified Dell EMC server, FAGP provides a complete solution that can be easily adopted for your genomic sequencing applications.
"At TCGB, we provide genome sequencing services to our nationwide clients. The Falcon Accelerated Genomics Pipeline* has enabled us to cut our turnaround from days into few hours while maintaining the accuracy of industry-standard GATK pipelines."
— Dr Xinmin Li, Director of Technology Center for Genomics & Bioinformatics (TCGB) UCLA



Resources

1. Sequencing the genome creates so much data we don’t know what to do with it. [Online] https://www.washingtonpost.com/news/speaking-of-science/wp/2015/07/07/sequencing-the-genome-creates-so-much-data-we-dont-know-what-to-do-with-it.
2. GATK. [Online] https://software.broadinstitute.org/gatk/
3. Accelerated Genomics. [Online] http://www.falconcomputing.com/falcon-accelerated-genomics-pipeline
4. BWA. [Online] http://bio-bwa.sourceforge.net/bwa.shtml
5. Minimap2. [Online] https://github.com/lh3/minimap2






Article ID: SLN319291

Last Date Modified: 10/31/2019 08:05 AM

