23rd October 2014
BLAST is designed to match biological sequences against other biological sequences. The assumption underlying the algorithm is that biological sequences which look "the same" perform a similar biological function and may be related through evolution. This is not always true, but it is true often enough to be useful to molecular biologists.
However, with sequence databases growing exponentially and query sequences getting longer, BLAST implementations such as NCBI blast+ still suffer from long runtimes. A standard BLAST search involves three steps: 1. finding seeds; 2. ungapped extension; 3. gapped extension. The first two steps process the bulk of the data and are the most computationally demanding parts, typically accounting for more than 90% of the runtime.
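To make the first two steps concrete, here is a minimal Python sketch of seed finding and ungapped extension on DNA strings. It is a simplified illustration, not the code used in NCBI blast+: the helper names (`find_seeds`, `ungapped_extend`, `seed_and_extend`), the word size `w`, the match/mismatch scores and the X-drop threshold `xdrop` are all assumptions chosen for readability. Seeds are exact word matches found through a lookup table built from the query; each seed is then extended left and right along its diagonal until the running score falls more than `xdrop` below the best score seen so far.

```python
from collections import defaultdict


def build_word_index(query: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word of the query to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def find_seeds(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Step 1: exact word hits (query_pos, subject_pos) between query and subject."""
    index = build_word_index(query, w)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], ()):
            seeds.append((i, j))
    return seeds


def ungapped_extend(query: str, subject: str, qi: int, sj: int, w: int,
                    match: int = 1, mismatch: int = -2, xdrop: int = 5):
    """Step 2: extend a seed in both directions without gaps (X-drop stop).

    Returns (score, q_start, q_end, s_start); q_end is exclusive.
    Scores and xdrop are illustrative assumptions, not blast+ defaults.
    """
    seed_score = w * match

    # Extend to the right of the seed, remembering the best prefix.
    best_right, score, right_len, k = 0, 0, 0, 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        score += match if query[qi + w + k] == subject[sj + w + k] else mismatch
        k += 1
        if score > best_right:
            best_right, right_len = score, k
        elif best_right - score > xdrop:
            break

    # Extend to the left of the seed in the same way.
    best_left, score, left_len, k = 0, 0, 0, 0
    while qi - 1 - k >= 0 and sj - 1 - k >= 0:
        score += match if query[qi - 1 - k] == subject[sj - 1 - k] else mismatch
        k += 1
        if score > best_left:
            best_left, left_len = score, k
        elif best_left - score > xdrop:
            break

    total = seed_score + best_right + best_left
    return total, qi - left_len, qi + w + right_len, sj - left_len


def seed_and_extend(query: str, subject: str, w: int = 4, min_score: int = 8):
    """Run steps 1 and 2; keep distinct segments scoring at least min_score."""
    hits = set()
    for qi, sj in find_seeds(query, subject, w):
        hit = ungapped_extend(query, subject, qi, sj, w)
        if hit[0] >= min_score:
            hits.add(hit)
    return sorted(hits, reverse=True)
```

For example, `seed_and_extend("ACGTACGTTTGCA", "GGACGTACGTTTGCAGG")` reports a single high-scoring segment covering the whole query at subject offset 2, because the many seeds on that diagonal all extend to the same segment. Each seed is extended independently of the others, which is why these two steps lend themselves to parallel hardware pipelines.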
FPGAs have already proven suitable for bioinformatics tasks thanks to their flexible memory architecture and massive parallelism. In our analysis, high-bandwidth on-chip RAM is the key to achieving speedups of several orders of magnitude on FPGAs: the on-chip RAM resources determine how many pipelines can be implemented on a single chip and how large a query each search can handle. By offloading the bottleneck code of NCBI blast+ to an Analytics Engine Appliance with a single Virtex-7 FPGA, our preliminary implementation achieves an overall speedup of 80x over a single 3.2 GHz Core i5. Note that next-generation FPGA technologies, such as UltraScale or Stratix 10, deliver higher performance and denser on-chip memory, which could roughly double the speedup to 160x.
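The claim that on-chip RAM bounds both the pipeline count and the query size can be illustrated with a back-of-the-envelope model. The Python sketch below assumes a hypothetical memory layout, not a description of the actual implementation: each pipeline keeps its own presence bitmap with one bit for every possible DNA word of length `w` (4^w bits, indexed by a 2-bit-per-base encoding) plus a 2-bit-packed copy of the query for extension. The function names and the 10 Mbit budget in the example are illustrative only and do not correspond to a particular Virtex-7 device.

```python
# Hypothetical 2-bit encoding of DNA bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_word(word: str) -> int:
    """Pack a DNA word into an integer, 2 bits per base."""
    code = 0
    for base in word:
        code = (code << 2) | BASE_CODE[base]
    return code


def build_presence_bitmap(query: str, w: int) -> bytearray:
    """One bit per possible w-mer (4**w bits): set if the w-mer occurs in the query."""
    bitmap = bytearray((4 ** w + 7) // 8)
    for i in range(len(query) - w + 1):
        code = encode_word(query[i:i + w])
        bitmap[code >> 3] |= 1 << (code & 7)
    return bitmap


def bits_per_pipeline(query_len: int, w: int) -> int:
    """Assumed per-pipeline footprint: presence bitmap + 2-bit-packed query."""
    return 4 ** w + 2 * query_len


def pipelines_that_fit(onchip_bits: int, query_len: int, w: int) -> int:
    """How many independent pipelines fit in the on-chip RAM budget."""
    return onchip_bits // bits_per_pipeline(query_len, w)


def max_query_len(onchip_bits: int, pipelines: int, w: int) -> int:
    """Longest query (in bases) that still fits for a fixed pipeline count."""
    budget = onchip_bits // pipelines - 4 ** w
    return max(budget // 2, 0)
```

With these illustrative numbers, `pipelines_that_fit(10_000_000, 100_000, 11)` gives 2, and fixing two pipelines caps the query at `max_query_len(10_000_000, 2, 11)`, i.e. 402,848 bases. In this model, more or denser on-chip memory raises both limits at once, which matches the intuition behind expecting larger speedups from next-generation parts.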