Accelerating Bioinformatics Algorithms with Reconfigurable Computing - PowerPoint PPT Presentation

Provided by: klabsOrgm

Learn more at: http://klabs.org

1
Accelerating Bioinformatics Algorithms with
Reconfigurable Computing

Presentation to MAPLD Conference
September 2004

2
Overview

The Problem
Bioinformatics Algorithm: Smith-Waterman
Current Implementations
The Solution
Viva as a Reconfigurable Computing SW/HW Design Tool
Hypercomputer Architecture for High-End RC Applications
The Implementation
Smith-Waterman Viva Code
Smith-Waterman Pipeline Design
Smith-Waterman Pipeline Applied to Hypercomputer Architecture
Smith-Waterman Pipeline Primitives Inside the FPGA
The Results
Visualization of Rat vs. Human Genetic Code
Informal Benchmarks
Other Potential Applications
Seismic Data Processing, Weather Modeling, Image Rendering

3
The Problem: Enormous Biosciences Problems

Exploding Datasets in Biosciences
DNA Sequencing
Gene Expression
Protein Identification

4
The Need: High-Speed, High-Sensitivity Algorithms

High-Speed, High-Sensitivity DNA and Protein Searching Algorithms
Critical in virtually every branch of molecular biology.
Smith-Waterman
Theoretically optimal for sequence matching.
BUT compute intensive!
BLAST and FASTA
Approximations.
Faster than Smith-Waterman, but less sensitive.
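
For reference, the recurrence that makes Smith-Waterman exact but compute intensive can be sketched in software. This is a minimal sketch, not the Viva design: the scoring values (match=2, mismatch=-1, gap=-1), the linear gap penalty, and the function name are assumptions chosen for illustration. Every cell of the query-by-reference matrix is visited, which is where the cost comes from.

```python
def smith_waterman_score(query, reference, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score (score only, no traceback)."""
    rows, cols = len(query) + 1, len(reference) + 1
    # Row 0 and column 0 are the zero boundary of the scoring matrix.
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == reference[j - 1] else mismatch
            h[i][j] = max(
                0,                    # local alignment: never go below zero
                h[i - 1][j - 1] + s,  # diagonal: match or mismatch
                h[i - 1][j] + gap,    # gap in one sequence
                h[i][j - 1] + gap,    # gap in the other sequence
            )
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))  # 8 (four matches at +2 each)
```

The two nested loops make the work proportional to the product of the two sequence lengths, which is the cost the heuristic methods avoid.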

5
The Need: High-Speed, High-Sensitivity Algorithms

Comparative Genomics: comparing the genomes of related species
Identifying genes, defining gene structure, elucidating evolutionary change, identifying regulatory elements, and revealing combinatorial control of gene regulation
Sequencing Effort
The human sequence is complete; other organisms are now being sequenced
The sequencing effort will require high-sensitivity DNA searches and alignments
Smith-Waterman is the preferred method: more accurate, more specific
NCBI BLAST and WU-BLAST are not effective in low-coverage DNA situations
RNA interference (RNAi): seeking novel therapies and developing new drugs
The process: choosing the correct genetic sequence to effectively block a targeted messenger RNA (mRNA) without silencing additional genes
Due to word-length limitations, BLAST algorithms can miss sequences that have one or more mismatches compared to the query siRNA sequence
Genome Annotation
BLAST does not allow for long introns or frameshifts
Smith-Waterman is both frameshift- and intron-tolerant.
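
The word-length limitation can be illustrated with a simplified exact-word seeding model. This is a sketch of the general seeding idea only, not BLAST's actual implementation; the 21-character query, the word sizes, and the function name shared_words are assumptions made for illustration.

```python
def shared_words(query, target, word_size):
    """Return start positions in target of exact words that also occur in query."""
    words = {query[i:i + word_size] for i in range(len(query) - word_size + 1)}
    return [i for i in range(len(target) - word_size + 1)
            if target[i:i + word_size] in words]


# Hypothetical 21-nt query and a target that differs by one central mismatch.
query = "GCATTGACCTAGGATCCATGA"
target = query[:10] + "C" + query[11:]

print(shared_words(query, target, 11))  # []: every 11-letter word spans the mismatch
print(shared_words(query, target, 10))  # [0, 11]: the two flanking 10-letter words
```

With no shared word there is nothing to seed an alignment from, even though 20 of the 21 positions agree. An exhaustive method that scores every cell does not depend on a seed.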

6
The Need: High-Speed Smith-Waterman

Large Matrix comparison
Large datasets
High level of detail for each SW calculation
NOT heuristic approximations

7
The Need: High-Performance Biosciences Platform

Cluster Computing: the most widely used platform, BUT there are diminishing returns
Expensive to build, difficult to maintain
Requires significant power, air conditioning, and physical space
Architecture inherently limits scalability and performance
Reconfigurable Computing (RC): the promising alternative
Advantages of a Custom Chip
Implement algorithms directly in hardware
Performance advantages of an ASIC, but without the chip development cost
Advantages of a General-Purpose Platform
Development time comparable to software development
FPGAs can be reconfigured to perform other computational tasks.

8
The Solution: FPGA-Programming Environment Viva

VIVA GRAPHICAL LANGUAGE
Capture natively parallel code
Accommodate data of any type, size, or precision
Tune algorithms for speed of execution or conservation of hardware resources
VIVA EDITOR
Call Viva algorithms from legacy code such as C, C++, or Fortran
Interactively debug code
Import/Export EDIF files

VIVA COMPILER/SYNTHESIZER
Program multi-million gate designs
Compile hardware designs quickly for efficient development
VIVA LIBRARIES
Reuse flexible Viva objects which accept any data type or size
Target any hardware platform with a System Description
Prototype Viva on any x86-based Windows machine

9
The Solution: FPGA-based Hypercomputers
10
Structure of an FPGA Processing Element
11
Structure of a Processing Element Quad
12
Structure of a Hypercomputer Accelerator Board
13
The Prototype Implementation: Smith-Waterman in Viva Code

14
Smith-Waterman Program Flow

As the query sequence is loaded, the Init_Cells object creates our initial column and stores it in SW_Cell_Mem.
After this initialization period, SW_Cell_Mem will provide a cell to the chain of SW_Iteration objects every clock cycle. It will also write a newly calculated cell every clock cycle.
The SW_Cell_Mem object stores every nth column, where n is the number of SW_Iteration objects.

15
Smith-Waterman Cells

There are as many cells as there are characters in the query sequence.
The array of cells represents a column of the scoring matrix.
The initial (zero) column is initialized and stored into the cell memory object, SW_Cell_Mem.
Each cell contains the following four parameters:
Pattern: a character from the query sequence
Score: the score of this cell at the current (i, j) position
PatternStart: the position in the query sequence from which the score was calculated
DataStart: the position in the reference sequence from which the score was calculated
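
The four cell parameters map naturally onto a small record type. The following is a software sketch only; the Python names, and the initial values used for the zero column (score 0, PatternStart set to the cell's own row, DataStart 0), are assumptions, since the slides do not state what Init_Cells writes.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    pattern: str        # Pattern: a character from the query sequence
    score: int          # Score: score of this cell at the current (i, j) position
    pattern_start: int  # PatternStart: query position the score was calculated from
    data_start: int     # DataStart: reference position the score was calculated from


def init_cells(query):
    """Build the initial (zero) column: one cell per query character."""
    # Assumed initial values; the real Init_Cells object may differ.
    return [Cell(pattern=ch, score=0, pattern_start=row, data_start=0)
            for row, ch in enumerate(query)]


column = init_cells("ACGT")
print(len(column))        # 4
print(column[2].pattern)  # G
```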

16
Cell Data Types

Data element size may be adjusted depending on usage
Pattern: contains as many bits as needed to encode characters from the sequences (4 bits for nucleotides).
Score and PatternStart: equal in size. Must be large enough to encode the number of entries in the query sequence.
DataStart: the largest data element, as it must be able to encode any position in the reference sequence.
Right size for the job
Less circuitry is needed to calculate matches in smaller sequences.
Smaller sequences may exploit more parallelism.

17
Smith-Waterman Data Sets
In this example, our Pattern contains 4 bits, for modeling nucleotides. The Score and PatternStart parameters contain 26 bits, so our query sequence may contain up to 67,108,864 characters. The DataStart parameter contains 27 bits, meaning our reference sequence may contain up to 134,217,728 characters.
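
The limits quoted above follow directly from the field widths (2^26 and 2^27). As a sketch, the four fields can also be packed into a single 83-bit word; the field order and the function names are assumptions, since the slides do not describe the actual layout.

```python
PATTERN_BITS = 4
SCORE_BITS = 26
PATTERN_START_BITS = 26
DATA_START_BITS = 27
CELL_BITS = PATTERN_BITS + SCORE_BITS + PATTERN_START_BITS + DATA_START_BITS


def pack_cell(pattern, score, pattern_start, data_start):
    """Pack four unsigned fields into one integer (field order is assumed)."""
    fields = ((pattern, PATTERN_BITS), (score, SCORE_BITS),
              (pattern_start, PATTERN_START_BITS), (data_start, DATA_START_BITS))
    word = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack_cell(word):
    """Inverse of pack_cell: returns (pattern, score, pattern_start, data_start)."""
    data_start = word & ((1 << DATA_START_BITS) - 1)
    word >>= DATA_START_BITS
    pattern_start = word & ((1 << PATTERN_START_BITS) - 1)
    word >>= PATTERN_START_BITS
    score = word & ((1 << SCORE_BITS) - 1)
    word >>= SCORE_BITS
    return word, score, pattern_start, data_start


print(1 << SCORE_BITS)       # 67108864
print(1 << DATA_START_BITS)  # 134217728
print(CELL_BITS)             # 83
```

Narrower fields give a narrower cell, which is the "right size for the job" trade-off: less circuitry per cell leaves room for more parallel units.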
18
Smith-Waterman Iteration
19
SW_Iteration Object

Inputs
Matrix_In: receives a constant stream of cells. It is imperative for efficiency that the pipe remains full.
Data: receives a single character from the reference sequence. The cells computed will be for the column of the scoring matrix corresponding to the Data value.
CountBy: the radix of the algorithm (number of iteration objects)
Init_J_In: this iteration object's index in the chain of iteration objects
ClkG: system clock
Token_In: a token pulse precedes a set of cells, allowing the iteration object to clear out data from the previous set of cells
Init: initialization pulse, used only before the search commences
G: accompanies each valid cell

20
SW_Iteration Object

Outputs
Matrix_Out: newly computed cell
Token_Out: passes token to next iteration object
D: accompanies each newly computed cell
Init_J_Out: used by next iteration object
I, J: current row and column, used to report results

21
Pipe Stages

The SW_Iteration object contains four pipe stages.
A cell is received by and produced by the SW_Iteration object every clock cycle.
When a cell enters, it is coming from the previous column, so its values are those of the West neighbor.
Since the cell in the row above any given cell is in the next pipe stage, access to both the North and Northwest neighbors' values is possible.
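
The neighbor relationships described above can be sketched as a column-at-a-time update: the incoming column supplies the West values, while North and Northwest are carried over from the row just processed. This is a software analogy, not the four-stage hardware pipeline; it tracks only Score (not PatternStart or DataStart), and the scoring values and function names are assumptions.

```python
def next_column(west, query, data_char, match=2, mismatch=-1, gap=-1):
    """Compute one scoring-matrix column from the previous (West) column."""
    new = []
    north = 0      # boundary row above the first cell of this column
    northwest = 0  # boundary row above the first cell of the previous column
    for row, pattern in enumerate(query):
        s = match if pattern == data_char else mismatch
        score = max(0, northwest + s, north + gap, west[row] + gap)
        new.append(score)
        north = score          # becomes North for the next row
        northwest = west[row]  # becomes Northwest for the next row
    return new


def best_score(query, reference):
    """Stream reference characters through next_column, one column each."""
    column = [0] * len(query)  # the initial (zero) column
    best = 0
    for data_char in reference:
        column = next_column(column, query, data_char)
        best = max(best, max(column))
    return best


print(next_column([0, 0, 0, 0], "ACGT", "A"))  # [2, 1, 0, 0]
print(best_score("ACGT", "TTACGTTT"))          # 8
```

Because each column depends only on the one before it, several such updates can be chained so that each stage works on a different column at the same time.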

22
Parallelism

If a given hardware system has enough physical resources to accommodate n SW_Iteration objects, the Smith-Waterman program may operate on n columns in parallel.
Hence n cells are computed every clock cycle.
Each Virtex II 6000 can support 64 iteration objects
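
A rough cycle count follows from this: if n columns advance together, one pass over the query covers n reference characters. The model below is an idealized sketch that assumes the pipe stays full and ignores pipeline fill latency and I/O; the function names and the example sequence lengths are hypothetical.

```python
import math


def estimated_cycles(query_len, reference_len, iteration_objects):
    """Idealized clock cycles: one query-length pass per group of n columns."""
    passes = math.ceil(reference_len / iteration_objects)
    return passes * query_len


def estimated_seconds(query_len, reference_len, iteration_objects, clock_hz):
    """Idealized run time at a given clock frequency."""
    return estimated_cycles(query_len, reference_len, iteration_objects) / clock_hz


# Hypothetical example: one FPGA with 64 iteration objects at 25 MHz.
print(estimated_cycles(1000, 6400, 64))  # 100000
print(estimated_seconds(1000, 6400, 64, 25_000_000))
```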

23
The Implementation: Pipeline Primitives Inside the FPGA
24
The Implementation: Smith-Waterman Pipeline
[Figure: pipeline block diagram. Labels: X86 System, Bus Controller, XPE Data Distribution, XPR Router, PE1 (Controller), PE2 through PE8]
25
The Results: Rat vs. Human Genetic Code
26
The Results: Bacteria-to-Bacteria Comparison
27
The Results: Informal Statistics

Total Operations / Second
1 Smith-Waterman step includes:
25 logic operations (adds, compares; mostly 26-27 bit ops, some single-bit ops)
13 data reorder operations (move, combine)
11 data store operations (assignment)
Logic Operations Only
25 ops × 25 MHz × 448 Smith-Waterman kernels = 280 billion operations / second
Logic + Data Operations
49 ops × 25 MHz × 448 Smith-Waterman kernels ≈ 550 billion operations / second
Total Aggregate Communications Bandwidth of Systolic Array
12 × 88 × 25 MHz = 26.4 Gb/s, plus 7 × 22 × 50 MHz = 7.7 Gb/s, for a total of 34.1 Gb/s
Resources Consumed / Resources Available
PE2-PE7: 60% to 70% consumed
PE1: 20% consumed; XPE: 5%; XPR: 0.1%
Compilation Time
Gates: 70 million total
Time to compile: 20 minutes
Power Consumption
Meter: 50 Watts
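
The throughput and bandwidth figures above can be checked with a few lines of arithmetic. The variable names are illustrative, and reading the bandwidth products as a count times a width in bits times a clock rate is an assumption; the slide gives only the numbers.

```python
MHZ = 1_000_000
KERNELS = 448

logic_ops, reorder_ops, store_ops = 25, 13, 11
all_ops = logic_ops + reorder_ops + store_ops  # 49 operations per step

logic_rate = logic_ops * 25 * MHZ * KERNELS  # logic operations per second
total_rate = all_ops * 25 * MHZ * KERNELS    # logic + data operations per second

bandwidth_25mhz = 12 * 88 * 25 * MHZ  # bits per second
bandwidth_50mhz = 7 * 22 * 50 * MHZ   # bits per second
bandwidth_total = bandwidth_25mhz + bandwidth_50mhz

print(logic_rate / 1e9)       # 280.0
print(total_rate / 1e9)       # 548.8 (quoted as roughly 550)
print(bandwidth_25mhz / 1e9)  # 26.4
print(bandwidth_50mhz / 1e9)  # 7.7
print(bandwidth_total / 1e9)  # 34.1
```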

28
Summary and Conclusions

This Viva prototype of the Smith-Waterman algorithm demonstrates that the algorithm can be parallelized for fast operation in an FPGA system, and validates the use of FPGAs to increase the speed of the Smith-Waterman algorithm compared to clusters.
Speed of the Prototype
An HC-62 has the bandwidth to pass cells between 7 FPGAs, allowing for 448 parallel SW_Iteration objects (7 × 64)
At a conservative 30 MHz system clock speed, this gives 30,000,000 × 448 ≈ 13.4 billion Smith-Waterman steps/second.
Opportunities to further optimize the algorithm include:
Increasing the number of SW_Iteration objects that can run in parallel (up to 100 billion Smith-Waterman steps/second)
Increasing the clock speed of the hardware (up to 1 trillion Smith-Waterman steps/second)